Site Reliability Engineer/Architects (All Levels)
Salesforce - San Francisco, CA


Founded in 1999, salesforce.com is the enterprise cloud computing company that is leading customers in their transformation to become social enterprises. Social enterprises are able to connect with customers, partners and employees in entirely new ways. Based on salesforce.com's real-time, multitenant architecture, the company's platform and application services give customers the tools to create a true social front office and revolutionize the way they sell, service, market, collaborate, work, and innovate. With more than 9,000 employees, the first enterprise cloud computing company to exceed $2.5B in annual revenue run rate, and more than 100,000 customers worldwide, we are proud to contribute to the success of companies of all sizes and industries, around the globe. We're also one of the "Best Places to Work" (FORTUNE). If you're passionate about innovation, come help revolutionize how companies collaborate and communicate with customers.

Site Reliability Engineer (All Levels, including Architect)

Salesforce.com is looking to hire an experienced technologist to join the Site Reliability team. If you enjoy an entrepreneurial role where you can develop solutions, solve complex computing problems and drive teams to succeed, then this is the opportunity you have been waiting for! Why wait any longer? Do it.

This is a cutting-edge opportunity with a team focused on continuous availability, destructive testing, incident management, and always-on architecture for the greatest Enterprise Cloud Computing company in the world!

Things break. We know this. The primary responsibility of the Site Reliability team is to identify, lead and develop designs for systems that heal automatically at very large scale.

Responsibilities:
Continuous service availability of massive-scale cloud platforms

Shepherd remediations of long-running or complex incidents in partnership with R&D and Operations teams, bridging the two disciplines and providing continuity and direction to the process.

Drive cross-functional collaboration with the development organization to build survivability, fast failover and ease of operations into our product lines.

Evangelize solutions, emerging technologies and our mission across the Senior Executive Management strata, the Technology organization as well as Sales, Support and Marketing organizations. Drive global adoption of best practices, tools and processes across the entire Technology organization to support continuous availability.

Market our wares and provide us the street credibility to obtain more freedom to do cooler things around continuous availability.

Develop and participate in destructive testing scenarios; drive the development of solutions that test failures in production before they happen. Identify, design and execute these tests.

Tools - Define and guide the tools that will provide continuous service availability.

Self-Healing Systems - Identify and write automation that fixes things before they become service-impacting and tells us when it fails.

Collaborate - Work with the development, infrastructure engineering and architecture teams to drive site reliability requirements, sustainable operations and alert notification into every product deployed.

Requirements:
BS in Computer Science or related degree or equivalent industry experience

A minimum of 5 to 15 years of experience in a large-scale, high-transaction OLTP internet service engineering, development or architecture role

Expertise in TCP/IP in an enterprise networking environment

Expertise in CLI enterprise support of Unix variants (Linux/Solaris/BSD)

Significant Java/Perl/Python or C++ development experience

Expert-level knowledge in several of the following areas: high availability architecture; high-end SAN storage (HDS, EMC) solutions; large-scale NAS filers (HNAS, NetApp), and how to make them perform; Linux kernel performance tuning; block-based, file system-based, and log-based replication schemes; modern distributed data storage technologies (HBase, Hadoop HDFS); enterprise RHEL/Debian Linux and BSD systems management; J2EE application development (understanding J2EE application configuration and ability to read and interpret source code)

Candidate must take and pass a Moderate Public Trust background investigation or have taken and passed a Moderate Public Trust background investigation or higher within the last 2 years.

Desired:
MS or PhD in Computer Science, Mathematics or similar

Chef/Puppet enterprise design/deployment

Large-scale automation architecture/development

Product Owner/Scrum Master ADM

Social Enterprise Platform design/system support

Salesforce.com application development
