Lead Site Reliability Engineer, Senior - GFS 812531 Job
Microsoft - Redmond, WA

Job Category: Operations
Location: Redmond, WA, US
Job ID: 812531-101196
Division: Online Services Division

Global Foundation Services is the team behind the cloud. GFS is responsible for delivering over 200 Microsoft web portals, Live and Online Services around the world, including infrastructure, security and compliance, operations, globalization, and manageability. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide. We are looking for a passionate, high-energy individual to help build the network that powers the world’s largest online services.

The new paradigm is massively distributed software, where a typical service runs on hundreds, thousands, sometimes tens of thousands of servers at the same time. Executing on this strategic shift to highly available, secure and high-performing online services requires a transformation of infrastructure management capabilities. We develop and maintain the tools and technology that efficiently and reliably plan, build, deploy, monitor and manage services, from bare metal to the application stack.

The Manageability Services Group (MSG) has it all: complex algorithmic tasks, the opportunity to solve large-scale problems, the potential to influence the entire industry, and the running of hundreds of thousands of servers. As a Senior Lead Site Reliability Engineer, you will drive the tactical and strategic architectural investments necessary to deliver highly scalable, available, and secure online services. The work is varied and presents many tough problems.

To qualify for this exciting opportunity, you must possess strong communication, organizational, technical and documentation skills. You must function well in a fast-paced collaborative environment and be able to apply critical thinking and strong problem-solving skills to complex production environment scenarios to ensure high availability. Finally, you must be service-oriented and customer-focused, driving for results over technique.

Roles & Responsibilities:
Drive Incident Management, Problem Management, and Change Management as part of a learning organization.
Drive complex Live Site issues through to resolution.
Translate the requirements of custom-code applications into their service dependencies on packaged software services. Develop deep SME-level knowledge and understanding of these dependencies.
Work with SE management, PM and Dev Leads and Managers to drive operability, stability, and resilience into the architecture and design of the product.
Mentor and coach team members, and assist in the overall technical development of the entire SE organization.
Be accountable for the overall service manageability and operability of several different systems, including making recommendations for dependent systems to SE management and to PM and Dev peers.

Skills & Qualifications:
Demonstrated ability in designing, implementing, and supporting multi-tiered services, particularly with a focus on operability.
Strong networking skills, including TCP/IP fundamentals, switching/routing concepts, troubleshooting techniques (particularly packet capture analysis), and configuration of load balancers, preferably F5 or NetScaler.
Strong understanding of network security concepts such as ACLs, firewalls, and encryption.
Understanding of server virtualization concepts.
Experience with the SDLC and its tools such as source control, quality engineering concepts, release engineering, and change control.
Strong scripting skills in at least one of the following languages: PowerShell, Perl, or Python.
Experience with at least one of the following languages: C/C++, Java, or C#.
Strong understanding of OS internals and troubleshooting techniques.
Understanding of ITIL or MOF.
Strong understanding of software deployment strategies and configuration management.
Understanding of network performance monitoring tools, such as Smarts or Cricket.
Strong communication and collaboration skills to work with people from a variety of technical backgrounds.
Experience as part of a 24x7 on-call escalation path.

Experience Required:
10+ years in a variety of roles, primarily in software development and operations; 15+ years preferred.
2+ years supporting a highly scalable, highly available online service.
2+ years demonstrated experience effectively managing people and developing teams.
CS or Math degree preferred.

Microsoft is an Equal Opportunity Employer (EOE) and strongly supports diversity in the workplace.


