Director, Site Reliability Engineering
Zynga - Mountain View, CA


The Director of Site Reliability at Zynga reports to the VP of GFS Infrastructure and is responsible for 24x7x365 operational support of Zynga's large production infrastructure, focusing on monitoring, rapid response, and customer service for our Studio and Shared Service Engineering teams. In this role, you will be managing junior and mid-level Systems Administrators and Engineers with a constant eye on areas for process improvement, automation and better visibility into production health and reliability. You will be working with teams in the US and India.

Responsibilities

Manage staff for a 24x7 organization responsible for proactive infrastructure monitoring and Tier 1 and Tier 2 incident response and escalation

Work closely with partner teams and internal customers to develop, report on, refine, improve and enforce SLAs over time

Define requirements for process automation and manage the selection and implementation of tools and applications to support NOC processes

Measure and report on the responsiveness, effectiveness, and efficiency of the NOC team

Provide procedural training to NOC staff

Mentor and train junior administrators and engineers into positions of more responsibility within Zynga

Provide hands-on leadership during service-impacting events

Coordinate system maintenance and changes, while minimizing customer impact and maximizing the productivity of company resources

Oversee Root Cause Analysis and Corrective Actions necessary to improve reliability and supportability

Ensure support procedures are current and well documented

Requirements

7+ years managing teams and managers in a technical operations support role

Experience with large-scale, geographically dispersed production environments

Firm grasp of monitoring solutions, support incident automation applications, and escalation processes

Ability to support multiple concurrent projects

Customer focus and the ability to inspire customer service within the NOC team

Experience with production LAMP environments

Experience in ITIL Best Practices including problem, incident and change management

Experience with network and server diagnostic and monitoring tools

Strong verbal and written communication skills

Strong understanding of common networking concepts

Outstanding leadership skills demonstrated by accomplishment

Excellent communication, interpersonal, and organizational skills

Bachelor's degree in computer science, information systems, or equivalent; Master's degree is a plus
