The Director of Site Reliability at Zynga reports to the VP of GFS Infrastructure and is responsible for 24x7x365 operational support of Zynga's large production infrastructure, focusing on monitoring, rapid response, and customer service for our Studio and Shared Service Engineering teams. In this role, you will manage junior and mid-level Systems Administrators and Engineers while keeping a constant eye on opportunities for process improvement, automation, and better visibility into production health and reliability. You will work with teams in the US and India.
Manage staff for a 24x7 organization responsible for proactive infrastructure monitoring and Tier 1 and Tier 2 incident response and escalation
Work closely with partner teams and internal customers to develop, report on, refine, improve, and enforce SLAs over time
Define requirements for process automation and manage the selection and implementation of tools and applications to support NOC processes
Measure and report on the responsiveness, effectiveness, and efficiency of the NOC team
Provide procedural training to NOC staff
Mentor and train junior administrators and engineers into positions of more responsibility within Zynga
Provide hands-on leadership during service impacting events
Coordinate system maintenance and changes, while minimizing customer impact and maximizing the productivity of company resources
Oversee Root Cause Analysis and Corrective Actions necessary to improve reliability and supportability
Ensure support procedures are current and well documented
7+ years managing teams and managers in a technical operations support role
Experience with large-scale, geographically dispersed production environments
Firm grasp of monitoring solutions, support incident automation applications, and escalation processes
Ability to support multiple concurrent projects
Customer-focused, and able to inspire a customer service mindset within the NOC team
Experience with production LAMP environments
Experience with ITIL best practices, including problem, incident, and change management
Experience with network and server diagnostic and monitoring tools
Strong verbal and written communication skills
Strong understanding of common networking concepts
Outstanding leadership skills demonstrated by accomplishment
Excellent interpersonal and organizational skills
Bachelor's degree in computer science, information systems, or equivalent. Master's degree is a plus