Reporting to the Manager, Web Foundations, you will design, document, build, configure, verify, monitor, and support evolving system and cloud infrastructure for our Web applications, with a focus on uptime, application scalability and rapid code deployment.
Your responsibilities will include the following:
- Identify architecture and infrastructure problems, and design solutions that involve both the development and IT groups.
- Help the web development teams promote system stability, scalability, measurability and flexibility.
- Manage monitoring and alerting infrastructure to resolve and prevent problems.
- Define and maintain production persistence infrastructure, including data backups/restores.
- Provision systems according to the evolving needs of the team, which today include, but are not limited to, Tomcat, Apache httpd, nginx, MySQL, Hadoop, HBase, HornetQ, Redis, MongoDB, Google App Engine, and Amazon EC2.
- Manage and improve deployment of OS and applications.
- Work with a unified team of developers, systems administrators and DevOps engineers to facilitate continuous deployment.
- Support code launches on production servers.
- Provide on-call support for Linux and cloud infrastructure.
You have the following qualifications:
- Degree in Computer Science, Computer Engineering or related field.
- Demonstrated troubleshooting and problem-solving skills.
- Strong written and verbal communication skills with a focus on collaboration between teams.
- At least 6 years of experience in Linux/Unix system administration or system operations engineering for web-based application platforms.
- Strong scripting and automation skills using bash, sed, and awk. Familiarity with Perl, Python, Ruby, or PHP would be an asset.
- Experience with scalable, high-availability (99.99% uptime) environments, including load balancers, forward and backward compatibility, and redundancy.
- Experience administering large-scale production environments with a large user base and high volume of transactions per second.
- Ability to measure production environment performance, identify and fix bottlenecks, and guide the development teams in providing the information needed to do so.
- Experience with monitoring and alerting tools such as Ganglia, Nagios, Cacti, or Nimsoft.
- Experience operating on virtualization platforms, either locally or in the cloud (e.g. VMware, Xen, or EC2).
- Experience with Red Hat.
- Familiarity with content delivery networks.
- Experience with DevOps methodologies and tools such as Puppet, Razor and Chef.