We are looking for a Lead Operations Engineer with experience managing cloud-hosted infrastructure. You will work closely with our CTO, QA, and engineering teams to maintain and deploy cloud server infrastructure (using Chef and AWS) and ensure performance, uptime, and scalability. This position includes on-call time for unexpected off-hours outages. You can be based in either of our locations (San Francisco, CA or Portland, OR).
Responsibilities:
Continually refine processes for code deploys, server maintenance, etc.
Grow the in-house ops team
Put NOC support in place
Manage the team, including the on-call rotation
Automate failover and scaling infrastructure
Maintain and (where possible) improve supportability of service applications
Track known application issues and work with QA and engineering to ensure they reach resolution
Develop tools or implement initiatives to improve uptime, maintainability, and user-facing experience
Maintain server and deployment documentation
Required Skills:
Linux system administration (CentOS / RHEL or Debian / Ubuntu systems)
nginx, Apache, or lighttpd
Chef, Puppet, or another configuration management tool
AWS
Desired Skills:
Python, Perl, Ruby, or other high-level scripting language