Systems Reliability Engineer
Grit Matters - San Francisco, CA
Mission

To ensure that our public-facing, cloud-based services are reliable, scalable and secure in the face of rapid growth and change.

Outcomes

What we'd like you to do in the first few months

Develop and implement a strategy to ensure our monitoring/alerting systems are complete and escalate reliably

Improve our reliability by ensuring our deployments work across multiple cloud providers

Analyze our fleet's performance on different providers and ensure we're being cost-effective

Competencies

How we expect you to operate to achieve the above outcomes

Automation - We automate anything and everything to ensure predictable, fast deployments.

Consistency - We're moving fast, but it's a marathon, not a sprint. We need people who know how to pace themselves.

Performance - We need someone who can reason about and optimize performance in a highly distributed, cloud-based environment with hundreds of instances and tens of billions of objects.

Communication - We are a team distributed across the US. The ability to effectively communicate asynchronously in a variety of mediums (Email, IM, Group Chat, Skype/Hangouts) is critical.

Responsive - We provide infrastructure for other apps. We are looking for someone who is responsive when things go south (along with the rest of the engineering team!).

Platform diversity - We work across multiple cloud providers. Being comfortable working across these platforms is essential.

Technologies and concepts you'll be working with

AWS, EC2, S3, Joyent Cloud, Ubuntu, Solaris, SmartOS, Node.js, JSON, Ruby, Chef, Python, Fabric, OAuth, HTTP/S, MySQL, Redis

Grit Matters - 17 months ago