Systems Reliability Engineer
Grit Matters - San Francisco, CA



Your role is to ensure that our public-facing, cloud-based services remain reliable, scalable, and secure in the face of rapid growth and change.


What we'd like you to do in the first few months

Develop and implement a strategy to ensure our monitoring and alerting systems provide complete coverage and escalate reliably

Improve our reliability by ensuring that deployments work across multiple cloud providers

Analyze our fleet's performance across providers and ensure we are operating cost-effectively


How we expect you to operate to achieve the above outcomes

Automation - We automate anything and everything to ensure predictable, fast deployments.

Consistency - We're moving fast, but it's a marathon, not a sprint. We need people who know how to pace themselves.

Performance - We need someone who can reason about and optimize performance in a highly distributed, cloud-based environment with hundreds of instances and tens of billions of objects.

Communication - We are a team distributed across the US. The ability to communicate effectively and asynchronously through a variety of channels (email, IM, group chat, Skype/Hangouts) is critical.

Responsiveness - We provide infrastructure for other apps, so we are looking for someone who responds quickly when things go wrong (alongside the rest of the engineering team!).

Platform diversity - We work across multiple cloud providers. Being comfortable working across these platforms is essential.

Technologies and concepts you'll be working with

AWS, EC2, S3, Joyent Cloud, Ubuntu, Solaris, SmartOS, Node.js, JSON, Ruby, Chef, Python, Fabric, OAuth, HTTP/S, MySQL, Redis