Site Reliability Engineer
Apsalar Inc - San Francisco, CA
        Apsalar is looking for a dynamic engineer to join our rapidly growing, top-tier operations team. Apsalar, the leading mobile engagement management company, processes billions of events. As our SRE, you will report to our Director of Operations and be responsible for operating an extremely high-performance, scalable platform built on cutting-edge technologies. The technologies we employ include OpenIndiana, ZFS, RAM-based PCIe drives backed by battery and flash, SSDs for secondary cache, SAS, and Zones, all in a private cloud environment. Come and check us out. You’ll be impressed with the size and scope of the challenges and how smoothly we operate.

        Responsibilities:
        • Operate both front-end and back-end systems consisting of nginx, Postgres, HAProxy, and in-house software, all running on OpenIndiana
        • Participate in on-call rotation (on-call, not scheduled torture)
        • Develop, maintain, and expand the following architectures:
                • Zabbix-based monitoring system
                • Graphite-based systems analytics
                • Chef-based configuration management system
        • Develop tools as needed to streamline operations
        • Support other Apsalar teams as required
        • Document all of the above as needed to build institutional knowledge
        Desired Skills and Experience:
        • Intermediate knowledge of OpenIndiana/OpenSolaris/Solaris, or intermediate knowledge of at least two other Unix-like operating systems such as FreeBSD and Linux
        • Working knowledge of at least two high-level languages such as, but not limited to, Perl, Ruby, Python, and Bash
        • Experience with graphing tools such as RRD, Graphite, or Cacti
        • Knowledge of Apache, Nginx, HTTP, HTTPS, OpenSSL, SSH, x86 hardware, basic storage concepts, and copy-on-write (COW) file systems
        • Working knowledge of a well-known configuration management system such as CFEngine, Chef, or Puppet
        • Experience operating network gear from a major vendor such as Juniper, Arista, Force10, or Cisco
        • Basic system troubleshooting skills and methodology for both hardware and software
        Top candidates will have one or more of the following:
        • Practical knowledge of ZFS, DTrace, Zabbix, Graphite, Chef, and OpenIndiana or Solaris
        • Strong understanding of storage fundamentals
        • Experience with Graphite
        • Detail-oriented
        • Experience with build automation systems such as JumpStart, Solaris AI, or Kickstart in combination with a configuration management engine
        About yourself:
        • Minimum two years of related work experience
        • Open-minded and quick to learn new skills
        • Self-starter and results-oriented
        opsjobs@apsalar.com

        Apsalar Inc - 15 months ago