In this role you will be responsible for our production, staging, development, and lab environments, including backups, high-availability setup, system monitoring and alerting, patch levels, intrusion detection, etc. You will be integral to scaling out our platform as we grow, and you will maintain documentation on operations processes.
OneLogin is an extremely fast-paced environment with high demands on our engineers. A successful candidate will have experience in such an environment and will hold themselves to high standards of quality in everything they do.
Understand and take responsibility for all operational workflows and standard operating procedures, down to a granular, detailed level.
Measure, optimize, and tune system performance, and ensure that systems run reliably and are highly available in a 24/7 production environment.
Set up and maintain physical and virtual machines in our production and staging environments, utilizing KVM/QEMU virtualization technologies and tools such as libvirt.
Set up and maintain physical and virtual machines in our lab environment, including Linux servers, Windows servers, Active Directory, and various third-party appliances, such as Cisco ASA.
Set up and maintain third-party software in our lab environment, including various authentication solutions from RSA, VASCO, SafeNet, etc.
Learn new third-party vendor software, hardware, and other solutions quickly and integrate them into our lab, application, and other deliverables.
Create and maintain operational documentation for deployment, management, and administrative processes, including data recovery requirements for operating systems, application configuration, and data; execute and monitor backup schedules.
Participate in the 24/7 on-call rotation, responding to system problems and emergencies.
Maintain high standards for consumer and customer service touch-points affected by operations.
Proactively identify, manage and mitigate risks.
Identify and escalate issues or root causes of systemic issues; lead, facilitate or participate in prompt resolution.
Perform miscellaneous job-related duties as assigned, including occasional off-hours work to maximize production uptime.
The ideal candidate has 3-5 years of experience managing SaaS application infrastructure.
Mastery of Linux including configuration, networking, hardening, shells, package management and basic scripting. Experience specifically with Ubuntu Linux a plus.
7+ years of experience with Linux system administration
Expert Linux and network troubleshooting skills
Experience with the Rails stack and associated deployment technologies (unicorn, thin, nginx, Mongrel, etc.)
Experience with KVM/QEMU virtualization
Experience with PostgreSQL (replication, backups, tuning)
Experience with nginx
Experience with application monitoring tools, such as Nagios
Experience with deployment tools
Strong scripting skills (bash, Ruby, Perl)
Experience with Ruby and Capistrano a plus
Experience with libvirt
Experience with RabbitMQ an advantage
Experience with AWS a big plus
Experience with Puppet or Chef
Experience with various AAA technologies and protocols, such as RADIUS
Experience with Dell DRAC
OneLogin, Inc. - 16 months ago