Boku is the world leader in online payments, enabling consumers to buy goods and services with nothing more than their wireless phone number. We reach over 3 billion consumers all over the world. We're looking for a Site Reliability Engineer to join our close-knit and highly skilled IT Operations team and help us scale and evolve our platform to meet the demands of our ever-growing business and our customers' needs.
Reporting to the IT Operations Team Lead, this is a hands-on role working with an agile, highly skilled group of software engineers, systems administrators, QA and release engineers.
Collaborate with various teams and provide a high level of service and support on our development and high-volume production systems.
Design, install, configure, monitor and optimize enterprise computer systems and networks.
Identify and improve information security implementations as they relate to the network, systems, devices, and access for regulatory and compliance requirements.
Participate in software and system performance analysis and tuning, service capacity planning and demand forecasting.
Create comprehensive documentation to a level by which processes are reproducible by others.
Stay abreast of the latest trends in technology and in IT processes.
Maintain a positive, can-do attitude in a fast-moving, highly motivated startup.
Participate in on-call rotation to support our production systems.
Bachelor’s degree in Computer Science, Information Systems, Engineering, or another related discipline and 10+ years of experience in IT or a related field, with at least 5 years of RHEL, CentOS, Ubuntu or Debian Linux experience. Additional training, technical certification, and/or years of experience may be substituted for a degree.
Demonstrated technical capabilities are a must, as are people skills. This position requires an ability to work collaboratively and effectively with a range of stakeholders.
An in-depth knowledge of Linux fundamentals, including networking, file systems, security, and the kernel.
Strong experience in application design, design patterns and performance tuning.
Excellent technical troubleshooting skills.
Ability to provide high quality technical assistance to both internal and external clients.
Strong sense of ownership and the ability to work in a dynamic environment with minimal supervision.
Excellent decision maker who approaches problems pragmatically.
Ability to carry out monitoring and performance metrics analysis using common monitoring tools, e.g. Zabbix, Cacti, Icinga.
Excellent skills in scripting and automation in bash, Perl, Ruby and/or Python.
Experience being on-call in a 24x7 production environment.
Meticulous attention to detail and strong organization skills.
Excellent communication skills.
Experience working in a team utilizing agile methodologies and tool sets.
US work authorization required.
Experience with large systems deployments using Puppet or other automation tools, e.g. Chef.
Experience with KVM, Xen and cloud service virtualization, e.g. AWS (EC2, S3).
Experience with storage technologies and arrays, e.g. NFS, SANs, DAS.
Experience with Atlassian products (JIRA, Confluence, Bamboo).
Experience working in SAS 70 or PCI environments.
Squid, Pound, HAProxy, Keepalived, SVN, Git, Tomcat, Apache, MySQL, Jetty, Nginx, MRTG, Cacti, Ganglia, Kickstart, Puppet, Splunk, syslog-ng, Zabbix, LDAP, Mail/Postfix, DNS, DHCP, KVM/Xen/VMware, VPN, Snort, SSH, etc.