Site Reliability Engineer
Palantir - Los Angeles, CA


Palantir is deployed in many countries and datacenters around the world and is used 24/7 in many of these environments. Uptime is critical to ensuring our customers can accomplish their mission when the need arises. Every week, new customers deploy our solutions against important objectives at a phenomenal rate. Your goal will be to monitor and maintain these systems and to pre-empt problems that would otherwise sidetrack the valuable work our users are performing. You will do this by combining your engineering prowess with a strict sense of priority and discipline to ensure that fielded systems are always available. Additionally, you will develop systems and solutions that keep problems on the horizon from ever affecting our users' experience. Finally, you will work closely with our product development and systems engineering teams to advance the way our hardware, software and network solutions are deployed in order to minimize failure rates and increase overall system reliability.

  • Monitor the health, uptime and system alerts on production Palantir systems around the world
  • Apply OS, application and hardware updates and upgrades to production systems in Palantir and customer datacenters
  • Secure and harden operating systems and software dependencies according to customer specifications and general industry best practices
  • Develop and maintain scripts to automate tasks and deployments across clusters in different geographic locations
  • Help specify and develop product enhancements and features that will increase uptime and decrease overall maintenance cost
  • Use testing environments to ensure that changes do not break existing functionality across our deployments

Qualifications:

  • 2-3+ years of experience with Linux system administration (CentOS or RHEL preferred) with strong knowledge of UNIX
  • Experience administering and managing colocated servers, including hardware troubleshooting
  • Willingness to participate in a 24x7 on-call rotation
  • Experience configuring systems for use with one of the major database packages
  • Strong documentation and communication skills
  • Ability to work independently with minimal supervision
  • Ability to travel to co-lo and customer sites as needed (up to 25% of time)
  • Moderate networking experience
  • A strong desire to interact with and support Palantir customers
  • Ability to obtain a TS/SCI clearance
  • BS/MS in Computer Science
  • Experience with systems management tools such as Puppet or CFEngine
  • Experience with VMware-based server virtualization for staging and DR
  • Experience maintaining Oracle installations, 11g R2 preferred
  • Experience with NIST, NISPOM, DoD, DCID, ICD certification and accreditation processes and tools