Site Reliability Engineer
ShareThis - Palo Alto, CA

ShareThis is currently searching for a seasoned site reliability
engineer to join our technology operations team. Since its inception,
ShareThis has operated exclusively in the Amazon EC2 cloud. With explosive
growth, the company now runs one of the largest deployments on EC2 and uses
the latest open-source technologies to manage vast amounts of data
while serving web traffic loads of 10+ billion hits monthly with
high availability and performance.

We are seeking a hardcore technical site reliability engineer who is
capable of operating at and above this scale to support our continued
business growth. As a company that thrives on innovation, focuses on
customer value, and delivers using agile methods, we need an individual
who is passionate about Big Data/NoSQL, Hadoop, systems engineering,
fault-tolerant computing, and operating discipline based on data and
metrics.

Technologies We Use

Amazon Linux, Chef
Perl, Python, Ruby
Cassandra, Hadoop
Nagios, Graphite
Nginx, PHP
Tomcat, Java

Responsibilities

Manage our compute and storage infrastructure on AWS. Ensure
high availability with adequate monitoring and instrumentation.

Maintain and extend operational processes to ensure high
availability and service of our entire technology stack: front-end
web traffic and back-end big-data infrastructure (EC2, S3,
Cassandra, Hadoop, Hive).

Implement best practices to manage utilization, optimization, and
monitoring of our cloud services.

Provide front-line data processing services based on incoming web
traffic data. This function is the most critical element of the
company’s assets, since it forms the basis for all downstream
insights from our data.

Participate in the on-call rotation to handle production issues.

Qualifications

The ideal candidate is talented across a variety of disciplines,
including System Administration, Network Operations, Software
Development, Build and Release Engineering, Performance Engineering,
and Site Operations. Must be willing to absorb new technologies
quickly and become expert in them. Prior experience building,
monitoring, and maintaining high-volume, low-latency systems is
expected.
