- Analyze disparate data sources to design, implement, and develop ETL processes that provide structured and timely access to large datasets.
- Build fault-tolerant, self-healing, adaptive, and highly accurate ETL transformations.
- Develop data cleansing functions, system management functions including load automation, data acquisition functions, and others.
- Respond quickly to issues that emerge in production ETL processes.
- Excellent problem-solving and analytical skills
- Must be a self-starter, able to work on own initiative with minimal supervision in a very fast-paced environment.
- Bachelor’s degree in Computer Science or a related field
- 2 to 3 years of experience using Pentaho Kettle/PDI
- Experience working with Amazon S3, EMR, Java MapReduce, Hive, or Pig is highly desirable
- Experience extracting data from heterogeneous sources such as Oracle, flat files, XML, and weblogs into a target ODS and data warehouse
- Experience with ETL job orchestration and batch load process automation
The New York Times Company - 21 months ago