Data science is about answering a business question or problem by applying scientific and mathematical techniques to data acquired from multiple disparate sources, and then communicating the findings to the business.
The big data scientist needs to be able to program, preferably in different programming languages such as Python, R, Java, Ruby, Clojure, Matlab, Pig or SQL. They need to have an understanding of Hadoop, Hive and/or MapReduce.
To be successful, the candidate must have good working knowledge of one or several related disciplines such as:
Natural language processing: the interactions between computers and human language
Machine learning: using computers to develop and improve algorithms
Conceptual modeling: being able to share and articulate models
Statistical analysis: understanding and working around possible limitations in models
Predictive modeling: most big data problems are about being able to predict future outcomes
Hypothesis testing: being able to develop hypotheses and test them with careful experiments
Data mining, etc.
The candidate must also be proficient with several commonly used data science techniques. One such technique is data processing, which involves understanding how to integrate multiple systems and data sets. They need to be able to link and mash up distinct data sets to discover new insights. This often requires connecting different types of data sets in different forms, working with potentially incomplete data sources, and cleaning data sets so that they can be used.
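As a rough illustration of the linking and cleaning work described above, here is a minimal Python sketch. The two sample sources, their column names, and the helper functions are all hypothetical and exist only for this example; real projects would typically deal with far larger and messier inputs.

```python
import csv
import io

# Hypothetical sample data: two sources that describe the same customers
# under different column names, with gaps and unparseable values.
CRM_CSV = """customer_id,name,country
1,Alice,US
2,Bob,
3,Carol,DE
"""

ORDERS_CSV = """cust,amount
1,20.50
1,n/a
3,15.00
4,9.99
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_amount(raw):
    """Return the amount as a float, or None when it cannot be parsed."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def link_sources(crm_rows, order_rows):
    """Attach the total order amount to each CRM customer (a left join)."""
    totals = {}
    for row in order_rows:
        amount = clean_amount(row["amount"])
        if amount is None:
            continue  # skip order rows whose amount is unusable
        key = row["cust"].strip()
        totals[key] = totals.get(key, 0.0) + amount

    linked = []
    for row in crm_rows:
        key = row["customer_id"].strip()
        linked.append({
            "customer_id": key,
            "name": row["name"],
            # Fill in a placeholder for missing country values.
            "country": row["country"] or "unknown",
            # Customers without valid orders get a total of 0.0.
            "total_spent": totals.get(key, 0.0),
        })
    return linked


linked = link_sources(load_rows(CRM_CSV), load_rows(ORDERS_CSV))
for record in linked:
    print(record)
```

In this sketch, orders that reference a customer missing from the first source are simply dropped. Whether that is acceptable depends on the business question being asked.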
Strong written and verbal communication skills
Being able to work in a fast-paced, multidisciplinary environment, because in a competitive landscape new data keeps flowing in rapidly and the world is constantly changing
Having the ability to query databases and perform statistical analysis
Being able to develop or program databases
Being able to advise senior management in clear language about the implications of their work for the organization
Having at least a basic understanding of how a business and its strategy work
Being able to create examples, prototypes and demonstrations to help management better understand the work
Having a good understanding of design and architecture principles
Being able to work autonomously
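To give a sense of what querying a database and performing statistical analysis can look like at its simplest, the following Python sketch uses an in-memory SQLite database. The `sales` table, its columns, and the figures in it are invented for illustration and do not come from any real system.

```python
import sqlite3
import statistics

# Build a small in-memory database with made-up sales figures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("north", 120.0),
        ("north", 80.0),
        ("south", 200.0),
        ("south", 100.0),
        ("south", 150.0),
    ],
)


def revenue_by_region(connection):
    """Query all rows and group the revenue values by region."""
    rows = connection.execute(
        "SELECT region, revenue FROM sales ORDER BY region"
    ).fetchall()
    grouped = {}
    for region, revenue in rows:
        grouped.setdefault(region, []).append(revenue)
    return grouped


def summarize(values):
    """Return count, mean and sample standard deviation for a list."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        # The sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


summary = {
    region: summarize(values)
    for region, values in revenue_by_region(conn).items()
}
for region, stats in summary.items():
    print(region, stats)
```

A summary like this is also the kind of small demonstration that can help management understand the work.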
In short, the big data scientist needs to have an understanding of almost everything. Depending on the industry in which the big data scientist wants to work, they will need to specialize even further. For example, a marine big data specialist requires a different set of skills than a historical big data scientist.
About MapR Technologies:
MapR Technologies provides a leading distribution for Hadoop. MapR’s Hadoop stack combines optimized versions of open source technologies including Apache HBase, Hive, Pig, and MapReduce with a powerful data management layer and enterprise reliability features to power companies’ Big Data analytical needs. Founded in 2009, MapR is well funded by leading VCs, already has substantial revenue and carries an enviable customer base that includes leading Fortune 100 companies as well as top technology companies.
MapR is an equal opportunity employer.