A big data scientist understands how to integrate multiple systems and data sets. They need to be able to link and mash up distinct data sets to discover new insights. This often requires combining data sets of different types and formats, working with potentially incomplete data sources, and cleaning data so that it can be used.
The big data scientist needs to be able to program, preferably in several programming languages such as Python, R, Java, Ruby, Clojure, MATLAB, Pig or SQL. They also need an understanding of Hadoop, Hive and/or MapReduce.
In addition, they need to be familiar with disciplines such as:
Natural Language Processing: the interactions between computers and human (natural) languages
Machine learning: developing algorithms that allow computers to learn from data and improve with experience
Conceptual modeling: to be able to articulate and share models of the problem domain
Statistical analysis: to understand and work around possible limitations in models
Predictive modeling: most big data problems are aimed at predicting future outcomes
Hypothesis testing: being able to develop hypotheses and test them with carefully designed experiments
The exact background of a big data scientist is of less importance. Great big data scientists can come from different backgrounds such as econometrics, physics, biostatistics, computer science, applied mathematics or engineering. Most often they hold a Master's degree or even a PhD. However, to be successful, big data scientists should have at least some of the following capabilities:
Strong written and verbal communication skills
Being able to work in a fast-paced, multidisciplinary environment, since in a competitive landscape new data keeps flowing in rapidly and the world is constantly changing
Having the ability to query databases and perform statistical analysis
Being able to develop or program databases
Being able to advise senior management in clear language about the implications of their work for the organization
Having at least a basic understanding of how business and strategy work
Being able to create examples, prototypes and demonstrations to help management better understand the work
Having a good understanding of design and architecture principles
Being able to work autonomously
In short, the big data scientist needs to have an understanding of almost everything. Depending on the industry in which the big data scientist wants to work, they may need to specialize even further; for example, a marine big data specialist requires a different set of skills than a historical big data scientist.
About MapR Technologies:
MapR Technologies provides a leading distribution for Hadoop. MapR’s Hadoop stack combines optimized versions of open source technologies including Apache HBase, Hive, Pig, and MapReduce with a powerful data management layer and enterprise reliability features to power companies’ Big Data analytical needs. Founded in 2009, MapR is well funded by leading VCs, already has substantial revenue and carries an enviable customer base that includes leading Fortune 100 companies as well as top technology companies.
MapR is an equal opportunity employer.