Note: This part-time position is based out of San Mateo, CA. The intern is required to work in-office in order to coordinate with Data Scientists and other employees.
Quad Analytix is an exciting, early-stage Data Analytics/SaaS startup in the Bay Area, CA, focused on harnessing e-commerce information gathered from a variety of sources on the classic web and the social web. Our products normalize and analyze this information and generate insights that our target customers (merchandisers, marketers and manufacturers) can visualize to do their jobs more effectively. If you are obsessed with applying data science techniques to classification, information-extraction and clustering problems, want an opportunity to learn and grow with our company, and are excited about seeing your work directly impact the entire business, Quad could be a great fit for you. We have an exciting environment with smart, down-to-earth people who enjoy having fun while working together.
You will work with our Data Science teams to build training data sets and to build and fine-tune models for classification and information-extraction problems. You will also write scripts and code that measure the performance of these models in training and after deployment (e.g., confusion matrices, F1 scores, ROC curves). You will also interface cross-functionally with the Engineers who productionize Data Science algorithms and models, and with our IAI (Information Acquisition and Insights) team, to ensure that the data extraction pipeline that feeds our Hyper-Cube of e-commerce data, including scaled-out Crowd Operations, is humming smoothly.
We are looking for someone who is passionate about managing massive amounts of data and enjoys building data extraction pipelines that suck the entropy out of unstructured and semi-structured data assets to deliver ordered data. The ideal candidate loves working with Data Scientists in areas such as text, video, image and document morphology to prepare data and analyze the performance of algorithms, and gets satisfaction when things that hitherto needed humans get automated.
DESIRED SKILLS AND QUALIFICATIONS:
- Strong background in Applied Mathematics and Statistics.
- Very proficient in scripting with Python, R, or other languages.
- Strong skills with Excel including Macros.
- Strong skills with SQL.
- Strong interest in ML and NLP techniques, with some exposure to information extraction from text, images or video.
- Prior experience working with a crowd-based application on a platform such as oDesk or Mechanical Turk is very desirable.
- Prior experience with and/or some knowledge of NoSQL databases such as MongoDB and HBase, as well as projects in the Hadoop ecosystem, is desirable.
- Programming skills in Java or other languages are desirable.
- Strong oral and written communication skills and the ability to work effectively both independently and in teams (local and distributed).
- Intellectually curious, with a passion for learning and growing professionally.
- Strong work ethic and a proactive approach to problem solving.
- Ability to multitask, prioritize, show initiative, and respond quickly in a fast-paced environment.
- Enjoys having fun at work and wants to collaborate with smart, humble people every day.