Information Extraction Engineer
Quad Analytix - San Mateo, CA



Quad Analytix is an exciting, early-stage internet and SaaS startup in the Bay Area, CA. We focus on structuring information gathered from a variety of e-commerce and social sources, leading to big data persistence, analysis, and mining, coupled with beautiful visualizations that highlight insights.

  • A Software Engineer who is an expert at building scalable, self-healing, “operationally friendly” solutions based on open source technologies.
  • Working with the product team to understand requirements, articulate approaches, and zero in on the chosen design and implementation.
  • Implementation responsibility for both prototypes and production, including working with DevOps and others to ensure the implementation is “operationally friendly” and cloud-ready.
  • Bachelor's degree in Computer Science.
  • At least 3 years of experience delivering Web/Image/Social-Data Scraping applications and Mining applications based on such data.
  • Data acquisition: Significant experience implementing industrial-strength, distributed but polite Crawlers (URL Retrieval) and Social API sources, and/or integrating with companies that provide social-data feeds (such as DataSift, Gnip). This includes experience in the Distributed Crawling, Wrapper Generation (WG), and Information Extraction (IE) fields, with some knowledge of Information Retrieval (IR) technologies.
  • Information Extraction: Data mining and Entity/Attribute extraction from unstructured/semi-structured Data Sources such as Dirty HTML, and Sentiment analysis from Tweets and unstructured Text (experience with Extraction from Images would be nice).
  • In-depth experience with:
o 2-3 Scraping Tools/Frameworks, such as JSoup and Scrapy.
o Multiple Data Formats: XHTML/HTML5, XML/Parsers, RDF, JSON, etc.
o Programming languages – Java, Perl
o AI technologies such as NLP, Ontology Management.
  • Desirable:
o Knowledge of Lucene (especially text processing) and/or other similar technologies.
o Working knowledge of Hadoop and Mahout
o At least one of Python, Ruby, PHP programming languages
o Big Data NoSQL persistence and retrieval, e.g. to/from HBase, Mongo.
o Experience implementing robots with platform technology from companies such as Kapow, Connotate, etc.

  • Strong communication skills and ability to work effectively in teams
  • Intellectually curious, with passion for learning and growing professionally
  • Strong work ethic and proactive approach to problem solving
  • Enjoy having fun at work, and desire to collaborate with smart, humble people every day