Agilex is hiring Knowledge Management/Data Scientists to work in a Systems Engineering environment.
The Data Scientist will support the Electronic Records Management project in its efforts to characterize and categorize (classify) large volumes of unstructured content. The characterization effort will make use of the file or message metadata and textual content to measure or derive the volume, origin, location, purpose, and retention needs of the data as well as to identify trends inherent in these metrics. The categorization effort will build on record control policies and the results of the characterization effort to develop and evaluate computational methods for associating files or messages with record control categories based on their metadata and content.
The ultimate objective of the Data Scientist will be to design and oversee the development of a prototype document retention tool that makes use of leading-edge supervised and unsupervised machine learning techniques.
The Data Scientist will have the following responsibilities:
- Assist with the design and implementation of statistically sound metadata measurement and trend extraction activities on large data-sets
- Propose, implement, and evaluate content analytic strategies for characterizing and categorizing large data-sets of unstructured files and messages using COTS/GOTS/Open Source tools as well as custom software or modules to be written by the Data Scientist
- Oversee construction of an annotated data set for training and evaluation of the prototype document retention tool
- Serve as a Subject Matter Expert in discussions with analytic tool developers and enterprise IT management
Candidate must have all skills listed:
- TOP SECRET CLEARANCE WITH POLY
- Mastery of statistical methods and experiment design
- Experience with machine learning methods for document categorization and clustering
- Experience with natural language processing
- Experience with content analysis on large unstructured data-sets
- Experience with large data-sets
- Familiarity with cloud computing concepts and environments
- Experience with big data analytics in cloud computing environments
- Experience with R, the statistical analysis language and software from the R Foundation (www.r-project.org)
- Experience with Mahout
- Experience with Content Analyst or Autonomy
- Experience with Hadoop and Pig
Agilex Technologies, Inc. is an Equal Employment Opportunity Employer. M/F/V/D.
Agilex - Realize the Value of Information