web crawler - C Java C++
Elance - United States
Contract

I need a small web crawler app built. It needs to run in either C or Java on a Linux AWS cloud server. It needs to pull data from a MySQL database. The table will have the following fields.

id, int
domain, varchar(255)
page, varchar(255)
date_crawled, datetime(4)
page_content, ntext
status_code, int

The app needs to read from the table, get a record, fetch content from the domain + page fields, set a specific http_user_agent, and store the data in the database in the page_content field. The app needs to be multi-threaded and able to process multiple pages at the same time. I need to be able to set the number of concurrent threads/pages/downloads so that I can upgrade or downgrade the cloud server depending on available resources. I also need to be able to set the default timeout limit for a single HTTP request.
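As a rough illustration of the fetch step described above, here is a minimal Java sketch (Java 11 or later) that downloads several pages at once with a configurable thread count, request timeout, and User-Agent. It is not part of the posting: the class name CrawlerSketch, the constant values, the "http://" scheme, and the convention of using status 0 for a failed request are all assumptions, and the database read/write is left out so the example stays self-contained.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CrawlerSketch {
    // Hypothetical settings; a real app would likely read these from a config file.
    public static final int THREADS = 4;
    public static final Duration TIMEOUT = Duration.ofSeconds(10);
    public static final String USER_AGENT = "ExampleCrawler/0.1";

    /** Outcome of one fetch. statusCode is 0 when no response arrived (sketch convention). */
    public static final class Result {
        public final String url;
        public final int statusCode;
        public final String body;

        public Result(String url, int statusCode, String body) {
            this.url = url;
            this.statusCode = statusCode;
            this.body = body;
        }
    }

    /** Joins the domain and page fields into a URL; the http scheme is an assumption. */
    public static String buildUrl(String domain, String page) {
        String path = page.startsWith("/") ? page : "/" + page;
        return "http://" + domain + path;
    }

    /** Fetches one URL with the configured timeout and User-Agent. */
    public static Result fetch(HttpClient client, String url) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(TIMEOUT)
                    .header("User-Agent", USER_AGENT)
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return new Result(url, response.statusCode(), response.body());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result(url, 0, "");
        } catch (Exception e) {
            // Timeouts, DNS failures, and malformed URLs all end up here.
            return new Result(url, 0, "");
        }
    }

    /** Fetches all URLs using a fixed pool of the given number of worker threads. */
    public static List<Result> crawlAll(List<String> urls, int threads) throws Exception {
        HttpClient client = HttpClient.newBuilder().connectTimeout(TIMEOUT).build();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Result>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fetch(client, url)));
            }
            List<Result> results = new ArrayList<>();
            for (Future<Result> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for rows read from the table: arguments are "domain page" pairs.
        List<String> urls = new ArrayList<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            urls.add(buildUrl(args[i], args[i + 1]));
        }
        for (Result r : crawlAll(urls, THREADS)) {
            System.out.println(r.statusCode + " " + r.url);
        }
    }
}
```

It could be run as, for example, java CrawlerSketch.java example.com /index.html. A fixed thread pool is used here because it maps directly onto the "number of concurrent downloads" setting; other designs (such as asynchronous requests) would also work.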

After the success or failure of an HTTP request, the app needs to update the status_code, page_content, and date_crawled fields. If you need to add another field to the table to handle processing a record, we can do that as well. The app needs to remove all non-printable characters from the source code before it saves it in the database. Lastly, the app needs to output its progress to a log file that I can monitor with tail -f logname.log.

If this project works out, I will have a lot more in the future.
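The clean-up and logging requirements above could look roughly like the following Java sketch (Java 11 or later). The posting does not define "non-printable", so this assumes it means control characters other than tab, line feed, and carriage return. The class name PageSanitizer, the method names, and the log line format are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;

public class PageSanitizer {
    /**
     * Removes control characters from page source.
     * Assumption: tab, LF, and CR are kept so the markup stays readable.
     */
    public static String stripNonPrintable(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i);
            i += Character.charCount(cp);
            boolean keep = cp == '\t' || cp == '\n' || cp == '\r'
                    || !Character.isISOControl(cp);
            if (keep) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    /** Appends one timestamped line to the log file so tail -f picks it up. */
    public static void logLine(Path logFile, String message) throws IOException {
        String line = LocalDateTime.now() + " " + message + System.lineSeparator();
        Files.writeString(logFile, line,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        String cleaned = stripNonPrintable("<p>Hi\u0000\u0007 there</p>");
        logLine(Path.of("logname.log"), "stored " + cleaned.length() + " characters");
    }
}
```

Opening the file in append mode for each line is simple and safe for a sketch; a multi-threaded crawler would more likely share one logger (for example java.util.logging with a FileHandler) to avoid interleaved writes.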

Desired Skills: C Java C++

Elance - 7 months ago