Wikipedia and its sister projects consist of content that is free to study, share, improve and reuse. One of the ways we make this content readily available and searchable is by indexing changed content at regular intervals and making it available to our search engines.
The Wikimedia Foundation uses Lucene as the search engine backbone for its Wikimedia projects. We're looking for a consultant to assist with the ongoing development and operation of the Search software stack and its infrastructure. The ideal candidate is a subject-matter expert in Lucene search technology and will advise the Foundation's Technical Operations team on maintaining, improving and migrating the Search infrastructure. Documentation on the current deployment is available at http://wikitech.wikimedia.org/view/Search.
Scope of Work
Work on enhancements and day-to-day operations, such as improving the efficiency, capacity and redundancy of the Lucene Search infrastructure
Help troubleshoot unexpected outages and identify operational issues
Profile and locate performance bottlenecks
Use Puppet as the configuration management tool to maintain the manifests for the Lucene configuration
Deploy the Lucene Search infrastructure at our new data center
Upgrade and migrate the current Search software stack to work with the latest Lucene version
Upgrade to the current release of Lucene
Develop and upgrade the MediaWiki search extensions (MWSearch and Lucene-search) to work with the new Lucene release. The MWSearch extension is a MediaWiki backend that fetches search results from the Lucene-based MediaWiki search engine. Lucene-search extends the Apache Lucene search API with features such as ranking pages by number of backlinks, distributed searching and indexing, wikitext parsing and incremental updates.
Automate, optimize and document the indexing and deployment process.
Requirements
Have strong knowledge of Lucene, Java, PHP and Linux
Experience with configuration management systems and concepts (e.g. Puppet, Chef, CFEngine)
Experience with operating system distribution packaging systems (e.g. dpkg, RPM)
Have solid experience producing and processing large datasets
Be able to work independently when needed, and remotely as part of a globally distributed team
Have relevant hands-on experience and an eagerness to learn and try new concepts
Be comfortable in a highly collaborative, consensus-oriented environment
Be a proficient English speaker
Prior work experience implementing Lucene/Solr search engines
Bachelor's degree in a related field or equivalent experience
About the Wikimedia Foundation
The Wikimedia Foundation is the non-profit organization that operates Wikipedia, the free encyclopedia. Our commitment: Imagine a world in which every single human being can freely share in the sum of all knowledge. According to comScore Media Metrix, Wikipedia and the other projects operated by the Wikimedia Foundation receive more than 482 million unique visitors per month, making them the fifth-most popular web property worldwide (comScore, January 2012). Available in 282 languages, Wikipedia contains more than 21 million articles contributed by a global volunteer community of more than 100,000 people. Based in San Francisco, California, the Wikimedia Foundation is an audited 501(c)(3) charity funded primarily through donations and grants. The Wikimedia Foundation was created in 2003 to manage the operation of Wikipedia and its sister projects. It currently employs 150 staff members. Wikimedia works with local chapter organizations in 39 countries or regions to advance the mission of the Wikimedia movement.