The Apache Lucene Ecosystem: My View Of 2010
After a week off to enjoy time with my family, I thought I would kick off the last week of 2010 with a look back at the year as it relates to the Apache Lucene ecosystem. If you follow the amalgamation of projects that I like to call the Lucene Ecosystem (the Apache projects Lucene, Solr, Nutch, Mahout, Tika, PyLucene, Lucy, Lucene.NET, Droids, ManifoldCF/Lucene Connector Framework, OpenNLP and UIMA), you know it has been an amazingly busy and fruitful year. Instead of going through each project like last year's review, I'm going to be a bit less formal and hit on the highlights as I see them.
Before I dig in too much, though, a special thanks to all our customers at Lucid Imagination as well as to my coworkers. I'm coming up on 15 years out in the real world, and I can honestly say I've never enjoyed what I do as much as I do here, and that even takes into account the normal rough patches one goes through in any job. As an engineer, there are few things as cool as getting to work with customers who are not only using, but pushing your work/project/product on a daily basis to do new and interesting things (I think this is a direct result of the project being Open Source, which I believe has an inherently lower cost of experimentation). I've been fortunate enough to meet and talk with many people doing all kinds of things with Lucene and Solr, ranging from the mundane (basic keyword search) to those building next-generation search capabilities at incredible scale. Through it all, I'm constantly amazed at the flexibility and efficiency of Lucene and Solr. For instance, I've been working with one customer whose Solr-based solution (for the exact same content) will use ~50% less hardware and will have an index that is 1/6 the size of their FAST index, all while saving them major dinero.
Speaking of Lucid, one of the highlights of the year for us that relates directly to Lucene and Solr is the launch of our enterprise version: LucidWorks Enterprise. I like to think of it as Apache Solr with a whole lot of Lucid expertise on how to use Solr baked in and topped off with other features and functionality to make building search applications easier.
OK, time to move on to the open source projects:
1. Without a doubt, the biggest news of the year is the merging of the Lucene and Solr code bases as well as the graduation of several subprojects to Apache Software Foundation (ASF) Top Level Projects (TLPs). The graduating projects are Tika, Nutch and Mahout. We also spun Lucy (a C port) off to the Incubator, where it is working on building its own community. These moves were primarily made to focus project management on a single code base, but they also demonstrate that the projects have reached a level of maturity at the ASF. The moves also have the side benefit of giving each project higher visibility.
2. I'm particularly excited about the addition of OpenNLP to the Apache umbrella. OpenNLP is a nice open source Java project for natural language processing that has lived at SourceForge for quite some time. I would expect development to grow quite a bit under the ASF's community-based model. Also, integrating OpenNLP with Solr and Lucene is pretty easy to do. I would be remiss if I didn't also give a nod to the addition of the ManifoldCF project to the ASF. ManifoldCF will help unlock content in SharePoint, Documentum and other repositories for users of Lucene and Solr.
3. Lucene's trunk code base now implements our Flex APIs, which should give users near-total control over what goes into the index, as well as alternate compression techniques, different scoring models, etc. See Michael McCandless's excellent talk at Lucene Revolution for more details.
4. With all the location-aware devices and capabilities on the market, geo-spatial search is a hot topic, and Lucene and Solr have been adding quite a few capabilities in this regard, including the ability to filter, boost and sort results based on location information in documents. See Solr's Spatial Search wiki page for more info, as well as several of my past blog posts.
5. Of course, everyone was abuzz about the cloud this year. For Solr, this translates into greater efforts to make Solr easier to scale to very large installations (100s to 1000s of nodes and billions and billions of documents) via the Solr Cloud project that Yonik Seeley and Mark Miller have been spearheading.
6. On the user side, one of the biggest pieces of buzz this year related to Lucene was the migration of Twitter search to Lucene. At 1 billion queries per day and 50 million posts per day (all indexed and searchable in near real time), Twitter's search system certainly has its work cut out for it. However, as Michael Busch outlined at Lucene Revolution, Apache Lucene was up to the task! Naturally, lots of other companies migrated to Solr and Lucene as well. Have you shared your use case?
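To make the geo-spatial item a bit more concrete, here is a minimal Java sketch of what a location-aware Solr request can look like. It only builds the request parameters and does not talk to a Solr server. It assumes a hypothetical field named `store` indexed as a lat/lon point, the `{!geofilt}` filter and `geodist()` sort function from Solr's spatial support (check the Spatial Search wiki page for the version you run), and distances in kilometers. The class and helper names (`SpatialQuery`, `spatialParams`, `toQueryString`) are made up for illustration; the sketch covers filtering and sorting, not boosting.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds (but does not send) a location-aware Solr /select request.
final class SpatialQuery {

    // Returns request parameters that keep only documents within distanceKm of (lat, lon)
    // and sort the remaining ones nearest-first. Assumes spatialField holds a lat/lon point.
    static Map<String, String> spatialParams(String userQuery, String spatialField,
                                             double lat, double lon, double distanceKm) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", userQuery);
        // Filter: geofilt takes its field, center point and radius from sfield, pt and d.
        params.put("fq", "{!geofilt}");
        params.put("sfield", spatialField);
        params.put("pt", lat + "," + lon);
        params.put("d", Double.toString(distanceKm));
        // Sort: geodist() with no arguments also falls back to sfield and pt.
        params.put("sort", "geodist() asc");
        return params;
    }

    // URL-encodes each key and value (URLEncoder.encode(String, Charset) needs Java 10+).
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Match everything within 5 km of the example point, closest first.
        Map<String, String> params = spatialParams("*:*", "store", 45.15, -93.85, 5.0);
        System.out.println("/select?" + toQueryString(params));
    }
}
```

Because `geofilt` and `geodist()` both read `sfield` and `pt` from the request, the center point is specified once and shared by the filter and the sort.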
Well, I've no doubt missed a bunch of other things, but those items are, to me, some of the bigger highlights. Looking forward, there are some other exciting things coming to Lucene and Solr. In particular, I'm working on adding language identification, related searches and point-in-polygon filtering to Solr. I would also expect us to release Lucene/Solr 3.1 fairly soon, but you can't pin me down on a date just yet.
Here's wishing you all Happy Holidays and a Happy New Year!
To learn more about LucidWorks Enterprise and Apache Solr, check out the Lucid Imagination website: www.lucidimagination.com
by: Lucid Imagination