legoboku.eng: 02/01/2014

Feb 18, 2014

Apache Lucene: Then and Now, Java User Group meetup at Twitter HQ

Meetup.com - Apache Lucene: Then and Now

This time, Doug Cutting (@cutting) talked about the history of Apache Lucene and how it is used to implement both Internet search engines and local, single-site search at major tech companies such as LinkedIn and Twitter. He also mentioned that the project is integrated with Hadoop and is still evolving.

Doug Cutting (@cutting) is the founder of numerous successful open source projects, including Lucene, Nutch, Avro, and Hadoop. Doug joined Cloudera in 2009 from Yahoo!, where he was a key member of the team that built and deployed a production Hadoop storage and analysis cluster for mission-critical business analytics. Doug holds a Bachelor’s degree from Stanford University and sits on the Board of the Apache Software Foundation. (cited from Meetup.com)

His talk is based on the following blog entries.

As you can see, one of his roles at his company (Cloudera) is implementing the new open source project, Blur.

Blur is an Apache Incubator project that provides distributed search functionality on top of Apache Hadoop, Apache Lucene, Apache ZooKeeper, and Apache Thrift. When I started building Blur three years ago, there wasn’t a search solution that had a solid integration with the Hadoop ecosystem. Our initial needs were to be able to index our data using MapReduce, store indexes in HDFS, and serve those indexes from clusters of commodity servers while remaining fault tolerant. Blur was built specifically for Hadoop — taking scalability, redundancy, and performance into consideration from the very start — while leveraging all the great features that already exist in the Hadoop stack. (cited from blog.cloudera.com)

Cloudera is also providing a better way for non-programmers to interact with Hadoop data.

In the context of our platform, CDH (Cloudera’s Distribution including Apache Hadoop), Cloudera Search is another framework much like MapReduce and Cloudera Impala. It’s another way for users to interact with Hadoop data and for developers to build Hadoop applications. Each framework in our platform is designed to cater to different families of applications and users (cited from blog.cloudera.com)

See the Cloudera blog for more details.

It seems that the Java user group in SF holds a meetup once or twice a month. I plan to keep attending.