Open Source Full Text Search


I've played around with a few different search utilities over the past few years, hoping to find a single solution I could latch on to that would handle multiple kinds of content - whether it was a forum, web pages, a mailing list archive, blog posts, or whatever I happened to be working on.

Right now, I'm intrigued with Apache Lucene. It's written in Java, which is good and bad in my book - it's portable, but it's probably not as fast as it could be. There are C and C++ ports of Lucene available, and even a few others in Perl, Ruby, and .NET. ht://Dig is another option, written in C++, but that project unfortunately hasn't been updated in a while, and in its current state it does not support proximity matching, which is a bit of a problem in my book. The project was once very popular, but development has all but died off from what I can tell.

There are certainly other solutions out there that don't involve setting up software on your server - but running your own software gives you the ability to tweak the engine so that it works best in your own particular situation. Lucene is really just the search engine, and one piece of the puzzle, but the project does seem very mature at this point and I think it bears looking into.

I've toyed with the idea of building my own full text search indexing software. It's not rocket science to build, but what has held me back is that I don't want to end up maintaining something like that long term. With the wealth of options out there - database-backed FTS indexing, mature open source software like Lucene, various paid and free search engines, and proprietary products like the Google Search Appliance - it really doesn't seem worth the effort at the moment.
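To give a rough idea of what "not rocket science" means here, below is a minimal sketch of the core of a home-grown indexer: an in-memory inverted index that records word positions, which is what makes proximity matching possible. This is purely illustrative and is not how Lucene does it internally. The class name PositionalIndex, its methods, and the sample documents are all made up for this example, and it skips everything that makes a real engine hard to maintain (stemming, ranking, persistence, incremental updates).

```python
import re
from collections import defaultdict


class PositionalIndex:
    """Toy in-memory inverted index that records word positions (hypothetical example)."""

    def __init__(self):
        # term -> {doc_id -> [positions of the term in that document]}
        self.postings = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def tokenize(text):
        # Lowercase, then split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        # Record the position of every term so proximity queries are possible.
        for position, term in enumerate(self.tokenize(text)):
            self.postings[term][doc_id].append(position)

    def search(self, query):
        """Return ids of documents that contain every term in the query."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        doc_sets = [set(self.postings.get(term, {})) for term in terms]
        return set.intersection(*doc_sets)

    def near(self, term_a, term_b, max_distance):
        """Return ids of documents where both terms occur within max_distance words."""
        a = self.postings.get(term_a.lower(), {})
        b = self.postings.get(term_b.lower(), {})
        hits = set()
        for doc_id in set(a) & set(b):
            if any(abs(pa - pb) <= max_distance
                   for pa in a[doc_id] for pb in b[doc_id]):
                hits.add(doc_id)
        return hits


# Made-up sample documents, just to exercise the index.
index = PositionalIndex()
index.add("post-1", "Lucene is a mature open source search library")
index.add("post-2", "Open formats matter; the source of a search index is its documents")

both_terms = index.search("open source")     # documents containing both words
adjacent = index.near("open", "source", 1)   # documents where the words are adjacent
```

Both sample documents contain "open" and "source", but only the first has them next to each other, which is exactly the distinction a proximity query draws. Getting from a toy like this to something production-worthy is where the long-term maintenance burden comes in.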

The downside to a Lucene solution is that it will require extensive setup to get going, but the upside is that once set up, it can be extremely powerful. I don't know if I'm going to take the time to set it up right now, but it's on my mind so I figured I'd pass the info on.


