Following up on yesterday's post, I spent a good deal of time today running through some other Apache projects. Holy cow, are they doing a lot of interesting things over there. Unfortunately for me, most of the Apache projects are Java based, and my Java knowledge is pretty much limited to base implementations and acute syntactical recognition. (Fancy words for "I don't know jack.")
But following up on the Lucene project, I came across two other related projects, Nutch and Solr, both of which promise to make a web based search application much easier to implement.
Solr was actually donated to the foundation by CNET - who developed the product in-house for their own usage.
The Nutch project has been running much longer, but documentation is apparently not of paramount importance to the developers. They don't even have a documented list of features!
In the end, though, both of these projects show a lot of promise for an easy to implement web site searching solution. They both require Tomcat or Jetty or some servlet engine to get going, which I'm not running on my own server at the moment, but I can tell right now that this is something I'll be looking into in the next month or so. Having a decent search engine at my disposal is too nifty a possibility to pass up.
I have no idea what the performance will look like, but this not only has promise for aggregating the results of multiple publishing interfaces (message board, wiki, blog, custom HTML, document downloads, etc.), it also has some decent promise as a small scale vertical search engine.
Batik looks interesting to me as well, though all I can really think of using it for right now is outputting pretty looking graphs.
The Forrest project seems to have come along pretty well since I last viewed it. It still seems to be a lot simpler to just build a site or use an existing CMS than to go through the trouble of setting up Forrest, though. I'm going to take a pass on that one again and come back to it in another year or so.
Another piece of the searching pie is the CyberNeko HTML Parser, though I don't think it's officially part of Apache. Powerful, but some things just don't need to be written in Java.
Hadoop is a distributed computing platform that shows promise for projects where scalable architecture is a necessity. I would really like to see more little guys starting to experiment with scaled and distributed architecture. Renting a few CPU hours from Amazon could turn into a pretty big thing with the right idea behind it. Elephant Drive came into being in what, 30 days? It worked out pretty well for them. The problem is that most people simply don't have a mindset that can realize (and then utilize) the power of the concept.
