About Full Text Searching
From KallestadWiki
What is Full Text Searching?
When we talk about searching in the RDBMS world, we are usually talking about SQL queries like this one:
SELECT title FROM books WHERE isbn = '0012345'
SQL has its advantages: it is excellent at searching structured data, and database vendors (both commercial and open source) have done wonders with performance. Its main drawback shows up when searching long or unlimited character fields. You can search for specific terms, typically with a pattern such as LIKE '%term%', but performance is generally poor, because a pattern that begins with a wildcard cannot use an ordinary index and forces the database to scan every row. The results are also just the rows that match your criteria, returned in the default sort order or in a field-based sort order that you specify, with no notion of which rows are most relevant.
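To make the drawback concrete, here is a minimal sketch using Python's built-in sqlite3 module. The books table, its description column, and the sample rows are hypothetical, invented for illustration. It shows that a LIKE search returns a plain match or no-match set (it even matches "search" inside "Research"), and it asks SQLite for its query plan, which is a scan rather than an index lookup despite the index on the column.

```python
import sqlite3

# In-memory database with a hypothetical "books" table; the description
# column stands in for a long or unlimited character field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, description TEXT)")
conn.execute("CREATE INDEX idx_books_description ON books (description)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("0012345", "Searching Made Simple", "An introduction to search engines and searching text."),
        ("0012346", "Database Research", "Research notes on relational database design."),
        ("0012347", "Gardening", "Growing vegetables in small spaces."),
    ],
)

# A substring search: each row either matches or it doesn't, and the only
# ordering available is a field-based ORDER BY.
query = "SELECT title FROM books WHERE description LIKE ? ORDER BY title"
matches = [row[0] for row in conn.execute(query, ("%search%",))]
print(matches)  # ['Database Research', 'Searching Made Simple']

# Ask SQLite how it would run the query. A pattern with a leading wildcard
# cannot use the B-tree index for a range lookup, so the plan is a scan.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("%search%",)).fetchall()
print([row[-1] for row in plan])
```

The exact wording of the plan text varies between SQLite versions, but it reports a SCAN of the table rather than a search using the index.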
To be fair, many database vendors have made an effort to implement full text searching, but in my experience with various vendors the results have been less than desirable.
Full text searching changes things in a few ways. First, long and unlimited character data is indexed, which goes a long way toward improving performance, and it is indexed differently from traditional database indexes (but I won't go into the details right now). Full text search also offers more than traditional sorting: results can be sorted by algorithmic relevancy, and with structured data you can generally weight certain elements more heavily than others. Full text indexes generally store positional information for words or terms, which lets the underlying engine take proximity into account when matching. Another common feature is word stemming, which reduces words to a common root (for example, "searching" and "searched" to "search") so that you don't lose relevant results that are approximate matches.
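The indexing, proximity, and stemming ideas described above can be illustrated with a toy positional inverted index, the general kind of structure full text engines build: each term maps to the documents that contain it and the positions where it appears. This is a minimal sketch, not how any particular engine (Lucene included) is implemented. The function names are hypothetical, and stem() is a deliberately crude suffix stripper standing in for a real stemming algorithm such as Porter's.

```python
import re
from collections import defaultdict

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "ed", "s"):
        # Only strip when at least three letters remain, so "is" stays "is".
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase the text, split it into runs of letters, and stem each token."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]

def build_index(docs):
    """Build a positional inverted index: term -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def near(index, a, b, window):
    """Return ids of documents in which terms a and b occur within `window` positions."""
    a, b = stem(a.lower()), stem(b.lower())
    postings_a, postings_b = index.get(a, {}), index.get(b, {})
    hits = []
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        if any(abs(pa - pb) <= window
               for pa in postings_a[doc_id] for pb in postings_b[doc_id]):
            hits.append(doc_id)
    return hits

docs = {
    1: "Full text search engines index long documents.",
    2: "Searching documents quickly requires an index.",
    3: "Relational databases store structured data.",
}
index = build_index(docs)

print(sorted(index["search"]))                   # [1, 2]: "search" and "Searching" share one term
print(near(index, "searching", "documents", 1))  # [2]: adjacent only in document 2
print(near(index, "searching", "documents", 4))  # [1, 2]: a wider proximity window
```

Because positions are stored alongside each posting, a proximity query never has to reread the original text; it only compares position lists for documents that contain both terms.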
Think about Google, Yahoo, and AltaVista. These are large full text search databases. They each work a little differently, but the underlying concept is the same: find relevant search results within a large database of textual documents, and do it quickly. Now consider the benefits of embedding a similar search engine within your own application.
The very existence of several competing Web search engines suggests that not all full text search engines are the same. Different algorithms produce different results on the same data set. Depending on your data and your end user community, you may need to tune things differently so that the desired results float to the top of the result set more often than not. Full text searching is one of the many areas where programming gets a little fuzzy, because the desired results are not always logically obvious.
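As a small illustration of different algorithms producing different results on the same data, the sketch below ranks three made-up documents for the query "the cat" with two simple scoring functions: raw term frequency, and term frequency weighted by inverse document frequency (TF-IDF). Both are textbook examples chosen for illustration, not the formulas any particular engine uses, and the function names are hypothetical.

```python
import math
from collections import Counter

def raw_tf_score(query_terms, doc_terms):
    """Score = total number of occurrences of the query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)

def tf_idf_score(query_terms, doc_terms, doc_freq, n_docs):
    """Weight each occurrence by log(N / df); a term found in every document scores 0."""
    counts = Counter(doc_terms)
    return sum(counts[t] * math.log(n_docs / doc_freq[t])
               for t in query_terms if doc_freq[t])

def rank(docs, query, use_idf):
    """Return document ids sorted from highest to lowest score."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    query_terms = query.lower().split()
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(t for terms in tokenized.values() for t in set(terms))
    n_docs = len(tokenized)

    def score(doc_id):
        terms = tokenized[doc_id]
        if use_idf:
            return tf_idf_score(query_terms, terms, doc_freq, n_docs)
        return raw_tf_score(query_terms, terms)

    return sorted(tokenized, key=score, reverse=True)

docs = {
    "a": "the cat sat on the mat with the hat",
    "b": "the cat and another cat",
    "c": "the dog ate the bone",
}
print(rank(docs, "the cat", use_idf=False))  # ['a', 'b', 'c']
print(rank(docs, "the cat", use_idf=True))   # ['b', 'a', 'c']
```

Raw frequency rewards document "a" for repeating "the"; TF-IDF notices that "the" appears in every document and so carries no information, which moves document "b", with two mentions of "cat", to the top.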
Full Text Search Index
- About Full Text Searching
- Lucene
- Lucene Documents
- Lucene Indexes
- Integration Points with Lucene
- Related Results with Carrot
