"... Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent. ..."
Description taken from: http://javelina.cet.middlebury.edu/lsa/out/lsa_definition.htm
Algorithm for LSI:
To perform Latent Semantic Indexing on a group of documents, you perform the following steps:
Uses for LSI
LSI has been used in several ways. The most obvious and common way is to analyze the similarity between bodies of text. This can be used in dozens of interesting ways, from finding related documents in a group to doing paragraph-wise LSI to find site summaries. It can also be used to facilitate a "smart" search of your document space, and even do document categorization (read: SPAM filtering!)
The most common operation is to take the inner product (dot product) of two of the LSI vectors and perform operations with that information. Sometimes, it may make sense to do this operation on your unit LSI vectors, particularly when you are searching on a vector that is a search term, it will almost never have the magnitude that a full document would have.
Example Implementation
A working example of LSI in the RubyLanguage can be found at http://classifier.rubyforge.org/. prior link was bad, but rufy.com includes this link
Tutorials
A tutorial, part of a five-part series on SVD and LSI, is given at http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-4-lsi-how-to-calculations.html