Data Mining The Web

The lack of consistent structure across Web sites makes some people skeptical that Web mining can succeed. One way to bring order to this chaos is to systematically retrieve Web-based content to a local storage medium and strip out the source-specific details, such as HTML annotations and the differing conventions of semi-structured documents, so that the text from the various sources is reduced to a common form. Local tools can then reorganize the information into a consistent structure, including indexes that allow BooleanLogic to be applied across information gathered from a variety of sources.
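
The strip-and-index idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular product: the pages are taken to be already retrieved and held in a dictionary mapping URL to HTML, and the names TextExtractor, html_to_text, build_index, search_and, search_or and search_not are all made up for this example. It uses only the standard library.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Reduce one HTML page to plain text with normalized whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def build_index(pages):
    """Build an inverted index: lowercase word -> set of URLs containing it.

    `pages` is assumed to map URL -> already-retrieved HTML text.
    """
    index = defaultdict(set)
    for url, html in pages.items():
        for word in re.findall(r"[a-z0-9]+", html_to_text(html).lower()):
            index[word].add(url)
    return index


def search_and(index, *terms):
    """URLs containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """URLs containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, all_urls, term):
    """URLs that do not contain the term."""
    return set(all_urls) - index.get(term.lower(), set())
```

With pages from several sites loaded into one dictionary, search_and(index, "data", "mining") returns the URLs whose stripped text contains both words, whatever markup each site used. A real repository would also need retrieval, scheduling, and persistent storage, none of which this sketch attempts.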

As a non-programmer end user, I have found Folio's WebRetriever useful in this regard. For instance, I can schedule off-hours gathering and indexing, which yields a hyperlinked local repository of data in which every word is searchable. This data is closer to being information and knowledge than the unmined Web.

GenePrescott (2/18/97, 21:50 EST)


CategoryDataMining 20080702 Thanks Gene


Last edited July 3, 2008