Conclusion: yes, Web search engines can index wikis. GoogleLovesWiki.
Therefore, an HTTP search engine is a useful tool for providing increased search capability for any wiki.
However, the Robots.txt file should be properly configured, and the appropriate META tags should be added to "EditPage" and "EditCopy" so that all robots are excluded from indexing obsolete text, as shown in the example below.
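For example, the HTML head of an edit page might carry a tag like this (the page title here is illustrative; the tag itself is the one quoted in the Aug 2004 note below):

<head>
<title>Edit: SomePage</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>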
Robots.txt can be configured to allow a local search engine to index the site while excluding public Internet search engines, something like this:
To allow a single robot:

User-agent: FriendlyWebCrawler
Disallow:

User-agent: *
Disallow: /

See RobotsDotTxt for a link to the file format spec.
Discussion
Any HTTP search engine can index any wiki; whether it happens in a given case is just a matter of the right combination of configuration, URL space, and page content (META tags). There are no inherent technical limits. (copied from WikiNavigationPattern)
Note the following:
I don't know if there is a way for a wiki (or other Web site) to force a searchbot to update its index.
References
A sample robots.txt (for the Portland Pattern Repository):
# Prevent all robots from getting lost in my databases
User-agent: *
Disallow: /cgi/
Disallow: /cgi-bin/
Comments
Sure, why not? MeatballWiki is indexed by Google. Some searches are bizarre; we got one trolling for kiddie porn once. Recently, Cliff decided to exclude bots from hitting the edit pages, but they are greatly encouraged to index the whole site there. On the other hand, Wiki is not indexed, nor will it likely ever be directly, although it is indexed through the mirrors that aren't blocked. -- SunirShah
I wrote up a document about how to give the wiki pages some "fake" URLs that end in .html. This doc is at http://www.riceball.com/wikiwiki/MakingWikiWorkWithSearchEngines.html -- JohnKawakami
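For illustration only, a common way to create such URLs is an Apache mod_rewrite rule that maps static-looking .html addresses onto the wiki script; the script name and paths below are hypothetical, and John's document describes his actual approach:

# Hypothetical Apache configuration: serve /wiki/PageName.html
# by internally rewriting the request to the wiki's CGI script.
RewriteEngine On
RewriteRule ^/wiki/([A-Za-z0-9]+)\.html$ /cgi-bin/wiki.pl?$1 [PT,L]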
Aug 6, 2004: Google no longer indexes the EditPage or EditCopy links. The wiki software now emits

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

in the HTML header of the Edit page and the Edit Copy page. Google, like all properly-written robots, ignores such pages. See WikiSpam for details.
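As a rough sketch of how a wiki script might decide when to emit that tag (the function and its "action" parameter are hypothetical, not the actual software's code), in Python:

def page_head(title, action):
    # Build an HTML head; edit views get the robots exclusion tag
    # so obsolete, editable text stays out of search indexes.
    lines = ['<html>', '<head>', '<title>%s</title>' % title]
    if action in ('edit', 'editcopy'):
        lines.append('<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">')
    lines.append('</head>')
    return '\n'.join(lines)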
Feb 2, 2005: But, using the above implementation, the spider must still access the EditPage or EditCopy links only to find out that they aren't supposed to be indexed. Now Google and blog software are implementing a rel="nofollow" attribute on links. This could be added to the EditPage link on each page to give robots a heads-up not to even bother accessing the EditPage. What is still unclear is whether rel="nofollow" really means "do not follow this link" to robots, or whether it means "follow this link, but don't give it any credit in your page ranking scheme".
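For example, the edit link on each page could carry the attribute like this (the URL scheme here is illustrative, not any particular wiki's):

<a href="wiki.pl?action=edit&amp;id=SomePage" rel="nofollow">Edit this page</a>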
May 02, 2005: A meta tag tutorial is available at http://www.rahsoftware.com/tutorials/meta_tag_tutorial.aspx