Robots Dot Txt

Actually, this should be "robots.txt".

As described at http://www.robotstxt.org/wc/robots.html, robots.txt is a standard file which you can place on your Web server to influence the behavior of WebRobots when they visit your Web site. If the file http://c2.com/robots.txt contains

 User-agent: *
 Disallow: /cgi/
 Disallow: /cgi-bin/
(which it once did), then this wiki should not show up in SearchEngines, and it shouldn't be crawled by robots, since the wiki scripts live under /cgi/.

Of course, compliance is voluntary: a WebRobot must choose to honor robots.txt, and there's no way to prevent a poorly-behaved WebRobot from simply ignoring it.
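
To illustrate what "honoring robots.txt" looks like on the crawler's side, here is a minimal sketch using Python's standard urllib.robotparser module. The "ExampleBot" user-agent string is invented for the example; the rules are the ones quoted above.

 import urllib.robotparser

 # Parse the rules shown above (no network access needed for this sketch).
 rp = urllib.robotparser.RobotFileParser()
 rp.parse([
     "User-agent: *",
     "Disallow: /cgi/",
     "Disallow: /cgi-bin/",
 ])

 # A polite crawler asks before fetching each URL; a rude one skips this step.
 print(rp.can_fetch("ExampleBot", "http://c2.com/cgi/wiki?RobotsDotTxt"))  # False
 print(rp.can_fetch("ExampleBot", "http://c2.com/robots.txt"))             # True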

Ward's robots.txt as of 2006-10-17 is as follows.

 User-agent: *
 Disallow:   /wiki/history
 Disallow:   /~ward/morse/ve
 Disallow:   /lisa


Would it hurt for Wiki to be indexed by search engines?

The biggest issue at the time was "spam" web spiders traversing the ReverseIndex links ("clicking" on the page headers), causing the database to be locked. See "denial-of-service protection" on WikiHistory.


The issue is more often the load that crawling puts on a dynamic page-generation system like Wiki. In other cases it can be misleading to index (and archive, in the case of Google) outdated pages with volatile information.

That something has been removed from Wiki doesn't mean nobody is interested in reading it anymore. It's nice to be able to find removed material in the Google cache.

The search engines frequently index the "edit" page too, which may confuse the casual visitor and lead to strange edits. Sometimes the LikePages output gets indexed as well, which seems a little wasteful. The scary one is http://www.google.co.uk/search?q=site:c2.com+%22entry+point%22&hl=en&lr=&ie=UTF-8&start=30&sa=N which (as of 2003-07-05) links to the EditCopy of OnMonads. This invites the casual visitor to roll back someone's change.

One solution would be to split wiki?FOO=page off into several FOO?page scripts, and then turn all the wiki?FOO= pages into redirections or 400-series errors. While this might be the RightThing, it could be easier just to stick a <meta name="robots" content="noindex"> tag at the top of those pages and hope for the best. -- MatthewAstley
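
As a rough sketch of the second idea (this is not the wiki's actual code; the function, the action names, and the page title below are invented for illustration), a page-rendering script could emit the noindex tag only for the edit and copy views while leaving normal page views indexable.

 # Hypothetical sketch: emit a noindex hint for edit/copy views only.
 NOINDEX_ACTIONS = {"edit", "copy"}

 def render_head(action, title):
     head = ["<head>", "  <title>%s</title>" % title]
     if action in NOINDEX_ACTIONS:
         # Ask well-behaved search engines not to index this view.
         head.append('  <meta name="robots" content="noindex">')
     head.append("</head>")
     return "\n".join(head)

 print(render_head("edit", "OnMonads"))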


I periodically google search c2.com to find things in the wiki, and would love it if somehow EditPage and EditCopy pages could be excluded from the search. I'm afraid Edit and Copy are too common for adding -edit and -copy to the search to remain effective. Maybe I'll try -ConvertSpacesToTabs. (ooooh yes, that works pretty well!) --KeithWright

See CanWebSearchEnginesIndexWikis

