From CrazyThingsThatMightSaveWiki
To dissuade spammers, modify RobotsDotTxt to tell the Google bot and other search engine bots to ignore the site completely (or at least not to follow any of the links, so that GoogleLovesWiki would no longer be true). Of course, people's HomePage will no longer have a high PageRank (or whatever they call it these days), but spammers will have far less reason to come here.
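For anyone who wants to see what the blanket rule would look like, here is a minimal sketch, not what Wiki actually serves. It uses Python's standard urllib.robotparser to check how a well-behaved crawler would read the file. The host wiki.example and the helper name crawler_may_fetch are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A blanket rule: every crawler, every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # wiki.example is a placeholder host, not the real Wiki address.
    print(crawler_may_fetch(ROBOTS_TXT, "Googlebot", "http://wiki.example/cgi/wiki?FrontPage"))
```

Note that robots.txt is only advisory: it keeps out crawlers that choose to obey it. The "index but don't follow links" variant is normally expressed per page with a robots meta tag (content="nofollow") rather than in robots.txt itself.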
Apparently, this is possible already. The PythonCommunityServer? wiki claims, when editing, "Important note: All external links from these pages go through Google's redirector to remove PageRank. Don't bother spamming here to increase your PageRank; it won't work." ...although I first found out about this when I was editing its front page to remove WikiSpam. sigh
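A rough sketch of the redirector idea, assuming the wiki renders pages to HTML before serving them. The redirector address and local host name below are placeholders, not the real Google redirector URL or the real PythonCommunityServer setup, and the regular expression only handles simple double-quoted href attributes.

```python
import re
from urllib.parse import quote, urlsplit

REDIRECTOR = "http://redirector.example/url?q="  # hypothetical redirector endpoint
LOCAL_HOST = "wiki.example"                      # hypothetical host of the wiki itself

# Matches absolute http(s) links in double-quoted href attributes.
HREF = re.compile(r'href="(https?://[^"]+)"')


def route_external_links(html: str) -> str:
    """Rewrite external links so they pass through the redirector."""
    def rewrite(match: re.Match) -> str:
        url = match.group(1)
        if urlsplit(url).hostname == LOCAL_HOST:
            return match.group(0)  # internal link: leave untouched
        # Percent-encode the whole target so it survives as a query value.
        return 'href="' + REDIRECTOR + quote(url, safe="") + '"'

    return HREF.sub(rewrite, html)


if __name__ == "__main__":
    page = '<a href="http://spam.example/buy?x=1">cheap pills</a>'
    print(route_external_links(page))
```

The point of the design is that the search engine sees a link to the redirector rather than to the spammer's site, so the spammed link should pass on no PageRank.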
Even if we don't have logins, how about IP-banning known HTTP relays, anonymizers, and other means that might be employed for distributed and anonymous attacks against Wiki?
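The ban check itself could be as small as the sketch below, which uses Python's standard ipaddress module. The listed networks are documentation-only address ranges standing in for a real list of relays and anonymizers. Where such a list would come from, and how it would be kept current, is the hard part and is not addressed here.

```python
import ipaddress

# Placeholder entries (reserved documentation ranges), not real relays.
BANNED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # a whole relay subnet
    ipaddress.ip_network("198.51.100.17/32"),  # a single anonymizer host
]


def is_banned(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any banned network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in BANNED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("192.0.2.44", "203.0.113.5"):
        print(candidate, "banned" if is_banned(candidate) else "allowed")
```

A check like this would presumably run only on edit requests, so that readers behind a banned relay could still browse.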
Let's all lay off WardsWiki for a month and see whether that makes a difference. Or we could allow certain non-contentious supervision like DeletedButWelcomeToWiki.
As for spam, using RobotsDotTxt to repel Google should be good enough.
Have to disagree on this one for multiple reasons:
- c2 wiki is famous and linked to in wiki lists all over, not just via google; it's not at all clear to me that google is even the primary way that spammers find their way here.
- it is very handy to use google to search c2, since google search is much more powerful than wiki search
- RobotsDotTxt would keep archive.org out as well, and that alone makes the CureWorseThanTheDisease; I use archive.org to look at ancient versions of c2 pages multiple times per week, and also, archive.org is just plain a good idea.
- Is archive.org archiving anything here currently? The latest thing I can find there is dated June 2003.
- I haven't noticed, but it isn't unusual for there to be huge lags between archive.org updates of a site. It's always been erratic, and if he's announced a way to view the update policy for a given site (if indeed there is one), I've missed it. But there's every reason to think that it will eventually take another snapshot if Ward didn't ask him not to. We could ask if it matters -- but I don't think it does. The overall update rate is basically limited by the $40,000 in disks he buys every month.
- [Why is there a need to use archive.org often? What new uses can I make of this wiki for computer-related work? I hope the person is not saying older material is better. --dl]
- I find google's cache of pages here fairly handy, actually; I use it frequently as well
- google brings desirable people here (e.g. when they search on DesignPattern keywords), not just undesirables like spammers
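On the archive.org objection: robots.txt rules are per user agent, so in principle a file could shut out search engines while still admitting one named crawler. The sketch below assumes the archive's crawler identifies itself as ia_archiver; that name, the host wiki.example, and the helper may_crawl are assumptions for illustration, and this does nothing for the other objections (Google search, Google's cache, desirable visitors).

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow means "nothing is off limits" for that agent.
ROBOTS_TXT_WITH_EXEMPTION = """\
User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
"""


def may_crawl(user_agent: str, url: str) -> bool:
    """Check a URL against the exemption file for the given crawler name."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT_WITH_EXEMPTION.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://wiki.example/cgi/wiki?FrontPage"  # placeholder address
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, may_crawl(agent, url))
```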
It would be nice if there were some way to tell Google (and other search engine bots) that "this collection of nodes is a highly-connected graph; modify your PageRank algorithm appropriately".
Agreed with the advantages of having Google index Wiki.
CategoryWikiProgress