WikiSpam is a wikiwide problem. It won't be solved but wikiwide.
These spammers add links to wikis, blogs, bulletin boards, and guestbooks so their Google page rank goes up. We need to all help keep an eye on this at all wikis and help clean it up where we can. WikiSpam in general is described at http://en.wikipedia.org/wiki/Blog_spam.
See other discussions on:
SpammedWikisDatabase lists wikis currently getting hammered by spam. If you have a spare moment, please stop by, pick a page or two and delete the junk.
A difficult-to-judge case is one where a page is an advertisement for a commercial product or service that may seem OnTopic. For example, a page devoted to describing a company that sells software or consulting services may seem relevant or useful at first glance. Such pages should be deleted if they have little information other than "We offer an incredible product/service. Please contact me if you want to buy it.".
An easy-to-judge case is one where a single word is linked to an off-site page. Those off-site pages then contain links to pornography related services. Two actual cases of this were witnessed on the words "Python" and "Wcms" on another wiki.
This has also been observed on the AtomWiki, with the word SEO linking to an external site. Spam of this kind has become common on weblog comment systems and sadly it looks like widespread WikiSpam may become a big problem in the future.
Due to the way WardsWiki handles external links, this type of spam isn't possible here.
Please Wiki developers, add : <meta name="robots" content="noindex,nofollow"/> to the old versions of Wiki pages. When spam is cleaned up, it persists in the version history of pages, so continuing to contribute to Google page rank. If you add this header, Google won't index the old spammy versions. I wrote some more about this technique (and analysis of an attack on my site) here: http://www.annexia.org/wiki_spam. -- RichardWMJones
This has been in place on WardsWiki for a while now.
Ah, if only we were in an Asimovian utopia where robots could do no harm. It seems ironic somehow that Wiki can survive humans but can be strangled to death by misguided automatons. Of course, if you really wanted to restrict automated contributions, you could include a miniature TuringTest with each post. This is what Yahoo does when you sign up for an account; you have to read a series of letters and numbers in a distorted image (a CaptchaTest). -- PatrickParker
I've had good results on several wikis I administer with a small patch to UseModWiki which counts the number of external links before and after the edit, and redirects you to a page explaining the spam protection scheme in effect if there are four or more external URLs in the newly edited version of the page. This behaviour can be inhibited if you're logged in, or coming from a specific IP range. Mail me if you want a copy. -- DonaldGordon
This solution of checking the number of new links on every edit works very very well. I'm amazed more wikis don't do this. It seems like it would stop almost all spammers. Has UseMod added this type of patch? -- RachelStruthers
I was thinking that it ought to be possible to create a script that listed pages in order of number of external links. Even if a page is spammed by only a few bytes and each spam is a different size, they must add external links, and I can't think of many normal pages that have that many external links. -- JamesKeogh
Extremely frustrating are the sites that don't have a Revisions History - so the would-be gnome can only delete, not restore. Why would someone put up a wiki without the revisions link? I tried appending "?action=diff" to the page's url, but that doesn't work - I guess it's not the right syntax for the wiki engine in use. What a shame. -- TeganDowling?
That's been discussed extensively, and is not just a spam issue. In particular, keeping a complete history of every page can be stifling and lead to disaster. Note that Google may have cached some earlier versions of a page. If spamming is really troublesome, WikiSpamBlocker can be used - it has been tried and seems to work.
Ward has chosen to use DelayedIndexing to discourage spamming.
EliotScott asks: How does one automagically remove unwanted content, in this case WikiSpam? What's the secret to identifying it on pages? I'd like to setup my own Wiki Web Site, but I can't have inappropriate language or content either.
It depends which one of the WikiEngines you use. MoinMoin comes with an anti-spam feature. You just have to turn it on in your config file; see: http://moinmoin.wikiwikiweb.de/AntiSpamGlobalSolution You can add profanity filters to the LocalBadContent page I think. A much better wiki engine is MediaWiki, but they have not yet finished adding built-in support for spam blocking; see: http://mail.wikipedia.org/pipermail/mediawiki-l/2004-December/002666.html. It seems, that some spam (or what else is is?) occurs always at 4:00 (see http://c2.com/mrtg/).
May be of interest to some, mostly what we knew already though - http://www.theregister.co.uk/2005/01/31/link_spamer_interview/.
Spams left unattended for long periods
SandBox cleaned up 22ndFeb Sydney time around 7am, after left untouched for 10 hours or so by the community. Anyone in US has habit of checking wiki in the mornings? Please check for spam as well.
Spammer upset with antispammers
WikiHistory was spammed on Oct 16, 2005 (and FrontPage similarly the previous day). This is the only incident I have seen since the EditCodeWord was put in place in March. Quite effective as a spam deterrent, I'd say.
I don't know if it's been mentioned, but a way to combat spam is also to restrict editing to registered users. A bunch of wikis do that, WikiStory (http://www.wikistory.com/wiki/WikiStory_Home) is the first that comes to mind. People can still be annonymous, since choosing a username does not reduce anonymity. There would be less editing with this solution, because people might not want to register, for whatever reason, and I haven't visited C2.com before but the wiki appears to not incorporate a user system. So there are downsides to this approach and instances where it may not be applicable.
Thanks moved to WikiAppreciation.