There have been a number of proposals for dealing with WikiSpam. This page compares the various proposals. The requirements section has been distilled from arguments used against these solutions in the past.
For more information, see AntiWikiSpamScripts, CategoryWikiSpam, and CrazyThingsThatMightSaveWiki.
Requirements
Here are some vague requirements to help guide us in finding a good solution. The order is arbitrary.
- 01 Should deal with first-time spammers
- 02 Should not require spammers to pay attention
- 03 Should not require spammers to understand English
- 04 Should deal with human spammers
- 05 Should deal with robot spammers
- 06 Should not annoy readers (leading to loss of readership)
- 07 Should not annoy authors (leading to loss of contributions)
- 08 Should deal with bulk spam
- 09 Should deal with SubmarineSpam
- 10 Should not invalidate any current content
- 11 Should not be too different from what we have today
- 12 Should not be too hard to implement
- 13 Should be resistant to abuse (for example, in EditWars)
- 14 Should not require demanding, ongoing maintenance
Solution comparison
Dashes mean that a solution probably meets the requirement; X's mean that it probably doesn't.
01 02 03 04 05 06 07 08 09 10 11 12 13 14
-- xx -- -- -- -- -- -- -- -- -- -- -- -- DelayedIndexing
-- xx -- -- -- -- -- -- -- -- -- -- -- -- NoFollow
-- -- -- -- -- -- -- -- xx -- -- -- -- -- WikiSpamBlocker
-- -- -- -- -- -- -- -- xx xx -- -- -- -- StatisticalFilter?
xx -- -- -- -- -- -- -- -- -- -- -- -- xx BannedContentBot
-- -- -- xx -- -- -- -- xx -- -- -- -- -- LanguageFilter
-- xx -- -- -- -- -- -- -- -- -- -- -- -- RedirectExternalLinks
-- xx -- -- -- -- -- xx -- -- -- -- -- -- ExternalLinkArea
-- xx xx -- xx -- -- -- -- -- -- -- -- -- WarningMessage?s
-- xx xx -- xx -- -- -- -- -- -- -- -- -- SpamHereOnly
-- -- -- xx -- -- xx -- xx xx -- -- -- -- ShotgunSpam
-- xx -- -- -- xx -- -- -- -- -- -- -- -- LetsInsulateOurselves
-- xx -- -- -- xx -- -- -- -- -- -- -- -- StopAutoLinking?
-- -- -- -- xx -- xx -- xx -- -- -- -- -- VolumeLimitedEdits
xx -- -- -- -- -- -- -- xx -- -- -- -- xx EditThrottling
-- -- -- xx -- -- xx -- -- -- -- -- -- -- HumanVerification
-- -- -- -- -- xx xx -- -- xx -- -- -- -- RejectEdits?
xx -- -- -- -- -- xx -- -- -- -- xx -- -- EditsRequireKarma
-- -- -- xx -- -- xx -- -- -- -- -- -- xx UserLogin?s
xx -- -- -- -- -- -- -- -- -- -- -- xx xx SpamBlackList
-- -- -- -- -- -- -- -- -- -- -- xx xx -- PeerToPeerBanList
-- -- -- xx -- -- xx -- -- -- -- xx -- -- EditsRequireJavaScript?
Proposed solutions
- DelayedIndexing
- 02 Spammers will probably not notice
- NoFollow
- 02 Spammers will probably not notice that their links no longer count
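NoFollow works by tagging external links with rel="nofollow" so that search engines ignore them when ranking pages, which removes the payoff for link spam. Below is a minimal sketch of how a wiki renderer might add the attribute; the function name and regular expression are illustrative, not code from any particular WikiEngine.

 import re
 from html import escape

 URL_RE = re.compile(r'https?://[^\s<>"]+')

 def render_external_links(wiki_text):
     # Turn bare URLs into anchors tagged rel="nofollow" so that search
     # engines which honour the attribute ignore them when ranking pages.
     def to_anchor(match):
         url = match.group(0)
         return '<a rel="nofollow" href="%s">%s</a>' % (
             escape(url, quote=True), escape(url))
     return URL_RE.sub(to_anchor, wiki_text)

 print(render_external_links("See http://example.com/pills for details."))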
- WikiSpamBlocker
- 09 Doesn't address well-concealed submarine spam
- StatisticalFilter
- 09 Intelligent spammers may blend in
- 10 Some current content may not pass the filter
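A StatisticalFilter? would score each submitted edit against word statistics learned from known spam and known good edits, much like Bayesian email filtering. Here is a rough sketch of the scoring step only, assuming the per-word spam probabilities have already been trained; all names and numbers are illustrative.

 import math, re

 # Hypothetical trained table: probability that an edit containing the
 # word is spam. In practice this would be learned from past edits.
 SPAM_PROB = {"casino": 0.99, "viagra": 0.99, "refactoring": 0.01}

 def spam_score(text, default=0.4):
     # Combine per-word probabilities in log-odds space (naive Bayes style).
     log_odds = 0.0
     for word in re.findall(r"[a-z']+", text.lower()):
         p = SPAM_PROB.get(word, default)
         log_odds += math.log(p / (1.0 - p))
     return 1.0 / (1.0 + math.exp(-log_odds))

 if spam_score("cheap casino games and viagra here") > 0.9:
     print("edit looks like spam")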
- BannedContentBot
- 01 Banned content is likely to be spammer-specific
- 14 Requires ongoing maintenance
- LanguageFilter
- 04 Spammers can adapt to the filter
- 09 Doesn't address submarine spam
- RedirectExternalLinks
- 02 Has no effect unless the spammers realize what is going on
- ExternalLinkArea
- 02 Spammers may not notice
- 08 Depends on bulk spam containing many links to the same page
- WarningMessage?s
- 02 Spammers may not notice the warning message
- 03 Spammers may not understand the warning message
- 05 Robots will not notice the warning message
- SpamHereOnly
- 02 Spammers may not notice the directions
- 03 Spammers may not understand the directions
- 05 Robots will not notice the directions
- ShotgunSpam
- 04 Spammers can use multiple IP addresses or pages
- This is only a problem if you're basing spam detection on the editing of multiple pages; the simple approach of counting links during an edit submit (sketched after this entry) isn't affected by where the spammer is coming from.
- If the spammer uses multiple IPs, he can make repeated edits to the same page. So even if he can't add 1000 links at once, he could still add them 4 at a time.
- 07 May get in the way of some large refactorings
- 09 Does not deal with submarine spam
- 10 If existing pages exceed the new limit, authors are required to clean them up before posting minor edits in the future.
- Though that depends on whether we limit the total number of external links, or just the number of added links, and whether it's an absolute number or a links-to-text ratio.
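The link-counting approach mentioned above can be as simple as comparing the number of external links before and after an edit and refusing the save when too many were added at once. A rough sketch follows; it counts only added links (so existing link-heavy pages are not invalidated), and the limit and names are made up for illustration.

 import re

 URL_RE = re.compile(r'https?://\S+')
 MAX_ADDED_LINKS = 4   # hypothetical per-edit limit

 def too_many_new_links(old_text, new_text):
     # Count only *added* links, so pages that already contain many
     # external links (requirement 10) are not invalidated.
     added = len(URL_RE.findall(new_text)) - len(URL_RE.findall(old_text))
     return added > MAX_ADDED_LINKS

 if too_many_new_links("", "http://a.example http://b.example " * 10):
     print("edit rejected: too many external links added at once")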
- LetsInsulateOurselves: disable Google indexing of the wiki
- 02 Spammers will probably not notice
- 06 Readers will no longer be able to search the wiki with Google
- StopAutoLinking?: stop auto-linking external addresses
- 02 Spammers may not even notice
- 06 Readers have to copy and paste addresses
- Google may still find the links, even if they aren't clickable
- VolumeLimitedEdits
- 05 Robots can make many small edits
- 07 Slows down major refactorings
- 09 Does not address submarine spam
- EditThrottling
- 01 First-time spam always gets through
- 09 Doesn't stop non-bulk spam
- 14 Spam still gets through and has to be cleaned up by hand, just at a slower pace
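EditThrottling amounts to limiting how often a single address may save pages. A minimal in-memory sketch is below; the one-hour window, the limit of five edits and the use of IP addresses as identifiers are all assumptions, and a real WikiEngine would persist this state between requests.

 import time
 from collections import defaultdict, deque

 WINDOW_SECONDS = 3600   # hypothetical: at most MAX_EDITS saves per hour
 MAX_EDITS = 5

 recent_edits = defaultdict(deque)   # IP address -> timestamps of saves

 def edit_allowed(ip, now=None):
     now = time.time() if now is None else now
     edits = recent_edits[ip]
     # Forget timestamps that have fallen out of the window.
     while edits and edits[0] < now - WINDOW_SECONDS:
         edits.popleft()
     if len(edits) >= MAX_EDITS:
         return False          # throttled: too many recent saves
     edits.append(now)
     return True

 for i in range(7):
     print(i, edit_allowed("192.0.2.1", now=1000.0 + i))   # last two are refused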
- HumanVerification
- 04 Spammers can answer quizzes too
- 07 Quiz would be time-consuming for authors
- RejectEdits?: reject edits containing discernible external addresses
- 06 Readers have to follow vague instructions to find external resources
- 07 Authors have to write detailed instructions leading to external resources
- 10 Invalidates all current external links
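RejectEdits? hinges on being able to recognise a "discernible external address" in the submitted text. Here is a sketch of what such a test might look like; the patterns are illustrative and deliberately crude.

 import re

 # Anything that looks like a URL or a bare domain name counts as a
 # "discernible external address" (the pattern is illustrative only).
 ADDRESS_RE = re.compile(
     r'(https?://\S+|www\.\S+|\b[\w-]+\.(?:com|net|org|info|biz)\b)',
     re.IGNORECASE)

 def contains_external_address(text):
     return ADDRESS_RE.search(text) is not None

 print(contains_external_address("visit example dot com"))      # False: obfuscated hints slip through
 print(contains_external_address("visit http://example.com"))   # True: the edit would be rejected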
- EditsRequireKarma (related to WikiNeedsTrustMetrics)
- 01 One-time spammers may get their first spam through (if too much initial karma is granted)
- 07 One-time authors are discouraged from contributing (if not enough initial karma is granted)
- 12 May take a lot of work
- UserLogin?s
- 04 Spammers can create accounts too
- 07 One-time authors would be discouraged
- 14 Spammers' logins would have to be banned somehow
- SpamBlackList
- 01 By definition, doesn't deal with first-time spammers
- 13 Want to win an EditWar? Just add your opponent to the blacklist
- 14 Requires ongoing maintenance
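A SpamBlackList is typically a set of regular expressions matched against the URLs in a submitted edit, with any hit blocking the save. A small sketch follows; the patterns are invented, and a real installation would load them from a shared, regularly updated list, which is exactly the maintenance burden noted under 14.

 import re

 # Hypothetical blacklist patterns; real ones come from a maintained list.
 BLACKLIST = [re.compile(p, re.IGNORECASE)
              for p in (r'casino-palace\.example', r'cheap-pills\.example')]

 def blacklisted(text):
     urls = re.findall(r'https?://\S+', text)
     return any(pattern.search(url) for pattern in BLACKLIST for url in urls)

 print(blacklisted("great deals at http://cheap-pills.example/buy"))   # True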
- PeerToPeerBanList
- 12 Would have to be integrated with a number of WikiEngines before it becomes effective
- 13 Subject to same abuse as SpamBlackList
- EditsRequireJavaScript?: only accept edits from users who have configured their browser to run JavaScript
- 04 Human spammers use a normal browser
- 07 Some authors may prefer to disable JavaScript in their browser, or use a browser that cannot run it
- 12 The JavaScript code (and its generation) must be complicated enough that a real JavaScript interpreter is needed to pass the test (hashcash, for example)
- (05 in the future, spam bots may be able to run JavaScript)
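The hashcash idea mentioned under 12 works roughly like this: the edit form embeds a challenge, JavaScript in the browser searches for a nonce whose hash together with the challenge has a required number of leading zero bits, and on submit the server only has to verify the single resulting hash. The sketch below shows both sides in Python for brevity; the brute-force solve() is what the browser-side JavaScript would do, and the difficulty is kept small for illustration.

 import hashlib

 DIFFICULTY_BITS = 12   # kept small for the example; real deployments would use more

 def leading_zero_bits(digest):
     bits = 0
     for byte in digest:
         if byte == 0:
             bits += 8
             continue
         bits += 8 - byte.bit_length()
         break
     return bits

 def verify(challenge, nonce):
     # Cheap server-side check: one hash per submitted edit.
     digest = hashlib.sha1((challenge + nonce).encode()).digest()
     return leading_zero_bits(digest) >= DIFFICULTY_BITS

 def solve(challenge):
     # Expensive search; this is the work the edit form's JavaScript would do.
     n = 0
     while not verify(challenge, str(n)):
         n += 1
     return str(n)

 nonce = solve("page=WikiSpam&ts=1234567890")
 print(verify("page=WikiSpam&ts=1234567890", nonce))   # True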
While no measure is perfect by itself, most of them help against some kinds of spammers. This suggests that combining measures will be more effective than relying on a single solution. For example, on my wiki (although it definitely does not have the attention and LinkShare? of WardsWiki) I combine LinkThrottling, EditThrottling and VolumeLimitedEdits, together with a system that requires fetching at least part of a page before saving it (actually a side effect of the transparent ThreeWayMerging? deployed in the wiki). This has prevented spamming so far.
At any rate, any spam detection system is IMNSHO better than the "code word" system used here. It has effectively made this wiki a closed medium; for example, it took me a long time to find the right moment to put this comment here. -- PanuKalliokoski
The right moment?
See, there was this period of approximately a month (fall 2004, IIRC) when I never managed to visit the wiki at a time when the code word was there. So I concluded, "okay, this is not the place to participate anymore", and left for a long time. After many months, I checked again, and no, the code word wasn't there. So I didn't come back until now.
Not fall 2004, as the code mechanism was introduced about February 2005.
Related to WikiVandalismSolutions.
CategoryWikiSpam CategoryWikiSecurity