Here is an idea that will help fight spam and add cool new functionality at the same time.
Stop rendering HTTP links inline with the text, and instead render them at the end of the page as a "references" section. In the rendered page, replace each inline link with a notation like "[External 1]", "[External 2]", etc. The auto-generated references section would appear at the bottom of each rendered page, as long as that page has one or more external links. In the references section, each link would be listed on its own line, along with some special links (akin to the specialness of the "?" that appears after undefined pages). One special link would report a broken link; another would report a spam link.
When a user clicks one of the special links, an automatic edit takes place. Each reference to the corresponding URL in the source document is modified. To make the edit, we can EmbraceAndExtend the square-bracket markup used on other WikiForums.
For example, suppose there is a source document with some text like this:
 ...and my favorite search engine is http://google.com.

This would be replaced in the rendered version of the page as follows:

 ...and my favorite search engine is [External 1].

The "[External 1]" is a hyperlink to a #-anchor for the references section. The references section would look something like this:
 [External 1] http://google.com (broken?) (spam?)

Here, "broken?" and "spam?" are hyperlinks that perform the auto-edits mentioned above. When the user clicks "broken?", the source document is modified as follows:

 ...and my favorite search engine is [broken:http://google.com]

Broken links could be rendered in a way that visually indicates they are suspected of being broken. Similarly, spam links could be auto-edited to include a "spam:" modifier and rendered differently; there are several options for how the rendering could indicate a suspected spam link.
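To make the mechanism concrete, here is a minimal sketch of the rendering pass and the auto-edit, written in Python purely for illustration. The function names, the regular expression, the "?report=..." query strings, and the simplified HTML are all hypothetical; a real implementation would hook into the wiki engine's existing markup pass and edit machinery.

 import re

 # Match bare external links, skipping ones already tagged as [broken:...] or [spam:...]
 URL_RE = re.compile(r'(?<![\[:])\bhttps?://[^\s\]<>]+')

 def render_external_links(source):
     """Replace inline external links with [External N] markers and, if any
     were found, append an auto-generated references section."""
     refs = []

     def replace(match):
         url = match.group(0).rstrip('.,;:!?')      # keep trailing punctuation out of the URL
         trailing = match.group(0)[len(url):]
         refs.append(url)
         n = len(refs)
         # The marker is a hyperlink to a #-anchor in the references section.
         return '<a href="#ref%d">[External %d]</a>%s' % (n, n, trailing)

     body = URL_RE.sub(replace, source)
     if not refs:
         return body                                # no external links, no references section

     out = [body, '<hr>', 'References:<br>']
     for n, url in enumerate(refs, 1):
         out.append('<a name="ref%d">[External %d]</a> %s '
                    '(<a href="?report=broken&amp;ref=%d">broken?</a>) '
                    '(<a href="?report=spam&amp;ref=%d">spam?</a>)<br>'
                    % (n, n, url, n, n))
     return '\n'.join(out)

 def tag_url(source, url, tag):
     """Auto-edit: rewrite each untagged occurrence of url in the source as [tag:url]."""
     pattern = re.compile(r'(?<![\[:])' + re.escape(url))
     return pattern.sub('[%s:%s]' % (tag, url), source)

Clicking "broken?" would then trigger something like tag_url(source, 'http://google.com', 'broken'), producing the auto-edited source shown above; the renderer could then display the [broken:...] form with, say, a warning marker.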
With one-off links on random pages, the RecentChangesJunkies will presumably let a few slip by unnoticed. That is where the "spam?" link comes into play. By making it very easy for an ordinary reader to report spam, we can expect that most spam links will eventually be caught and reported.
Once links are reported as spam by ordinary, edit-shy readers, they will be disposed of in due course by WikiGnomes who happen across them. Remember that we assumed the 10,000-link style of spamming will have lost popularity by this point; any genuine spam links that make it into the wiki will be well-disguised and non-disruptive to the flow of the page, so there shouldn't be a big problem with cleaning them up gradually.
Making the inline spam? and broken? tags WikiBadges would make it easier to find them. I think you'd still need to do other things, such as preventing the links from being indexed by Google (in case they are there just to boost the PageRank of something) and/or refusing edits that add more than foo new external links.
Of course, "external" need not mean external to this wiki either. It would be possible to have a whitelist of servers whose links you trust; that would take the labour out of commonly linked places. -- JamesKeogh
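Both of those refinements bolt onto the same sketch easily. The limit value, the function names, and the hostnames in the whitelist below are placeholders, not a proposal for actual values:

 import re
 from urllib.parse import urlparse

 URL_RE = re.compile(r'https?://[^\s\]<>]+')
 TRUSTED_HOSTS = {'c2.com', 'example.org'}          # placeholder whitelist

 def is_trusted(url):
     """Links to whitelisted servers skip the [External N] treatment."""
     return urlparse(url).hostname in TRUSTED_HOSTS

 def too_many_new_links(old_text, new_text, limit=3):
     """Return True if an edit adds more than `limit` external links
     that were not already present on the page."""
     old_links = {u for u in URL_RE.findall(old_text) if not is_trusted(u)}
     new_links = {u for u in URL_RE.findall(new_text) if not is_trusted(u)}
     return len(new_links - old_links) > limit

The edit handler could then reject, or hold for review, any submission for which too_many_new_links returns True.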
I agree that this would work best combined with a ShotgunSpam implementation. On the other point you made, about preventing Google indexing - I'm not sure I agree. The spammer's increased PageRank doesn't hurt us in any way, but not indexing our pages does hurt us. Also keep in mind that with spammers, unless there is some feedback to them that their efforts are unsuccessful, they will continue doing what they have been doing (hence the highlighting of ExternalLinkArea's assumption). -- MichaelSparks
My understanding is that it is now possible to stop the links on a page from passing PageRank, without disabling indexing of the page itself, by using rel="nofollow". I read this here: http://www.theregister.co.uk/2005/01/19/google_nogoogle/.
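For what it's worth, rel="nofollow" is a per-link attribute, so the renderer could attach it to every external anchor it emits while leaving the page itself indexable. A minimal sketch, assuming the renderer builds its own anchor tags:

 def render_external_anchor(url, label=None):
     """Emit an external link readers can follow, but which search engines
     are asked not to count towards the target's ranking."""
     return '<a href="%s" rel="nofollow">%s</a>' % (url, label or url)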
This does assume that the spammers look and care. Since most ShotgunSpam is already aimed at blogs and other wiki-like sites, and there are enough of them to feed the spam maw, the fact that c2 cleans their links may not be significant to the source of the spam. This suggestion does clean up our space, but I am unsure if it would actually stop an influx. -- PeteHardie
If the spammers take notice that they have a harder time being successful at c2, they will be less likely to come back. It doesn't stop all spam, or do anything to discourage spam in general - but it does make c2 a less inviting place for spam. This is similar to how a lock on your front door doesn't do much to stop a determined robber - but it makes the prospective robber more likely to choose a different house. -- MichaelSparks
I'm just thinking that if the Googleturfing we're getting here is from a bot, the removal of links will not even be noticed unless the bot is a unique instance pointed at c2. If it's hitting every wiki with the same interface, it may never check whether its links still exist, and thus never see the removal, and hence never stop putting more links in.
A good experiment, if we wished to leave spam alone for a while, would be to leave the link spam up, and see if the bot hits a page twice. If it does not, then it's paying attention to what it posts, and a removal from the page may actually cause it to hit c2 more, trying to repost its links. -- PeteHardie
Some bots check and some don't. Bots that keep trying indefinitely if unsuccessful have been encountered, but are uncommon.
Just to clarify - I'm not suggesting that the idea isn't a good one, just that it may not stop spam from arriving, since the bots may not check back to see if the spam sticks. -- PeteHardie
This solution will not address spam with 10,000 links to different sites. That type of spam is becoming more common. -- MichaelSparks
I stumbled across the EditLinks page today and was very amused to learn that wiki's links used to work in a similar way to what is described on this page. -- MichaelSparks