External Link Area

Here is an idea that will help fight spam and add cool new functionality at the same time.

Stop rendering HTTP links inline with the text, and instead render them at the end of the page as a "references" section. In the rendered page, replace each inline link with a notation like "[External 1]", "[External 2]", etc. The auto-generated references section would appear at the bottom of each rendered page that has one or more external links. In the references section, each link would be listed on its own line, along with some special links (akin to the specialness of the "?" that appears after undefined pages). One special link would be to report a broken link. Another special link would be to report a spam link.

When a user clicks one of the special links, an automatic edit takes place. Each reference to the corresponding URL in the source document is modified. To make the edit, we can EmbraceAndExtend the square-bracket markup used on other WikiForums.

For example, suppose there is a source document with some text like this:

    ...and my favorite search engine is http://google.com.

This would be replaced in the rendered version of the page as follows:

    ...and my favorite search engine is [External 1].

The "[External 1]" is a hyperlink to a #-anchor for the references section. The references section would look something like this:

    [External 1] http://google.com (broken?) (spam?)

Where "broken?" and "spam?" are hyperlinks to perform the auto-edits mentioned above. When the user clicks "broken?", the source document is modified as follows:

    ...and my favorite search engine is [broken:http://google.com]
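
To make the mechanics concrete, here is a minimal sketch of both passes in Python - the rendering pass that moves links into a references section, and the auto-edit that wraps a reported URL. The regex, anchor names, and report-link URLs are illustrative assumptions, not the markup of any existing wiki engine:

    import re

    URL_PATTERN = re.compile(r'https?://[^\s\]]+')

    def render_with_link_area(source):
        """Replace inline URLs with [External n] anchors and append
        a references section listing each URL with report links."""
        urls = []

        def replace(match):
            url = match.group(0)
            trailing = ''
            # Keep trailing sentence punctuation out of the URL.
            while url and url[-1] in '.,;:)':
                trailing = url[-1] + trailing
                url = url[:-1]
            urls.append(url)
            n = len(urls)
            # "[External n]" links to a #-anchor in the references section.
            return '<a href="#external%d">[External %d]</a>%s' % (n, n, trailing)

        body = URL_PATTERN.sub(replace, source)
        if not urls:
            return body  # no external links, so no references section

        refs = ['<hr>References:']
        for n, url in enumerate(urls, 1):
            refs.append('<a name="external%d"></a>[External %d] %s '
                        '(<a href="report?broken=%d">broken?</a>) '
                        '(<a href="report?spam=%d">spam?</a>)'
                        % (n, n, url, n, n))
        return body + '\n' + '\n'.join(refs)

    def mark_link(source, url, tag):
        """Auto-edit the source document: wrap every occurrence of url
        in [tag:url] markup, where tag is 'broken' or 'spam'."""
        return source.replace(url, '[%s:%s]' % (tag, url))

With the example above, mark_link(text, 'http://google.com', 'broken') produces the [broken:http://google.com] markup. The sketch leaves out rendering of the broken:/spam: modifiers themselves; a real renderer would also recognize [broken:...] and [spam:...] and display them accordingly.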

Broken links could be rendered in a way that visually indicates they are suspected of being broken. Similarly, spam links could be auto-edited to include a "spam:" modifier and rendered differently.

The motivation for collecting links into a references section is that it destroys the spammer's reason for posting edits with 10,000 links to the same page. Here is the critical assumption: ASSUMING most spammers actually view the changed page after their edit has been applied, they will figure out that posting 10,000 links no longer helps them. They will change tactics and try to sneak in one or two links on various pages, making each seem somehow relevant. This is an even more insidious type of spam than what we commonly face today (although even this form happens from time to time). The result should be similar to what we can expect with the ShotgunSpam solution.

With one-off links on random pages, the RecentChangesJunkies will presumably let a few slip by unnoticed. That is where the "spam?" link comes into play. By making it very easy for an ordinary reader to report spam, we can expect that most spam links will eventually be caught and reported.

Once links are reported as spam by ordinary, edit-shy readers, they will be disposed of in due course by WikiGnomes who happen across them. Remember that we assumed the 10,000 link spamming will have lost popularity by this point - any genuine spam links that make it into Wiki will be well-disguised and non-disruptive to the flow of the page; therefore, there shouldn't be a big problem with cleaning them up gradually.

-- MichaelSparks


Making the inline spam? and broken? tags WikiBadges would make it easier to find them. I think you'd still need to do other things as well, such as preventing the links from being indexed by Google (in case they are there just to boost something's page rank) and/or rejecting edits that add more than foo new external links.

Of course, external need not mean external to this wiki either. It would be possible to have a white-list of servers whose links you trust - this would take the labour out of handling commonly linked places. -- JamesKeogh
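
Continuing the sketch in Python, edit-time checks along those lines might look like this; the threshold value and the trusted-host list are made-up examples:

    import re
    from urllib.parse import urlsplit

    URL_PATTERN = re.compile(r'https?://[^\s\]]+')
    MAX_NEW_EXTERNAL_LINKS = 3        # the "foo" threshold
    TRUSTED_HOSTS = {'c2.com'}        # white-listed servers don't count

    def new_external_links(old_text, new_text):
        """Return links that an edit adds, ignoring trusted hosts."""
        old = set(URL_PATTERN.findall(old_text))
        added = [u for u in URL_PATTERN.findall(new_text) if u not in old]
        return [u for u in added if urlsplit(u).hostname not in TRUSTED_HOSTS]

    def edit_allowed(old_text, new_text):
        return len(new_external_links(old_text, new_text)) <= MAX_NEW_EXTERNAL_LINKS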

I agree that this would work best combined with a ShotgunSpam implementation. As for your other point, about preventing Google indexing - I'm not sure I agree. The spammer's increased PageRank doesn't hurt us in any way, but not indexing pages does hurt us. Also keep in mind that unless spammers get some feedback that their efforts are unsuccessful, they will continue doing what they have been doing (hence the highlighting of ExternalLinkArea's assumption). -- MichaelSparks

My understanding is that it is now possible to keep the links on a page from passing PageRank, without disabling indexing of the page itself, by using the rel="nofollow" attribute. I read this here - http://www.theregister.co.uk/2005/01/19/google_nogoogle/.
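
Assuming the renderer emits plain HTML anchors for external links, adding that hint would look something like this:

    <a href="http://google.com" rel="nofollow">[External 1]</a>

Google has said it ignores such links when computing PageRank, while still indexing the page itself.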


This does assume that the spammers look and care. Since most ShotgunSpam is already aimed at blogs and other wiki-like sites, and there are enough of them to feed the spam maw, the fact that c2 cleans their links may not be significant to the source of the spam. This suggestion does clean up our space, but I am unsure if it would actually stop an influx. -- PeteHardie

If the spammers take notice that they have a harder time being successful at c2, they will be less likely to come back. It doesn't stop all spam, or do anything to discourage spam in general - but it does make c2 a less inviting place for spam. This is similar to how a lock on your front door doesn't do much to stop a determined robber - but it makes the prospective robber more likely to choose a different house. -- MichaelSparks

I'm just thinking that if the Googleturfing we're getting here is from a bot, the removal of links will not even be noticed unless the bot is a unique instance pointed at c2. If it's hitting every wiki with the same interface, it may never check whether its links still exist, and thus never see the removal, and never stop posting more links.

A good experiment, if we wished to leave spam alone for a while, would be to leave the link spam up, and see if the bot hits a page twice. If it does not, then it's paying attention to what it posts, and a removal from the page may actually cause it to hit c2 more, trying to repost its links. -- PeteHardie

Some bots check and some don't. Bots that keep trying indefinitely if unsuccessful have been encountered, but are uncommon.

Just to clarify - I'm not suggesting that the idea isn't a good one, just that it may not stop spam from arriving, since the bots may not check back to see if the spam sticks. -- PeteHardie


This solution will not address spam with 10,000 links to different sites. That type of spam is becoming more common. -- MichaelSparks


I stumbled across the EditLinks page today and was very amused to learn that wiki's links used to work in a similar way to what is described on this page. -- MichaelSparks


See WikiSpamSolutions

CategoryWiki

