Link Rot

[WebsitePatterns]

Everyone's seen sites that feature flashing graphics touting "NEW!", only to find out the material's at least a year old. Or sites with BrokenLinks, or missing graphics.

This is the flip-side to KeepItFresh. Sites that are not actively maintained slowly rot away, losing their value and utility. Instead of finding help or answers to their questions, users find the online equivalent of the magazines in a doctor's office.


This isn't a pattern; it's a force, like BitRot.

LinkRot is what happens to links on your site that point to pages or resources which are no longer where they were, resulting in a broken image or a 404 Page Not Found error.

Links to sites you don't control are the most susceptible, but links to resources on your own site are also at risk if you are not careful.

There are two sides to the problem of handling LinkRot.

Keeping LinkRot off your site
There are tools you can use to batch-check all the links on your site. Broken links can then be dealt with by removing them, replacing them, or ignoring them.
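As a rough sketch of how such a batch check works (not any particular tool), the following reads one URL per line from a file and reports anything that does not answer 200 OK. The file name links.txt is made up for the example, and only Python's standard library is used.

  # Minimal batch link checker: read one URL per line, report the broken ones.
  import urllib.error
  import urllib.request

  def check(url, timeout=10):
      """Return the HTTP status code, or a short label if the request fails outright."""
      request = urllib.request.Request(url, method="HEAD")   # HEAD keeps the check cheap
      try:
          with urllib.request.urlopen(request, timeout=timeout) as response:
              return response.status
      except urllib.error.HTTPError as err:
          return err.code                      # e.g. 404 Not Found, 410 Gone
      except (urllib.error.URLError, OSError):
          return "unreachable"                 # DNS failure, refused connection, timeout

  if __name__ == "__main__":
      with open("links.txt") as f:             # hypothetical file: one URL per line
          for url in (line.strip() for line in f if line.strip()):
              status = check(url)
              if status != 200:
                  print(status, url)           # candidates for removing, replacing, or ignoring

Some servers refuse HEAD requests, so a stricter version would retry with GET before flagging a link.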

Avoid causing LinkRot elsewhere
Be careful when doing the big redesign that you don't break all the links to your site that exist on other sites, or even in bookmarks. Check your Web server logs to find which of your pages are referenced externally, and put up redirection pages explaining the new structure. Use a URL rewrite module on your Web server that examines requested URLs and uses a set of rules to determine where the new page is. Feed all 404 errors to a CGI script that does a site search using the words in the requested URL. Read CoolUrisDontChange.
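As an illustration of the redirect-plus-search idea, here is a sketch of a CGI-style 404 handler. The old-to-new mapping and the /search URL are invented for the example; REDIRECT_URL is the variable Apache passes to an ErrorDocument script, so adjust for your own server.

  #!/usr/bin/env python3
  # Sketch of a 404 handler: send known old URLs to their new homes,
  # otherwise fall back to a site search built from words in the requested path.
  import os
  import re
  import urllib.parse

  # Hypothetical mapping, maintained by hand after the redesign.
  MOVED = {
      "/oldsection/page.html": "/new-section/page/",
      "/products.htm": "/catalogue/",
  }

  def respond(requested_path):
      if requested_path in MOVED:
          status, target = "301 Moved Permanently", MOVED[requested_path]
      else:
          words = re.findall(r"[A-Za-z0-9]+", requested_path)
          query = urllib.parse.quote(" ".join(words))
          status, target = "302 Found", "/search?q=" + query   # a guess, so not permanent
      print("Status: " + status)
      print("Location: " + target)
      print()

  if __name__ == "__main__":
      respond(os.environ.get("REDIRECT_URL") or os.environ.get("REQUEST_URI", "/"))

Known moves get a 301 so browsers and crawlers update themselves; the search fallback gets a 302 because it is only a best guess.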


I think this is a pretty big issue, especially with regard to public wikis. In search engines I've designed, I've set up routines by which outside links were checked periodically and flagged as dead once they started drawing 404s.

This could be a runtime script which checks each link before writing it to the client, but that would certainly be a server bash on a high traffic site.

More realistic is to have outside links stored in a table somewhere and check them nightly; a sketch of that follows this comment.

With wikis, I suppose a Perl Web client getting the results of wiki.cgi?$pages-slash-eachPage could then parse out the anchor elements and test them, flagging in-line as needed...

Any thoughts? -- JoeTennis
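
One possible shape for the table-plus-nightly-check idea above, using SQLite as the table. The database file, table name, and schema are invented for the sketch, and populating the table (for instance by parsing the anchor elements out of each rendered page) is left out.

  # Nightly job sketch: walk a table of outside links and flag the dead ones.
  import sqlite3
  import urllib.error
  import urllib.request

  def is_dead(url, timeout=10):
      try:
          request = urllib.request.Request(url, method="HEAD")
          with urllib.request.urlopen(request, timeout=timeout) as response:
              return response.status >= 400
      except urllib.error.HTTPError as err:
          return err.code >= 400
      except (urllib.error.URLError, OSError):
          return True

  def nightly_sweep(db_path="links.db"):        # hypothetical database file
      con = sqlite3.connect(db_path)
      con.execute("CREATE TABLE IF NOT EXISTS outside_links"
                  " (url TEXT PRIMARY KEY, dead INTEGER DEFAULT 0)")
      for (url,) in con.execute("SELECT url FROM outside_links").fetchall():
          con.execute("UPDATE outside_links SET dead = ? WHERE url = ?",
                      (1 if is_dead(url) else 0, url))
      con.commit()
      con.close()

  if __name__ == "__main__":
      nightly_sweep()   # run from cron; pages can then render flagged links differently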

The problem with link checking (there are a lot of good tools out there) is that a 404 is not enough. Some servers may redirect you to a start page, or simply present a not-found document with 200 OK. Others simply replace the content with totally unrelated stuff; this happens especially when the target host is a public web space provider. It is therefore good to have the link-checking tools also alert on redirects, or even scan the resulting pages for mandatory keywords. -- Bernd Eckenfels
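
A stricter check along those lines might refuse to trust the status code alone: report any redirect for a human to look at, and only call a 200 healthy if the page still contains a few expected keywords. The keywords per link are something you would have to record yourself; this sketch again uses only the standard library.

  # Stricter check: report redirects and keyword-free pages instead of trusting 200 OK.
  import urllib.error
  import urllib.request

  def classify(url, keywords, timeout=10):
      try:
          with urllib.request.urlopen(url, timeout=timeout) as response:
              final_url = response.geturl()            # where any redirects actually landed
              body = response.read(200_000).decode("utf-8", errors="replace")
      except urllib.error.HTTPError as err:
          return "broken (%d)" % err.code
      except (urllib.error.URLError, OSError):
          return "unreachable"
      if final_url.rstrip("/") != url.rstrip("/"):
          return "redirected to " + final_url          # may be a start page or a parking page
      if not all(word.lower() in body.lower() for word in keywords):
          return "suspect: expected keywords missing"  # 200 OK, but the content changed
      return "ok"

  # Example: the keyword is a hint that the target still covers the same topic.
  print(classify("http://example.com/", ["Example Domain"]))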

