Garbage Collection For Wiki

Once a month, run a script which identifies Wiki pages which have not been requested in at least 90 days. The script edits Wiki page MarkedForDeletionFromWiki?, appending the date and time the script was run and the names of the pages found.

Any page which is edited is automatically removed from the list of pages marked for deletion. Also, any page which is requested from a page other than the marked for deletion list is removed from the list.

The next time the script is run, it will check for any sections of its marked list which are more than 90 days old. Pages in that section will be deleted.

-- ChrisBaugh


Excellent description of GarbageCollectionForWiki. Please don't implement it here. A delicate balance exists between deleting pages and creating them. When deletions were impossible, it was slightly out of balance. Now, deletions only happen when someone chooses to do so. This seems to me to be a good thing. If the amount of garbage is large enough to bother you, why not clean up a few pages? If not... just let it be. -- MichaelChermside


Michael, no worries about my implementing it here because I don't know how to change the Wiki code on c2, and if I did I wouldn't without checking with Ward first.

I thought the idea might spark some interesting discussion, and could be useful to a WikiClones implementor.

Thanks. -- ChrisBaugh


Note that this isn't GarbageCollection in the normal sense of the term. GarbageCollection usually means deleting things which can't be reached from the active things. It would be possible to use this on wiki, too; we could define the RootSet? (as, say, RecentChanges and WelcomeVisitors, and perhaps some other key pages; or perhaps using an activity metric as described above) and then identify and delete any pages which were not reachable by browsing from there. The hitch with this is that the 'garbage' pages might be reachable via the ReverseIndex, and so their loss might be noticed as decreases in the populations of categories.

The original proposal is more like TenuringForWiki?. One of its consequences would be that links on actively-used pages were deleted.

Perhaps the strongest objection to automated reaper is that it can't really judge the value of a page, only how often it is read, how easy it is to find, etc. There may be some really good pages out there which aren't being read; it would be sad if they were to be deleted. On the other hand, there are plenty of junk pages which are fairly active!

I think that the proposal has taken care of your objection to an automated reaper. After it runs, the nodes in question will show up in RecentChanges, and the list exists on a wiki page that people can review for 30 days. If nobody has claimed it after 30 days, presumably it has passed human muster as well. -- BenTilly


GarbageCollection, as the term is used here, implicitly assumes that objects are known only by their references from other objects (their ObjectIdentity). Reliable GarbageCollection is impossible in any system that allows object references to be computed, because the collector/scavenger has no practical way of determining which computations might result in a reference to any given object. The collector therefore cannot reliably determine when an arbitrary object has become garbage. An alternative is to shuffle objects that have not been referenced in a very long time to tertiary storage - they still exist, but it might take a very long time to access them (because an operator might have to retrieve a piece of plastic from shelf someplace).

In a system like Wiki, this potential problem is exacerbated because "object references" - page names - are, by design, easy to calculate. In fact, the opportunity for "accidental" links - such as to GarbageCollection in this contribution - is an important aspect of WikiNature.

Pages with few or no links from other pages are not, therefore, "worse" than pages with many links. I suggest that we lead ourselves further and further down a mistaken garden path (or rathole, if you prefer) by making an ill-advised analogy between page references on a Wiki and object references in a software system.


CategoryWikiMaintenance CategoryGarbageCollection


EditText of this page (last edited May 31, 2006) or FindPage with title or text search