Signature Survey

A position paper presented at the OOPSLA 2001 workshop: Software Archaeology: Understanding Large Systems

from the introduction ...

Typically, a browsing mechanism for large, structured programs will expose a small part of the program in response to some form of query, possibly as simple as picking from a list. However, when examining an unfamiliar program one needs to get a feel for the whole program at once. To meet this need, one is better off writing a custom tool for viewing the whole program. Such a tool is easily written as a cgi script which runs on a server, reads all of the code base and then reports a summary of that code as a single html document.

See also TipsForReadingCode.


I recently started bringing the idea of using the signature survey method into the German .NET Community. I wrote a short summary about it, referencing your original article. I summed it up and provided it here:

I find it an extremely useful tool, although my understanding of why it is useful differs a little from your original ideas. Rather than actually browsing code I reduce it to creating a very flat list. I just wanted to say thank you for a very great idea that today, at a conference, inspired a lot of people. -- Johannes from Heidelberg, Germany


A version of SignatureSurvey exists for wiki. Try browsing with it and compare it with VisualTour and LikePages ...

The wikiSig script finds all of the pages cited on a root page and summarizes their structure. It is a shorthand for the structure of the page. It currently uses the following code:

A very simple example is the PietHein page. It's sig reads as follows:
 ..| H | C 

The best way to grok the SignatureSurvey is to mentally turn it on its side.


BrianDorsey? suggested I try applying SignatureSurvey to wiki after he and AsimJalis had done the same to the SeaPig MoinMoin wiki. See http://www.seapig.org/PageSignatureCode for their report. -- WardCunningham


I'm looking at this and thinking it could be very useful for refactoring. At a glance, you can see all the long pages, threadmess, external links, etc. But the characters it picks out could be better suited for this:

-- JonathanTang


This reminds me of the people in the movie TheMatrix watching characters "rain" down the monitor and seeing reality. -- DaveParker


As an alternative to using html to generate the summary and detailed view, you could use an IDE's cross-referencing ability to move around the code quickly. All you need is a high-level signature view, which can be easily provided through a plugin.

However, it might not be enough to generate a fixed signature - since the important thing is the active exploring... "forming theories" about the code and "refuting them". So there should be a way to define the signature pattern yourself.

This is a first attempt at such a plugin for IntellijIdea - http://www.intellij.org/twiki/bin/view/Main/JavaSig. It's a simple plugin, that generates a signature for each java source file in the project by replacing a given regex with an empty string. The regex can be modified.

Using an IDE has the additional benefit of providing an intermediate view - the fully-folded snapshot of the source file (Ctrl Shift Minus - in IntellijIdea).

-- YogiKulkarni


It's not a big deal to me, but I think (non-signatory) double hyphens fool the SignatureSurvey script. -- JoeWeaver

You are partially correct. Double hyphens trigger an 'S' result in the SignatureSurvey only if the hyphens (surrounded by spaces or not) are followed by a WikiWord. Double hyphens are ignored as long as they are not followed by a WikiWord (with or without spaces). See EmDashInAscii. -- ElizabethWiethoff

How does [the script] work, anyway? It shows no results at all on my homepage, but does on, e.g., TopMind. -- DougMerritt

The vertical axis lists all the pages linked to from that page. The horizontal axis displays the summaries for those pages.

For example, take http://c2.com/cgi/wikiSig?PainfulLanguage. You get a list of all links from PainfulLanguage: AbapLanguage, AlistairYoung, BackWardLanguage, etc. Besides each link is the summary for that page. Take MalbolgeLanguage as an example, because that's a fairly short page.

The . indicates the end of the sentence. Then there are 2 newlines, represented as spaces, an H for the external link, and then 2 more newlines. Then there's a P (page-link) for TuringComplete, and another period. Then ,. for the sentence about its Turing Completeness. A , after "However", and then Ps for GeneticAlgorithm and HelloWorld. The next sentence just has a single P, for HaltingProblem, and then there's the rest of the punctuation in the sentence. Then more whitespace, and the final P for BefungeLanguage.

Basically, long strings of punctuation mean dense text. Lots of Ps means someone likes AccidentalLinking, like me. Punctuation + S means ThreadMode. Lots of | (horizontal rules) is also a good indication of ThreadMode, or lots of disjoint comments. The length of the sig corresponds roughly to the length of the page.

This could be a good guide to refactoring. At a glance, you can see long pages that might require splitting, dense ThreadMode, unrelated comments, etc. I wish you could get a summary for a page without having to search for one of its backlinks though.

And I have no clue why DougMerritt's page doesn't show up. Maybe it's too long, or maybe some character in it is throwing off the script.

-- JonathanTang


CategoryWiki CategoryWikiMaintenance CategoryWikiStructure


EditText of this page (last edited October 21, 2012) or FindPage with title or text search