Inter Wiki Overlap Map

Here's what I'd love to see: a visualization of the nodes in each of the various wikis, focusing on which wikis have overlapping pages/concepts. Seems like it could be a very useful way to visualize roughly which wikis are covering which sorts of topics, you'd see the way that WardsWiki and MeatballWiki overlap on matters of community dynamics, and the way that WardsWiki and Why2K overlap on matters of spirituality, etc, not to mention how WardsWiki and GreenCheese overlap on ... well, okay, they don't.

Of course, I have no idea exactly how you'd generate such a thing. It's easy to imagine writing a script to spider all the wikis and count duplicates. To automatically generate a graphic based on that data would require something like GraphViz.


 let W = {w1, w2, .... wn} be your wikis
 let f(wi, wj) = 1 for wikis with no overlap, 0 for wikis with perfect overlap
 let P be a set of functions P1, P2, etc. where Pi(wi) = (x,y) maps wikis to particular points in 2d space for a visualization
 let D(n,i,j) = distance between points Pn(wi) and Pn(wj) for the Pn visualization  
 let E(n,i,j) = [f(wi,wj) - D(n,i,j)]^2
 let TE(n) = sum of E(n,i,j) for all 1 <= i < j <= n
 then P1 is a better visualization than P2 iff TE(1) < TE(2)

now assume we've picked our favorite P function and call it p let NM = {nm1, nm2, nm3, ...} = set of wiki names in W let Dist(x,y,i) = distance of point (x,y) from p(wi) let ss(x,y,i,j) = 0 if NMj is not in wi +infinity if Dist(x,y,i) = 0 1/Dist(x,y,i) otherwise let s(x,y,j) = sum of ss(x,y,i,j) for all i indexing a wiki when user clicks on point (x,y) in visualization p, rank Wiki names by s(x,y,j) and display top ten
-- SteveHowell


I don't think that nodes are a good basis for comparison. WikiOne might just have a PageAboutThis containing an interwiki link "See WikiTwo:PageAboutThis". WikiTwo might really contain content. Is this a perfect overlap?

Next, WikiOne could have node LanguageEsperanto and WikiTwo has a node EsperantoLanguage. No overlap?

IMO it would sense to make the page size available and split the page names to the single words. Then one could sum up the content size that is related to the certain words. Now one could use a %hash = { word => contentsize, ... } for an overlap calculation.

See also WikiWord.

-- HelmutLeitner


If Wiki One completely refers to Wiki Two for some page, that is perfect overlap on that subject. In other words, the communities are so similar, it's pointless to duplicate the page, and even annoying to some users. In more dissimilar communities, you might need two separate pages, because the two communities will address the issues in completely different ways.

The LanguageEsperanto vs. EsperantoLanguage issue is tough. One thing you could do is sort all Wiki names so that the words go alphabetically. I don't like dealing with individual words, because a lot of words have no real meaning by themselves. For example, what do "Inter," "Wiki," "Overlap," and "Map" mean individually?

Given the mathematical definition above, we are in a sense debating how to construct the "f" function. I am more curious about the general graphing problem. Given any set where you can establish a matching function "f" on all tuples from the set, how do you best create a graph to visualize the matchingness of the elements? Once you get more than three elements in your set, and assuming you constrain yourself to two dimensions, the problem gets real tough. Right now I am imagining some sort of iterative geometric construction, but I'm real fuzzy on the details. I wonder how much TE(n) suffers if you always generate the P function for n points by simply adding a new point to the best P function for the previous n-1 points.

Also, suppose you have a set S with matching function f, such that you can construct a visualization with TE = 0. What are some necessary or sufficient conditions of f?

-- SteveHowell

I see no reason why construction must be automatic. I'd have people just do it themselves. I'm working on an entry on MeatBall called "WikiCanonicalization?" that suggests the original idea on this page.

Basically: each wiki can make a map of its cognitive neighbors. Wikis that lightly overlap are drawn more distant; those that are covering the same space are drawn close, tight.

There's no graphing problems, even though you have a 2 dimensional map, because it is beneficial to exclude the vast majority of wikis, even many/most of those that touch in one places or two.

There is no reason to include, say, WikiPedia, because everyone knows about it. (i.e., HTML pages don't need to link back to Google..)

When you construct and maintain the maps by hand, you can emphasize the important and not even mention the trivial. We can't trust our maps to computers. And there's really no need to.

-- LionKimbro


EditText of this page (last edited April 3, 2014) or FindPage with title or text search