Too Many Dimensions

Often we have data that is in too many dimensions.

The general case of having N dimensions of data and trying to make a "nice" 3D or 2D display of some (or all) of the data -- is there some good ways of doing that?

Some special cases:

3 dimensional objects that we want to display on a 2 dimensional computer screen (or paper).
GeographyExample discusses the special case of trying to flatten the surface of a sphere to a flat map (cartography).
TreeMap is a really pretty way of flattening an arbitrary-order tree into 2 dimensions.

And of course there's the classic multiple factor analysis.

(moved from DougMerritt)

Doug, I got a little problem in 3D graphics. Or at least, I think I do. I have to render an object graph as a graph, but graphics engines' scene "graphs" are limited to trees as I understand it. So it seems that what I have to do is morph a tree into another tree? Oh yeah, and handle the transition between the two! Any ideas on this? Or at least whom to go to? Or even hopefully that I'm completely out of whack? -- RK

Not to steal Doug's excellent thunder below, but have you considered TamaraMunzner's Hyperviewer RK? http://graphics.stanford.edu/~munzner/h3/ - it's just straight C but I've adapted it for such purposes to very good effect on several occasions. And it's beautiful ... --PeterMerel

Yeah, sure...this is not an uncommon issue in an abstract sense, if I understand you correctly.

It's often reasonable for a toolbox to be limited to trees, precisely because tree algorithms are very straightforward in general - just do recursion, and ta da, done, whereas graph algorithms are often too difficult to solve (in a formal mathematical sense).

So the toolbox choice is arguably reasonable, but we have graphs in real world problem domains.

The solution is usually to simply pick a method of traversing the graph that suits your particular problem domain, and use that traversal method to produce a tree that the toolbox can then crunch.

Naturally, the output tree may be either a literal construction in memory, or in some cases might be a sort of virtual tree that is implied by handing the toolbox a traversal iterator.

For instance, I may set up a dependency graph that reflects which components of my system depend on other components (as inputs, as modules that need to be compiled [this is what make/makefile does], as lazy computations that need to be realized, as objects that need to be realized ... whatever).

I don't know ahead of time whether there are any cycles in that graph or not, but in most problem domains, there will be. I choose a graph traversal method that detects whether I've visited a node before (e.g. with a marker flag), traverse the graph, and emit the traversal as a tree. Nice and simple.

That covers most such situations, but if I've misunderstood a nuance, let me know. -- DM

Hmm, on second thought, I probably *am* misunderstanding. You mean that the graphics toolboxes in question just plain refuse to draw anything with an interconnect of the general graph form, only trees? If that's the problem, then it seems to me that it's just a limitation of the tool, and how could there be any solution but to switch tools? Well, maybe I'm misunderstanding yet again. -- DM

As I understand it, a node in a 3D graphics SceneGraph represents a space, and the entire graph is a tree so that each space is inside of only one other space. This way, when you leave the inner space, the 3D engine always knows which space you're leaving TO, because there is only one candidate for outer space. Basically, spaces are well-ordered with regards to the nesting relation.

In fact, this is an entirely reasonable and appropriate limitation for nearly all cases. It doesn't adversely impact any graphics applications such as games or chemistry modeling because all of these applications are 3D. So it's always possible to generate a static 3D world and freely navigate within that static 3D world. Object movements can be severely limited either to a given space, or those objects can be modeled as free actors just like the user.

Now, the reason this isn't good enough for me is because I have an N-dimensional graph which I have to represent 3-dimensionally. And this is only feasible because in practice, a local neighbourhood will almost always have a natural projection into 3 dimensions. A natural projection being one which:

is easy to discover
does not have too much clutter (cross-cutting links)
does not distort links (which all have distance = 1) too heavily (distance > 2)

The problem is that the many, many local projections which constitute the user's view of the graph are radically inconsistent. So when a user navigates around the graph, every object seems to move around of its own will as a response to the user's movement. Another way to think about it is that the user is moving in N dimensions but since they can only see 3 of those N dimensions at any one time, the user can't perceive their own movement in the N dimensional space.

What it comes down to in practice is that all of the objects you encounter are full-fledged actors, and that their movements are dependent on your own. But that's exceedingly unhelpful because if you treat every object as an actor, then there is no scene, nothing left, for them to move in, and all of the spatial relations between the actors (which is what a 3D engine is supposed to support) is left entirely up to you to produce.

I think you're right though. I think it's a severe limitation of the tool that you can't nest a space inside of multiple other spaces. I think that if I make scene-graph links bidirectional, then I can pop out of one space into any number of other spaces. The trick will be to break the tyranny of the World Coordinate System in favour of purely Local Coordinates and to be able to freely redefine the root of the scenegraph.

This really ought to go on some WikiPage. -- RK

Ah. That clarifies. The theory of projecting N-dimensional graphs onto lower dimensional spaces is quite, quite extensive, so there are plenty of known results on the topic. The horribly problematic case is projecting to 2d, which is of critical importance for circuit layout in chips and on boards. Fortunately, the main obstacle disappears entirely when projecting onto 3D, and it becomes a question of secondary constraints.

You said you don't want "clutter", but one needs a firm definition. Do you mean visual clutter when it's projected onto the 2D screen the user sees? That gets into difficult, sometimes unsolvable, issues, but there are reasonable algorithms that will often find a reasonable albeit suboptimal layout.

You also have a constraint on the distance distortion, and that also can lead to literally unsolvable problems, depending on the input graph, but again there are algorithms for finding suboptimal solutions that help minimize the distortions.

Knuth came up with an ingenious algorithm that can work very nicely on a large number of real world problems, that he called "boxes and glue", used in TeX for layout. His problem was 2d, but it doesn't matter. The idea is that each link has associated "glue" with parameterizable "stiffness" - basically a minimum and maximum range of distances allowed, a preferred value within that range, plus a weight that determines how easy or hard it is to stretch or compress the glue. At the same time there's a penalty (zero to infinity) for "snapping" the glue - breaking the link altogether because the layout isn't solvable otherwise.

The link distances are then either directly computed via e.g. dynamic programming techniques, or more generally and more easily, by doing a number of iterations over the graph, adjusting links to minimize local distance distortions. (For example, one could use AstarSearch.)

If you do your own N-D to 3-D (or 2-D) projection using an algorithm like this, then you will have chosen the acceptable distortions yourself, and the output should be then acceptable to your graphics tool (i.e. it shouldn't screw it up because it's already massaged into an acceptable form). -- DM

Well, the fact that "the graph" in its most extensive form spans the entire planet, is highly dynamic, and that your machine can only ever hope to see a tiny portion of it, that introduces certain problems. I seriously doubt that it's possible to process the entire graph, or predict where the user will navigate sufficiently to process what they'll see in advance. And if you can't process all of it, then you're inevitably going to generate inconsistencies between the local projections.

I'm projecting from ND to 3D only. Control over the 3D to 2D projection is in the hands of the user who can freely move and rotate around the structure. A small local neighbourhood of the ND graph is the set of nodes with link distance 1 from the central node. A large local neighbourhood has link distance 2. Any more isn't considered feasible or at all useful.

Clutter is defined as any link that doesn't go from a node to its immediate neighbour. So if you have 5 nodes (A to E) in a star configuration around A with B to E going clockwise around it, then the links BC, CD, DE and EB are not clutter and the links BD, CE are. The more nodes there are, the higher the proportion of links are going to be clutter. But this isn't a concern since you cannot use the order of nodes around the central node to reduce clutter. The order of nodes is determined by a user-selected sortBlock (alphabetic, size, usage, change rate, custom). Come to think of it, that pretty much eliminates any hope of controlling clutter right there and then.

And if you can't control clutter, I don't think you can control distortions either. Distortions occur because all links have distance 1 but crosslinks are represented with distance > 1. With a visible neighbourhood of size 2, you can even encounter cross-links of length 4, which is quite a bit of distortion. IF the amount of distortion is usually low then I might have trans-component links represented as distance 2 in addition to being a different colour, but the idea only just now occurred to me.

Maybe, just maybe, I could have a "least cluttered" sort order? But in fact, I'm not worried about either clutter or distortion. If it happens, it happens. I bring them up only to justify limiting my visible neighbourhood to distance 1 or 2, and to justify accepting radical inconsistencies between projections of different neighbourhoods. What I'm worried about is:

constructing those radically inconsistent projections out of the same individual parts (nodes), and
presenting a smooth and continuous transition when navigating from one projection to an adjacent one

-- RK

But what constitutes a "radically inconsistent" projection?

The second part is easy, again straight out of theory. It is necessary and sufficient to have mappings change smoothly when adjacency changes smoothly; in a continuous domain this would mean a non-discontinuous infinitesimal change in the map variable as the location variable changes infinitesimally, loosely speaking. Discrete domains are inherently less well behaved, so it usually works out best to pretend that they are themselves projections of an underlying continuous domain, in which case you get the right smoothness guarantees.

To put it another way, you could guarantee this by creating the transition function as a spline across all adjacent points. The whole point of splines is to create smoothness across adjacent points. -- DM

What's inconsistent projections? Let's say the graph has 4 nodes A to D arranged such that A is the root, B and C are subnodes of A, and D is a subnode of both B and C. That's the topology.

Now, the geometry is that A is at top-center, B is middle-left, C is middle-right, and D is somewhere off of either B or C. Now, if D is in lower-center then it's an equal distance away from B and C. However, it's trivial to make this geometry impossible for some node by the expedient of adding more nodes D', D, D until there's just no more space between B and C.

So in general, D is going to be off-center, either (off of B) lower-far_left or (off of C) lower-far_right. Now, without loss of generality, consider the case where D is lower-far_left (off of B). When the user navigates down from A to C, then C becomes top-center, A becomes left-way_above, and D gets transported to lower_left. So despite moving away from D, D ends up closer than it was to begin with. That's pretty damned inconsistent.

Okay, that may come in useful once I learn what splines are; I missed class the day the prof explained splines. In any case, I realize now that I mischaracterized my problem.

It isn't a dimensional projection problem at all. The space being projected onto is heavily constrained because up and down have semantic meaning, and so does ordering of subnodes around a node. The only degree of freedom is the choice of start location for the sequence of subnodes, and that corresponds to only a fraction of a dimension. It's still important enough to be taken into consideration but not for the kinds of things you were thinking of.

Fractions of a dimension around a node can be aggregated enough to be useful in the following case. If:

B and C are immediate neighbours in A,
B' is a node around B,
C' is a node around C
there is a link between B' and C'

THEN B and C can be orbited (B' rotated in B and C' rotated in C) such that the link B'C' is shortened to distance 1.

Still, the semi-free clutter-removing reordering (where up and down retain their semantic meaning but ordering of subnodes does not) remains interesting. IF the algorithm is utterly linear, small changes in topology result in at most small changes in geometry, then the clutter-free ordering would be extremely useful for users to memorize and navigate in. But any candidate algorithms would have to be distributed to be implementable in my framework.

Finally, I think I may have found a way to use trees after all. The key insights were that every node has to allocate slots around itself for subnodes that it doesn't render (because they appear off of some other node). And that every node has to allocate slots for its parent, even though the parent takes care of the rendering. Once you've got those two constraints, the before and after trees are a lot more consistent than I thought. It turns out that objects do move around, but they only move around in very limited and well-defined ways. They only move from the space they're rendered to a space where rendering was suppressed. And they only ever move during a transition, which has a very well-defined sequence of events. The remaining sticking points are switching coordinate systems, which shouldn't be a problem, and faking a smooth transition between the central node rendering itself in its own space, and rendering itself in a subsubspace which it gave away. That also shouldn't be too big of a problem, just a question of tweaking the implementation until it clicks.

What I learned today, I have a big tendency to generalize problems I'm facing. This isn't the first time I think I've got a big fundamental problem that turns out to be an implementation detail. It's just that I couldn't solve this one before I was able to ask people about it. -- RK

That happens to me also. See CardboardProgrammer. -- DC

As you have commented before, yes. :-)

When mapping, something always has to give. In projective geometry and non-Euclidean geometries in general, many interesting questions arise about what is, and is not, preserved, e.g. angles, distances...

Kohonen came up with a very nice unsupervised training algorithm for mapping from a high dimensional problem space down to a low dimensional solution space (such as, but not limited to, a 2D array of nodes) based on fairly arbitrary input constraints. In effect, it can e.g. cause the projected space to maintain relative distances or directions as closely as possible, even though they can't be literally the same, and even though some parts of the mapping will suffer fairly random distortions discontinuous from the rest - this is in general unavoidable.

Splines

Splines have been described in both simple and in highly complex ways. At their simplest, they are a way of joining two functions (line segments, planes, whatever) at a point (which generalizes to line, plane...) in a way such that smoothness is precisely specified: a zero-order spline gives no smoothness at all; the two unrelated curves simply terminate in the same point (as in linear piece-wise approximation). A first order spline guarantees that the two curves share a first derivative at the point where they join, and that in turn defines the simplest kind of smoothness.

Higher order splines, say an Nth order spline, means that the two curves share Nth derivatives, which also means that they are determined by N nearby points, not just 1, which can be bad or good depending on what one is doing...moving one point will also affect N-1 other nearest points (and the curve running through them). For most work, 1st through 3rd order splines are most useful, although some industrial design (e.g. some auto body design) may use up to 7th order in order to better approximate solutions to hairy nonlinear integrodifferential equations.

Reference for the above descriptions of Nth order splines?

Another issue of practical import is whether one insists that the spline control points actually lie on the curves in question or not, which is often what determines someone's choice between the two most popular spline families, B-spline and Bezier.

Outlines for drawing fonts in Adobe type 1 went B-Spline, Microsoft/Apple TrueType? fonts went Bezier, they each have strengths and weaknesses, but obviously both worked out from the point of view of the user.

-- DougMerritt

Um, no. TrueType? and Adobe both use Bezier curves. TrueType? uses only 2nd order ("parabolic", "quadratic"), while PostScript used 3rd order Bezier ("cubic"), - see http://en.wikipedia.org/wiki/Truetype, http://visual.wiki.taoriver.net/moin.cgi/BezierCurve. DonLancaster says this makes PostScript superior to TrueType?. -- DavidCary

It's been a long time since I implemented a Postscript interpreter and Type 1 and TrueType? software, so I could easily be forgetting. But I don't think so, although I said it backwards(Postcript Type 1: Bezier, TrueType?: B-spline), and although B-splines are a generalization of Bezier in the first place.
http://www.faqs.org/faqs/fonts-faq/part1/
- I can rummage around for the definitive Type 1 and TrueType? specifications...in a box somewhere upstairs...if you don't like the FAQ
http://mathworld.wolfram.com/B-Spline.html
http://mathworld.wolfram.com/BezierCurve.html
DonLancaster is a man of strong opinions. Ahem. :-) In this case, although it is controversial, font designers without an axe to grind will usually agree that Type 1 can be better in the highest quality implementations, but definitely not simply because of which spline they use, it's because the hinting system is worked out far better in Type 1 (there is an interaction between this and the splines used, but it nonetheless is not the predominant issue).

See http://visual.wiki.taoriver.net/moin.cgi/DataVisualization http://www.usemod.com/cgi-bin/mb.pl?MacroScope IncrementDimensionsPattern AlgorithmsDealingWithMassiveData AlgorithmsWanted

EmployeeTypes

What? What does that have to do with this page? Nothing, apparently!

There are multiple dispatching and classification factors.

Is that a joke springboarding off of your comments about people talking about dispatching methods in various languages?
Anyway, joke or not, I propose removing that link due to massively random irrelevancy.
Perhaps I am misunderstanding the intention of this topic. Is this about visual display issues only? Let's first narrow down what this topic is and is not. My interpretation is that it is about people, both developers and users, being overwelmed by the quantity of independent or semi-independent factors.
No, that's not what it is about, but aha, now I see why you added the link. It's about how to do dimensional reduction. There is barely any mention of why to do dimensional reduction. So that would be like categorizing a page about algorithms for fast multiplication under "accounting"; it's true accounting uses multiplication, but so does every other field.
Also, in this case the link is even more tenuous; offhand, I see no way at all to apply what's discussed on this page to the issues on EmployeeTypes. There may be some way to do so, but it is very far from obvious.
P.S. Although I wrote most of the text, someone else moved it to this new page, and I would not have chosen exactly the name "TooManyDimensions", myself; it's not quite apropos, and I think the title misled you.