Principal Component Analysis

IIRC this started on WikiDesignPrinciples, then hopped over to PrincipalComponentAnalysis on WhyClublet. I am thinking, though, that maybe we should call the analysis (?) done in both places PrincipalComponentGuessing? rather than PrincipalComponentAnalysis!...

Maybe so, but PCA is the accepted term in the literature, so attempting to change the name to PrincipalComponentGuessing? would not be doing anyone any good.


Q: Is PCA on-topic for the Portland Pattern Repository?

A: Looks so to me.

The conceptual basis of PCA sheds a light, if only a dim one, on the arbitrariness or otherwise of code (and wiki) refactorings.


What PCA Is

http://www.fon.hum.uva.nl/praat/manual/Principal_component_analysis.html

Q: Any real world experience in using this techniques out there?

A: My wife worked on the PCA of a questionnaire related to TV viewers' preferences and habits, some eight years ago (she's not a pro, that was an internship project). I didn't dig very deeply into the subject back then, but I gathered that it was an efficient DataMining technique. As I remember, you could extract an operationally useful typology of viewers that wasn't readily apparent in the original data; I suspect it should be most useful for "Web personalization" projects. -- LaurentBossavit

A:I used it to analyse whale songs. It wasn't particularly successful. PCA may not be the cats pyjamas for two reasons: it's a linear model (which often isn't appropriate), and its not based on any particular probabilistic framework. The GenerativeTopographicMapping? and ProbabilisticPrincipalComponentsAnalysis? are two approaches to removing these restrictions. -- NoelWelsh

A: It's used quite a lot in computer vision and extensively in statistics. A group on Manchester is especially big on PCA techniques and in Leeds we used it for constructing models of the variations of shapes. We've not found the linearity a problem as it can be used a a dimensional reduction technique, often the first 5 or so principal components can capture 90% of the variation. Once the dimensions have been reduced (from many thousands to ten) non-linear models could be constructed on top.

A: It is used pervasively in Geophysics, Medical Imaging, and other areas involving inverse problems in which one tries to interpret noisy physically sensed data by model fitting. Noise might be easily separated from data using PCA if it has intrinsically different statistical behavior, thus being described by separate principal components. Models are always wrong; some are more useful than others [ AllModelsAreWrongSomeModelsAreUseful ]. Model parameters do not always map well to parameters that are well represented by the data. PCA can often elucidate patterns or relationships in the data that are not well modeled, perhaps leading to insightful model improvements. Most importantly, PCA can help quantify the degree of confidence assiciated with the estimate of a model parameter. The application to non-physical data as described below is fascinating. -- Peter Kaczkowski

A: PCA is one of the most widespread methods in chemometrics (mathematics in chemical analysis). A number of software packages exist to process data from instruments like gas chromatographs, NIR spectrometers, etc. A whole group of instruments called electronic noses is havily using PCA. In these electronic noses data from several chemical sensors is analysed by pattern recognition methods. Since the data from the chemical sensors in these measurements is often very correlated, PCA is used to reduce the dimensionality of the data, while preserving the pattern. The data is then shown in a so called scores plot which will display the most important part of the (multi dimensional) information (or rather the variation) of the data in 2 dimensions (which really helps to understand the data fast). PCA thus helps to reduce the complexity while preserving the most important content. (Imagine you could do this in a Wiki...)


It's commonly used in financial modelling. For instance typically people might use 3 factors to model the term structure of interest rates (how much it costs to borrow money). These capture about 90% of variation (factor 1: overall level, factor 2: how fast rates are going up or down, factor 3: how much of a "twist" there is in the curve).

It's an extremely important technique because financial problems might have hundreds of degrees of variation (e.g., you're pricing an option on a basket of hundreds of stock prices). PCA lets you reduce the dimensions of uncertainty to a manageable level.


The analysis of WikiDesignPrinciples has been moved to this page and appears here with discussion ...

Can principal components of these eleven factors be derived using PrincipalComponentAnalysis? My attempt at this:

So by this change in axes, 'MUNDANE' contributes positively to low barrier of entry but negatively towards organization of data..

Would these be architectural attributes? I would think so. -- RichardHenderson

"To the man who only has a hammer, everything he encounters begins to look like a nail." -- AbrahamMaslow

For some values of 'hammer', everything is a nail.

The Wiki is both hammer and nail, and the opposite of each. -- MoFurness?

Sorry, I do not get the math implied above, it could be that I am dense; but,

?why ' - Mundane should it not be exactly opposite? If things work fine, that is okay - mundane, but okay. It is when things don't work or work other than was expected that we lose the mundane. Then things can become stressful - Oh yeah!

As a proposed update to design principles and component analysis - Consider the needs of unexpected asymmetrical event_handlers.

The outcome of effect design principle is Mundane If the design expects the unexpected and handles it just like everything else, the user's experience is Mundane, they use the program for work or pleasure without needing to know how the program does what it does, because it either does what they want, or close to it, or, they are diverted in ways that 'make sense'. Consider as a diversion the diversion most often encountered by users of Windows?- (it is not intuitive how I should insert the tm symbol so reader may or my not see it is there :) ) Anyway the diversion: 'Contact your system Administrator'.

For sure in the home market - that would be the end user and for a world that has grown up on print media the billions of pages of hypertext are daunting for how many users??

So I lose all sense of the mundane when I see messages like this. It is more frustrating than useful. Wiki's have the potential to greatly alleviate this sort of frustration.

Imagine instead of the above message something more like the following.... (The UI is just me having fun but the message should serve the programs' function.)

"Hey it is me - Your OS - I'm confused? It looks like you are trying to do the following..."

What I trust you notice is that instead of a dead-end I can be diverted knowingly to information that will clear up the ambiguity my poor OS is suffering under.

Wikis can do this, but not if we fail to strive for the mundane. The "contact your sys-admin" fails in this one critical area. And, again sorry, but I think; '- unified = + chaos', as does '- mundane'. The math puts 'Low Entry Barrier' on the left side of the equation, but it doesn't belong there. Mundane belongs to the LS. What any user wants is a program or suite of programs to work; regardless of the complexity or how many layers of winnowing required, or concurrent threads - who cares? Only programmers; users just want it to work, so they can get on with their lives. We may crave excitement and adventure, but we do want it to all work out too.


Layers of winnowing? What does that mean exactly?

to subject to some process of separating or distinguishing; analyze critically; sift: to winnow a mass of statements.

to separate or distinguish (valuable from worthless parts) (sometimes fol. by out): to winnow falsehood from truth.

(Random House Dictionary tm) I think of it as moving by step toward specificity of task


In this way, design objectives need to put mundane as an outcome, yet it also should be on the right side as well. Given that the LS Mundane must be expressed as Factor of Mundane, :) Mundane to the next level.

Yet this all strays from the real point. Any program design individual or team that foregoes the primary objective of design ('create something with a purpose') can use this program 'wiki', and have fun too. However, it is not a low-barrier tool, it is not bad; but, I know lots of people who just wouldn't get this. It is not very intuitive, again not bad, but not a low-barrier tool; which may be why this was on the LS in your model. I think a ModelWiki or any program is one that is completely mundane.

Do anything and you get a response; by default, wikis do this, as does anything served up in this medium. "Contact your sys-admin," "pagefault error," or syntax errors are not responses from a user's perspective; they are cop-outs for a program that doesn't work! And that is why mundane (Factor) is LS.

-- Smb


Maybe PCA itself, not being "quick"/easy, is useful for Wiki, but perhaps more user-friendly would be a natural grouping of the 11 principles under philosopher Deweys' 5 common-sense criteria for the truth of a work: Accuracy (usually in relation to external reference), Precision (in using language itself), Thoroughness (a.k.a. Completeness), Consistency, and Relevance (the hardest one of the five to nail down). -- sodaigakko


Wiki is very good for somebody who likes the Internet and data processing -- SN


Related: SingularValueDecomposition


CategoryMath CategoryWiki


EditText of this page (last edited July 9, 2012) or FindPage with title or text search

Why