From http://www.link.cs.cmu.edu/link/ :
You give it a grammatically correct or near correct sentence and it will tell you what part of speech each word is. - http://www.link.cs.cmu.edu/link/submit-sentence-4.html lets you use it to analyse single sentences, http://www.link.cs.cmu.edu/link/ gives a more complete outline. The output is a little cryptic, but a full explanation of all the terms is given on http://www.link.cs.cmu.edu/link/dict/index.html
Hmmm:
(S (VP (S Worthwhile (PP (PP for (NP (NP people) (SBAR (WHNP to) (VP know (PP about))))) , but just (PP for (NP the record)))) ,) (S (NP it) (VP is (PP beyond (NP (NP the state) (PP of (NP the art)))) (S (VP to (VP (ADVP accurately) (ADVP automatically) analyze (NP (NP the parts) (PP of (NP (NP speech) (PP of (NP words))))) (PP in (NP unconstrained text))))))) .)...looks pretty damn accurate to me!
It's nice that it worked well for you, but that's a different question, really.
No matter which AI problem we talk about, it is well known that they are all quite difficult, and it is quite important to know how accurate one can expect an algorithm to be. In testing OCR systems, the algorithms are often 100% accurate on samples "in the lab", but when tested on samples from the wild (e.g. seventh-generation photocopies with never-before-seen fonts), accuracy can drop to 60%. That's not hypothetical, I'm talking about tests I've run myself.
Similar things happen in every AI domain from machine vision to linguistics. Accurate tagging of parts of speech is well known to be one of those problems. Yes, various algorithms can approach 100% accuracy on some kinds of sentences, but no, they don't on a large corpus e.g. drawn from random magazines and newspapers.
Just as a by the way, 95% sounds good, and is good enough for some purposes, but for many other applications something closer to 99.9999% would be needed (e.g. when a technology is competing with minimum wage human operators).
-- DougMerritt
What are you saying? That if an AI problem is ever essentially cracked, it ceases to be an AI problem? The links provided open the algorithm to a theoretically infinite corpus... so let the failures be noted and the reasons discussed (that might fail, let's take a look...)
(S so (S (VP let (NP (NP (NP the failures) be (VP noted)) and (NP (NP the reasons) (VP discussed))))))...wrong, I think. Compare the expanded sentence "let the failures be noted and let the reasons be discussed"
(S (S (VP let (NP the failures) (VP be (VP noted)))) and (S (VP let (NP the reasons) (VP be (VP discussed)))))...failure under ellipsis with conjunction
No, no, I'm saying that your results are not following rigorous standards for statistical analysis, so they are statistically meaningless, whereas for most purposes, when evaluating AI algorithms, people need rigorous statistically meaningful measurements. This is an area I've done a lot of professional work in, and I've seen enormous differences between anecdotal evidence from casual testing versus careful methodology using large test sets painstakingly gathered from a large number of real world sources.
Well they're not my results, but I agree that they are anecdotal rather than statistical evidence. When testing such things, it is human nature to try to break them, so I would expect the anecdotal evidence to overstate rather than understate the error rate, but who knows?
Only if you're evil minded like I am. Try it on Shakespeare, poetry, street jargon, surrealist essays, and real horror cases like literary/theatre/art criticism. :-)
Since I am predisposed to it and hardly likely to be fair, and since you are, by your own admission, evil-minded, I would suggest that you try it out and tell us your results. Further, I did say "grammatically correct", into which category Shakespeare, poetry, street jargon, surrealist essays, and real horror cases like literary/theatre/art criticism frequently fail to fall.
It does appear that we are talking at cross-purposes. You seem to be using the term "unconstrained" to mean "random mish-mash of words that nonetheless manages to communicate something to someone."