Definition By Discussion
Organization X starts up their business, gets a big server and a database program, and runs their transactions through it. Their database is tuned to keep track of things like customer contacts and double-entry accounting.
After their database gets really full, they want to ask very hard questions of it. For example: "Based on our demographics, what is the best route for our salespeople to take through a given state, scheduling the most lucrative contacts on the days when their previous buying patterns show that they are most likely to be at home?"
Organization X now connects with Organization Y, who goes into the business of "data mining". They claim they can find new ways to read very large swaths of legacy data, to extract more value from it. Along the way, they may find themselves trying to access information in ways the database was not tuned for. They also may encounter requests to access information in ways no known database will allow.
Hours of fun!
Another view by a data mining purist:
The specific problem above asks for a solution of a variant of the travelling salesman problem (TSP) on a large database. TSP is NP-complete (very hard), even without the practical aspect of legacy systems getting in the way. The legacy system is not what makes it data mining, nor is the solving of NP-complete problems. What makes it data mining is trying to determine from a set of data, designed originally to answer one question (Who gets paid how much? -- a billing system), other information that you wish you had collected at the time (Which customers are home when?).
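To make that concrete, here is a toy sketch (in Python, with entirely made-up billing records and names) of squeezing "which customers are home when" out of a system that was only ever designed to answer "who gets paid how much":

 from collections import Counter, defaultdict
 from datetime import datetime

 # Hypothetical billing records: (customer, timestamp of a payment made in person).
 billing = [
     ("Acme Corner Store", "2003-03-01 10:15"),
     ("Acme Corner Store", "2003-03-08 09:50"),
     ("Jones Household",   "2003-03-05 18:30"),
     ("Jones Household",   "2003-03-12 19:10"),
 ]

 # The billing system was built to answer "who gets paid how much?";
 # here we squeeze out "which weekday is each customer usually around?".
 weekdays = defaultdict(Counter)
 for customer, stamp in billing:
     day = datetime.strptime(stamp, "%Y-%m-%d %H:%M").strftime("%A")
     weekdays[customer][day] += 1

 for customer, counts in weekdays.items():
     best_day, _ = counts.most_common(1)[0]
     print(customer, "is most often reachable on", best_day)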
It is also data mining when you try to arrive at the judgement of who the "most lucrative contacts" are. Bear in mind that "most lucrative contacts" includes small-volume customers who buy every month and chew up no salesman time with dumb questions, provided they are home when you arrive. "Most lucrative contacts" also includes delivering the ordered case of beer/wine/pizza once a year to the holiday house of the CEO of your most valuable corporate beer/wine/pizza customer. You could just send the truck, but sending a salesman wins every time. Now, data mining is not going to do this for you, but when you build the huge bureaucratic automated system, at least part of the trick is still allowing the people on the ground to add the special tweaks.
Data mining often occurs when patterns must be extracted from noisy, loosely correlated data. Solving problems, even NP-complete ones, on noise- and error-free data is, by comparison, easy.
Imagine trying to solve a variant of the standard travelling salesman problem where, instead of exact coordinates for the towns, you were given inaccurate estimates supplied by random citizens surveyed in the streets. To get the best answer, you cannot simply apply the standard algorithm; you must also form a judgement of the accuracy of each data point.
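A toy sketch of what that involves, with made-up towns and made-up survey noise: each town's position is reported several times with a self-declared accuracy, the program forms a confidence-weighted estimate of each position, and only then runs an ordinary nearest-neighbour tour heuristic. All names and numbers below are invented for illustration.

 import math
 import random

 random.seed(1)

 # True town coordinates (unknown to the algorithm in real life).
 true_towns = {"A": (0, 0), "B": (5, 1), "C": (2, 6), "D": (8, 7)}

 # Hypothetical street survey: several noisy reports per town, each with a
 # self-declared accuracy in (0, 1] that we treat as a weight.
 reports = []
 for name, (x, y) in true_towns.items():
     for _ in range(5):
         acc = random.uniform(0.2, 1.0)
         noise = (1.0 - acc) * 3.0
         reports.append((name, x + random.gauss(0, noise), y + random.gauss(0, noise), acc))

 # Step 1 (the judgement part): weigh each data point and combine them.
 est = {}
 for name in true_towns:
     pts = [(x, y, w) for n, x, y, w in reports if n == name]
     total = sum(w for _, _, w in pts)
     est[name] = (sum(x * w for x, _, w in pts) / total,
                  sum(y * w for _, y, w in pts) / total)

 # Step 2 (the standard-algorithm part): a plain nearest-neighbour tour.
 def dist(p, q):
     return math.hypot(p[0] - q[0], p[1] - q[1])

 tour = ["A"]
 remaining = set(est) - {"A"}
 while remaining:
     nxt = min(remaining, key=lambda t: dist(est[tour[-1]], est[t]))
     tour.append(nxt)
     remaining.remove(nxt)

 print("Estimated positions:", est)
 print("Tour:", " -> ".join(tour))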
This is hours (years!) of fun!
The data mining in which I have participated ran along somewhat more abstract lines. In one place, we had a large amount of billing data accumulated over years. My task was to see if there was anything about the data that was "unusual" or "interesting" or which would render their business assumptions invalid. I found a number of correlations and statistical profiles that challenged their business assumptions (as well as some that proved to be eye openers for new opportunity).
In another environment, we took 3 dissimilar business data sets (car rental, hospitality, and real estate finance) and also went looking for the unexpected. Yes, we also did the usual demographic slicing, but one of the goals was to recognize new patterns that would be of use to the business enterprise.
To understand data mining, it can be helpful to think about it in terms of its impedance mismatch with scientific and statistical cultures.
Much of data mining is about leveraging existing data to make useful predictions. It differs from conventional statistical prediction because data miners don't care much about parsimony, so they don't care about covariance between predictors. Where two or more variables have similar predictive qualities (i.e. contain equivalent information), a statistician will endeavour to choose the better of the two covariant parameters for their model. A data miner will happily bung them both in if together they improve the model's predictive power even slightly. This is because statisticians think good predictive models are ones that both predict well and hint at the underlying causalities. A data miner doesn't care about anything except good prediction (for example: what's going to be the most profitable or least risky way to spend my limited resources?). That's why data miners are happy with complicated but effective neural nets, even though such models can creep statisticians out. The universe might be mathematical, but mother nature is a data miner, not a statistician.
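Here is a small sketch of that difference in attitude, using invented data: two predictors that carry almost the same information. The "statistician" keeps only one; the "data miner" throws both into a least-squares fit and takes whatever small improvement in held-out error results.

 import numpy as np

 rng = np.random.default_rng(0)
 n = 400

 # x1 and x2 are strongly covariant - they carry almost the same information.
 x1 = rng.normal(size=n)
 x2 = x1 + 0.1 * rng.normal(size=n)
 y = 3.0 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n)

 train, test = slice(0, 300), slice(300, n)

 def fit_and_score(columns):
     X = np.column_stack(columns + [np.ones(n)])    # design matrix with intercept
     coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
     pred = X[test] @ coef
     return np.mean((pred - y[test]) ** 2)          # mean squared error on held-out data

 # The "statistician": drops the redundant covariant predictor.
 mse_one = fit_and_score([x1])
 # The "data miner": happily bungs both in if it predicts even slightly better.
 mse_both = fit_and_score([x1, x2])

 print("test MSE with x1 only:   ", round(mse_one, 4))
 print("test MSE with x1 and x2: ", round(mse_both, 4))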
Data mining is an applied science; the machine learning branch of computer science underpins it.
I do data mining for a living and I find the above description of data mining offensive, hallucinogenic, erroneous, and ludicrous. Data mining just involves studying and republishing data that was already there. It can involve complicated techniques to extract data, since the data is not usually in a very structured form. Scientists tend to want to study new theories (or existing theories) about the universe, while data miners tend to mine data that came from humans. For example, a data miner does not go around mining information from camels' and elephants' mouths - instead, data miners tend to focus on a branch of human information statistics. Examples are mining data from streams of text, phone calls, banking transactions, internet websites, encyclopedias, books (via optical character recognition), etc. Therefore some data mining is simply a form of analysis (sometimes very statistical) of information - and is not black magic. No statistics are foolproof; they contain estimations and possible errors, and data mining is no exception. Some data mining involves republishing data, combined with statistical analysis. For example, a search engine mines data and then republishes a small summary of the original data (existing copyrighted content). None of this is creepy - data mining is just humans analyzing and republishing data that was already there, in a different form.
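For what it's worth, here is a toy sketch of that "analyze and republish in a different form" point, using a hypothetical scrap of text: extract the most frequent non-trivial words as a crude summary, roughly the way a search engine republishes a small summary of data that was already there.

 import re
 from collections import Counter

 # Hypothetical source text that was "already there".
 text = """Organization X runs its billing through a big database. The database
 was tuned for billing, not for answering new questions about customers."""

 stopwords = {"the", "a", "for", "not", "was", "its", "about", "through", "new"}

 words = re.findall(r"[a-z]+", text.lower())
 counts = Counter(w for w in words if w not in stopwords)

 # Republish a small summary of the original data.
 print("top terms:", counts.most_common(3))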
I can only surmise you deeply misunderstood the above author's statements. Data miners are essentially in the business of statistics-based compression of data into 'theories' (e.g. from the dataset, P has a 95% chance of predicting Q). They tend to verify such theories by splitting datasets randomly - e.g. half for production of rules, half for testing of rules. These 'theories' are, themselves, the products of a data mining exercise. Scientists often take a similar approach (e.g. viewing a graph of data before constructing a hypothesis), so the above author's characterization of that aspect is a bit unfair - but, still, it is correct that scientists favour parsimony (OccamsRazor): they attempt to invent a simple theory via an intermediate model based on existing observations, then test it against future observations. A scientist's model will tend to be well factored, e.g. drop observations into the model, and out pops a predicted value. By comparison, the 'theories' that come out of a data mining exercise can be complicated or poorly factored. Suppose, for example, that the data matches (precisely) a simple 'model' a^2 + b^2 = c^2 that a scientist or statistician - or even a kid with a trigonometry background - would discover immediately. Many techniques for automated mining of data are far from perfect and will instead produce, from this data, many (if not dozens of) formulas dealing with various ranges of a and b to predict various ranges of c. And a neural network trained on this data will be complex 'black magic' from a scientist's perspective - you can't get a useful theory back out of a neural network. The advantage of DataMining is that these very same tools and techniques can make useful predictions even where scientists are unable to provide useful theories or models - i.e. the job gets done even if a scientist finds the solution ugly, hackish, and abhorrent under OccamsRazor.
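A minimal sketch of that split-and-verify habit, with made-up records: derive a simple threshold rule ("P predicts Q") from one random half of the data and measure how often it holds on the other half.

 import random

 random.seed(42)

 # Hypothetical records: (p_value, q_happened). Q tends to follow when P is large.
 records = []
 for _ in range(1000):
     p = random.random()
     q = p > 0.6 or random.random() < 0.1
     records.append((p, q))

 random.shuffle(records)
 half = len(records) // 2
 rule_set, test_set = records[:half], records[half:]

 # "Production of rules": pick the threshold on P that best predicts Q on the first half.
 def accuracy(data, threshold):
     return sum((p > threshold) == q for p, q in data) / len(data)

 best_threshold = max((t / 100 for t in range(100)), key=lambda t: accuracy(rule_set, t))

 # "Testing of rules": see whether the mined rule still predicts Q on unseen data.
 print("mined rule: predict Q when P >", round(best_threshold, 2))
 print("accuracy on held-out half:", round(accuracy(test_set, best_threshold), 3))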
And while the above author said nothing about elephants and camels, there is no reason the same tools of Data Mining couldn't be used to relate dung readings of camels and elephants to foods they have eaten recently (may even be useful and economical - dung is a valuable commodity), or predict their migration patterns both natural and in the presence of humans. Data mining isn't about human statistical analysis - that's simply where it is most popularly applied.
What I’m about to describe has probably been done. If not, it should be. Imagine a computer program that scours scholarly books and periodicals for references. Suppose I refer to essays A, B, and C in my published essay. Each essay, including mine, would be represented by a dot. There would be a line from my essay to A, to B, and to C. Each reference in essay A would have a line from it to A, and so on. Suppose the dots varied in size depending on how many lines they had connected to them. Obviously, scholarly space would be large and messy, but you would be able to see which essays are most often referred to, and therefore most influential. (Influence in this context doesn’t mean agreement with; it means affected by.) I suspect that works such as John Rawls’s A Theory of Justice (1971) and Robert Nozick’s Anarchy, State, and Utopia (1974) would appear as planets in scholarly space, since they have been referred to many times. Most essays would be visible, if at all, as specks of dust. If anyone is aware of a website that contains this sort of thing, please let me know.
Whilst it doesn't use a graphical visualisation like you've described, there is http://citeseerx.ist.psu.edu/
I expect the clustering algorithm - the part that tries to place related essays near one another in the visual model - would be non-trivial. (The simpler dot-sizing part is sketched below.)
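Leaving the layout problem aside, the dot-sizing part is straightforward. A minimal sketch with a hypothetical citation list, where a dot's size is just the number of incoming reference lines:

 from collections import Counter

 # Hypothetical citation data: essay -> essays it refers to.
 references = {
     "MyEssay":       ["A Theory of Justice", "Anarchy, State, and Utopia", "Essay C"],
     "Essay C":       ["A Theory of Justice"],
     "Another Paper": ["A Theory of Justice", "Essay C"],
 }

 # In-degree = how many lines point at each dot; that sets its size.
 in_degree = Counter(target for targets in references.values() for target in targets)

 for essay, count in in_degree.most_common():
     print(f"{essay}: {count} incoming reference(s)")

 # The hard part noted above - placing dots so related essays sit near one
 # another - would need a force-directed layout or similar clustering step,
 # which is not attempted here.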
Curiously, the term DataMining is all about extracting RealInformation, whereas the term InformationTechnology is all about handling data. --JonGrover