Data Manipulation

DataManipulation is the Manipulation of Data. ;-)

That's obvious, of course.

The ability to perform DataManipulation presupposes that you have Data to Manipulate, meaning you must have a DataBase.

If you wish to delegate and/or automate the task of DataManipulation, you must have the ability to express, through either imperative or declarative statements, the precise manipulations you wish to perform. Any such expression is necessarily part of a language. A language designed specifically for the domain of DataManipulation is called a DataManipulationLanguage?. The most popular of such languages is the StructuredQueryLanguage (SQL) which is used for many RelationalDatabases, but there are many others.

(DataManipulationLanguage? (DML) is generally considered a subset of StructuredQueryLanguage (SQL). The "other part" of SQL would be DataDefinitionLanguage? (DDL) -- including CREATE, DROP, GRANT, REVOKE and ALTER statments.) - agreed, and that's not even counting the parts for concurrency management and transaction support.

There is some confusion inherent in common terminology. Conceptually speaking, querying something should not involve manipulating it (and though Heisenburg would call that WishfulThinking in general, it's actually doable for Data). A query to a database is a request for useful information regarding which you desire a response. A manipulation over a database is the performance of some alteration to it - the addition or deletion of data items (datums?). A perhaps more appropriate term would be 'Database Manipulation Language', but CestLaVie.

The manipulation portion of SQL consists of its 'Insert', 'Update', 'Merge', 'Truncate', and 'Delete' operators. Only SQL's 'Select' operator is used for querying. (Other portions of SQL are for concurrency control, security, data definition, etc.)

On the Purpose of Data Manipulation

(From a discussion in DataSpace)

There are also limits in purpose for DataManipulation:

Any open-world manipulation must (by definition) be performed from outside the closed system associated with the DataSpace, and thus will be based on the reason the DataBase exists. This reason will involve either or both of:
- Reflection -- making sure that the information in the DataSpace is consistent with some external world
- Projection -- where the data defines the world, albeit a virtual or simulated one, so updating the data updates the world
Any closed-world manipulation will have a purpose grounded in computational costs because one may neither gain nor lose real information and continue calling the manipulation closed-world, and thus will be either or both of:
- Inference -- creates more value from existing data. One might use induction or deduction to obtain valuable information from less-valuable data (e.g. in response to a query), or attempt to shift anticipated computational costs from a time when they are more expensive to a time when they are less expensive (e.g. in anticipation of a query). The latter could include performing induction or deduction ahead of time, but also includes such things as indexing, which is a form of inference in the sense that it adds a data-object (in particular a meta-data-object, the index) to the database.
- Maintenance -- where unneeded facts are removed, generally with the intent to reduce computational costs associated (usually space and search-time). Data items are unneeded when they may be derived from other facts in the DataSpace or when they will never be relevant to any future queries.
- Query -- In a sense, Inference always grows the data set while Maintenance always shrinks it. However, not all inference needs to add to the persistent data set. Making a volatile inference logically rolls inference and maintenance into one action - you derive new facts, fire those off to whomever requested them, then immediately remove those facts as unnecessary to future queries. This sort of volatile inference most closely corresponds to the notion of a query response.

I don't believe I'm missing anything, there. Reflection and Projection are conceptual duals (Projection casts an internal world of facts into an external one; Reflection collects an external world of facts into an internal one), as are Inference and Maintenance (Inference 'learns' and Maintenance 'forgets', both to increase the value of the database), and as are open-world and closed-world (open-world can add or removes information; closed-world cannot), so these provably cover all possible purposes for DataManipulation.

"I want PowerfulAdHocDataProcessingTools."

I have lots of data that isn't held hostage by a RelationalDatabase. I have data in CVS, for instance.

Do you mean ConcurrentVersionsSystem (CVS) or CommaSeparatedValues (CSV)? Not that it matters much. I'd question what you mean by "not held hostage by a database" [note: above statement since changed to 'RelationalDatabase'], since even a CSV or XSL file is (at least by many DatabaseDefinitions, including DatabaseIsRepresenterOfFacts) a DataBase. It just isn't a DBMS. And the basic reasons for DataManipulation upon it do not change.

ConcurrentVersionsSystem (CVS) doesn't respond well to SQL queries. But it does contain lots of data I find quite valuable. "Who changed this file last month?" could be useful information. And "Who calls this method?" could also be useful information. Attempting to get this data with SQL queries could be... interesting.

Advanced query support in any RefactoringBrowser or IntegratedDevelopmentEnvironment would be very useful indeed... you'd almost expect it to be state-of-the-art by now.