Merging Files And Database

One thing that could improve CloudComputing would be to simplify database configuration and transfers by virtually merging the RDBMS with the file system so that schemas and data can readily to copied or moved to another system via FTP and other "file tools" without a lot of re-fiddling at the new destination, and be accessible via file references instead of just SQL. With many RDBMS, cross-machine moving is not an easy task. If the industry wants to make CloudComputing a commodity (or practical enough to join), then changing cloud vendors has to also become a commodity activity.

A value for any given "cell" can be referenced through a path such as:

/databases/databaseX/tableX/rows/keyX/columnX.txt

All values would be represented as text, at least when viewed "as" a file. Optionally it could be also changed at the (virtual) file level [1].

A single column primary or surrogate key should probably be required. We could dream up approaches for compound primary keys, but it would muck up the standard.

Sample schema references:

/databases/databaseX/tableX/schema/columnX/type.txt /databases/databaseX/tableX/schema/columnX/length.txt /databases/databaseX/tableX/schema/columnX/nullable.txt

Most interactions with the database would still be through SQL, but the data is also presented as series of files. A small app or system may actually store them as files, but generally that's an implementation decision that should be invisible to the apps such that one could swap to a virtual-file-only RDBMS without breaking the app.

Creating such a system would probably require close integration between the file system and the database. (IBM-AS400's do something very roughly like this if I remember correctly.)

--top

I wonder if this would be possible (on Linux anyway) using FUSE (http://fuse.sourceforge.net/)

''I once wrote a proposal do implement exactly this. The idea was to use all those nice command line tools on the database. Structured selective archival would be much easier. And of cource text filtering with grep etc. But it was never done. -- GunnarZarncke''

{There's a slightly odd book called UNIX: A Database Approach which had a rather opposite idea, that all those nice command line tools could be treated like queries. Taken to a reasonable extent, it's a sound notion. PICK OS embodied it. See http://books.google.co.uk/books/about/UNIX_a_database_approach_featuring_syste.html?id=Ns1QAAAAMAAJ&redir_esc=y and http://en.wikipedia.org/wiki/Pick_operating_system }

How could queries be specified? Or is that best left to the script.

I was thinking something like

/databases/databaseX/tableX/queries/columnX/...

but dead-ended there while thinking how to do joins. How does AS400 do it?

Why are the queries associated with a given table and/or column here? Do you mean something like QueryByExample? Or perhaps you mean some kind of tree-based wild-carding (treegex? :-) As far as As400, it only vaguely does what's suggested. I rewrote that part to clarify the ambiguity (contradiction intended).

I'd rather not represent queries. I'd only map the data structures onto a two level deep directory. Like

/database/schema/table/column

This is easily done and has no conmplications with query syntax. If you want queries you might consider also mapping persistent views.

I'm still not following. And, why have "schema" be between database and table?

{I think your correspondent doesn't mean "schema" literally, but the schema name. For example, when I create databases for specific users, I often create 'public' and 'private' schemas. So, given a database for a user 'dave', there might be:}

/dave/private/myexpenses/date /dave/private/myexpenses/amount /dave/public/customers/id /dave/public/customers/name

We have to be careful about AttributesInNameSmell. Working out a useful (virtual) hierarchy will require a careful balancing act between brevity and thoroughness.

{If you don't know what a schema is, you might want to look it up. Tables belong to a schema.}

My working assumption is that most of the path references will be via applications and command-line users of data, not DBA's. I chose to optimize that path arrangement for users, not DBA's. DBA's still have access to info they need, it just may be a bit longer for them as a tradeoff. Perhaps to avoid vocabulary confusion, the example I gave earlier could be:

/databases/databaseX/tableX/structure/columnX/type.txt /databases/databaseX/tableX/structure/columnX/length.txt /databases/databaseX/tableX/structure/columnX/nullable.txt

Getting info about the "structure" of given table and its columns is a common need for database users. -t

I considered:

/databases/databaseX/tables/tableX/...

Instead of:

/databases/databaseX/tableX/...

But that would add verbosity to a lot of references. The additional layer would make it easy to have other things associated with a given database, such as queries, but I'll vote for brevity. There will have to be some names that tables are not permitted to have, such as "queries" so that we can have a query repository path under "databases".

The ideal "long" structure would have the category/instance pairing pattern of:

/objectCategory/objectInstance/objectCatgry/objectInstance/objectCatgry/objectInstance/etc...

But, that makes it verbose. Another approach would be the have a reserved prefix for category paths. However, some file systems choke on special characters. -t

Footnotes:

[1] Although I have proposed DynamicRelational with a text-centric feel to it to make it "script-friendly", this topic does not necessarily require changing the way current RDBMS do things. The presentation from the file side is only presentation. How a value is stored internally is an implementation detail. For example, if we save an Integer value through the virtual file system to "myIntegerColumn.txt", we'd get a type-conversion error message if the target file content was "123xyz". A GateKeeper would typically enforce TypeSafety and referential integrity from both the file side and the database side and "compress" the values to however the database system wants to store integers.

CategorySpeculative

OctoberTwelve