Access Path Independence

AccessPathIndependence is a property that describes relatively loose coupling between:

how data is organized or managed (issues of ownership, control, distribution, persistence, security)
and how data is accessed and processed (in the sense of PowerfulAdHocDataProcessingTools)

Achieving 100% AccessPathIndependence is precluded by security constraints, modularity concerns, and the EightFallaciesOfDistributedComputing. However, technologies common today leave a lot of room for improvement, with ReinventingTheDatabaseInApplication and ObjectOrientedDatabases being among the worst of the lot.

There are several existing technologies that aim to tackle various classes of AccessPathIndependence:

RelationalDatabases allow views and ad-hoc queries that reduce concern for the 'ownership' of data (relative to ObjectOrientedDatabase).
LogicProgramming techniques, as found in DataLog, allow relationships distributed throughout code to be processed from a local query. Many such languages (including DataLog and PrologLanguage) possess facilities to integrate relationships found in a RelationalDatabase.
FunctionalReactiveProgramming identifies multiple, specific data resources directly in code, and expresses ad-hoc operations to combine them (which may include relational or logical operators). The result of this effort may be a new data resource. This supports independence from where data is stored and who manages it; if the FRP technology were better supported in the distributed scenario, one could express queries that access multiple, independent third-party databases. To the extent that data resources are securely identified, FRP can also securely cross trust boundaries (without introducing access-control issues!).
PublishSubscribeModel, DataDistributionService, MultiCaster, and various other classes of PluggableArchitecture (when applied to data), reduce the degree to which code or a component is coupled to a specific data resource, and generally allow several contributors. This is orthogonal to FRP, in the sense that FRP might specify a 'specific' data resource that turns out to be a shared registry or MultiCaster.

Suppose we are trying to represent information about employees and the departments in which they work. A system in which choosing the structure for the data involves setting up “routes” between data instances (such as from a particular employee to a particular department) is access path dependent. The relational model was supposed to solve this problem, but I have failed to find pattern recommendations or rules in relational literature (something like normalization rules) with the explicit objective of avoiding the impact of multiplicity changes. The only technology I have found so far interested in directly attacking this problem is ConceptualQueries.
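As a rough illustration of the employee/department case, here is a minimal Python sketch. All names in it (departments_nested, works_in, and the helper functions) are hypothetical and chosen only for this example. The first layout bakes a department-to-employee route into the structure; the second stores plain (employee, department) facts with no built-in route.

```python
# Access-path-dependent layout: each department "owns" its employees,
# so every lookup has to follow the department -> employee route.
departments_nested = {
    "Sales": {"employees": ["Ann", "Bob"]},
    "R&D": {"employees": ["Cy"]},
}

def department_of_nested(name):
    """Find an employee's department by walking the nested route."""
    for dept, info in departments_nested.items():
        if name in info["employees"]:
            return dept
    return None

# Flat layout: one relation of (employee, department) facts.
# Neither direction is privileged by the structure itself.
works_in = {("Ann", "Sales"), ("Bob", "Sales"), ("Cy", "R&D")}

def departments_of(name):
    """All departments an employee works in (a set, so 0..n results)."""
    return {d for (e, d) in works_in if e == name}

def employees_of(dept):
    """All employees working in a department."""
    return {e for (e, d) in works_in if d == dept}
```

If an employee is later allowed to work in several departments, the flat relation absorbs the change by adding a row, while code written against the nested layout has to be revisited. Note that this only helps callers who already treat the answer as a set; a caller that assumed exactly one department is still affected, which is the multiplicity problem raised above.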

The other technology that seems able to achieve AccessPathIndependence is DataLog, which attacks the problem from a "logic predicates" perspective. That reminded me of reading somewhere in TheThirdManifesto book that a relvar represents the general form of a predicate (or something like that). But if an imperative relational language is unable to provide AccessPathIndependence, and AccessPathIndependence is one of the goals of the relational model, and a logical language approach (such as the one provided by DataLog) gives us AccessPathIndependence, does that mean there is a LogicalRelationalLanguageDependency due to AccessPathIndependence requirements? In other words, for a language to be truly relational, does it have to be logic based, that is, closer to ProLog in syntax and way of working?
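To make the "logic predicates" perspective a little more concrete, here is a small Python sketch of a relation queried as a predicate. It is not DataLog and not any real library's API; Var, query, and works_in are hypothetical names invented for this illustration. The point is only that the same relation can be asked in either direction without the data declaring a route.

```python
# A relation as a set of facts for the predicate works_in(Employee, Department).
works_in = {("Ann", "Sales"), ("Bob", "Sales"), ("Cy", "R&D")}

class Var:
    """A named logic variable used as a placeholder in a query pattern."""
    def __init__(self, name):
        self.name = name

def query(relation, pattern):
    """Yield one {variable name: value} binding per fact matching the pattern.

    Constants in the pattern must equal the fact's value at that position;
    a variable that appears twice must bind to the same value both times.
    """
    for row in relation:
        binding = {}
        matched = True
        for term, value in zip(pattern, row):
            if isinstance(term, Var):
                # setdefault binds on first sight, then checks consistency.
                if binding.setdefault(term.name, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield binding

# The same predicate, asked from either side:
in_sales = sorted(b["E"] for b in query(works_in, (Var("E"), "Sales")))
cy_depts = sorted(b["D"] for b in query(works_in, ("Cy", Var("D"))))
```

Whether this style is required for a language to count as relational is exactly the open question above; the sketch only shows what querying by predicate looks like.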

Another possible way to achieve AccessPathIndependence without a ProLog-like language could be the approach taken by GeneXus: define semi-independently the view required by each of the modules of the system (with one model for each StakeHolder?), and leave the problem of creating a unified model to NormalizationBySynthesis. That way the queries for each module are independent of the requirements of the other modules, and we achieve LocalAccessPathIndependence? (while at the same time automatically getting a unified relational database model). Is this the path to encapsulation in relational database design?

The RelationalModel as defined by TedCodd appears intended to achieve a greater degree of AccessPathIndependence than the network and ad-hoc database systems common in the late 1960s, but not necessarily absolute AccessPathIndependence. It's possible Codd regarded relational algebra expressions involving natural joins and explicit references to RelVars and attribute names, etc., as not representing access paths at all. Indeed, his original paper seems focused on tree and network structures being representative of the access path problem.

I guess what happened is that relational did not really kill AccessPathDependence?, but just moved it out of the DataModel and into the QueryLanguage, so now the next logical step is to remove it from there (or to reduce the ripple effects of changes introduced by modifications in particular modules of the application by following the GeneXus approach). Or maybe TedCodd always defined conceptual virtual relvars to hide the multiplicity relationships between base relvars and just forgot to add that to some kind of "VisualizationRules?" due to TheCemeteryOfUnknowns effect. Since he always used virtual relvars to avoid AccessPathDependence?, he was unable to see that other people would fall into the trap of not defining them (or maybe he transparently did NormalizationBySynthesis in his brain?).

My reading of Codd (1970) is that you could use relational algebra to generate whatever tables you need from the current tables - and that this would take place (conceptually) *outside* the user application. That is, the DBA would include some relational algebra to support the old version (what's called a "view" now). So, from the outside, the database would never change. But he doesn't explicitly discuss this at all in the paper, just claims that it's possible to make application code run unaltered against the new schema version of the database.

His paper certainly made it *possible* to do that (since it's all tables): in modern terms, consider going through your program, changing all the SQL statements to match the current concrete tables. You could achieve the same effect by generating all the old tables before your SQL statements saw them. Put another way, instead of changing your addressing to match the current table, you could *change the table* (transform it) into the old one; and this could be done outside the application. To sum up, *someone* still has to make them compatible; but it doesn't have to be you. Of course, there are problems here for writing data, but it works great for reading.
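The "generate the old tables outside the application" idea can be sketched with Python's standard sqlite3 module. The schema here (person, membership, and an employee view) is hypothetical and invented for this example; it assumes the old schema had a single employee table that was later split.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- New schema: the old single table has been split in two.
    CREATE TABLE person (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE membership (emp_id INTEGER, dept TEXT);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO membership VALUES (1, 'Sales'), (2, 'R&D');

    -- Defined by the DBA, outside the application: a view that
    -- regenerates the old table's shape from the new tables.
    CREATE VIEW employee AS
        SELECT p.emp_id, p.name, m.dept
        FROM person AS p
        JOIN membership AS m ON m.emp_id = p.emp_id;
""")

# Application query written against the old schema, left unchanged.
old_query = "SELECT name, dept FROM employee ORDER BY name"
rows = conn.execute(old_query).fetchall()
```

Reading through the view works without touching the application's SQL. Writing is the hard part, as noted above: in SQLite a view is read-only unless INSTEAD OF triggers are defined for it, so someone still has to decide how an insert into the old shape maps onto the new tables.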

There's always been a certain intention, by various implementers and proponents of the RelationalModel, that query languages like SQL would be an intermediate layer between high-level query mechanisms -- such as natural language query systems, visual query systems, or higher level declarative languages -- and the database. They rarely are. Normalisation was intended to be automated. It rarely is.

In 1970, I doubt TedCodd gave much thought to relationship constraints and what would happen if they changed. Just creating a new data model that demonstrated an order of magnitude improvement in achieving data independence was a vast step. We now know that this step is perhaps not as far as we'd like, but that's a relatively recent consideration. One of Codd's few comments along those lines was to note that we should "adopt the policy that once a user access path is defined it will not be made obsolete until all application programs using that path have become obsolete. [However, s]uch a policy is not practical, because the number of access paths in the total model for the community of users of a data bank would eventually become excessively large." Whether at the time he considered the RelationalModel to be the entire solution (by neither being an explicit tree-structure nor a network structure), or considered it to still exhibit that problem in some respect, is not entirely clear. I don't recall whether his later works clarified this or not.

Does anyone here know the opinion of ChrisDate or HughDarwen on the limitations of AccessPathIndependence in TutorialDee? Does anyone know if there is a way to ask them?

I can contact them if you like, or you can join TheThirdManifesto discussion group. HughDarwen is very active on it. Instructions for joining are found at http://thethirdmanifesto.com

I'm fairly certain I know the answer, however. DateAndDarwen consider TutorialDee to be a teaching language, illustrative of their abstract (i.e., non-syntactic) D language specification. The D specification does not mention access path dependence. It focuses on database, RelationalModel, and data type issues. Language design issues receive only a glancing mention: A D should be based on good language design principles. Obviously, these are not specified. AccessPathIndependence is a language design issue.

Speaking for myself rather than DateAndDarwen: TutorialDee is intended to be an example of a general-purpose programming language with database capabilities. As such, (a) there is nothing to stop you designing a D that encourages AccessPathIndependence, and (b) it is no more dependent on access paths than any other programming language.

Our domain abstractions necessarily involve relationships. There's no way around that. We cannot simply get rid of them, but only find ways to manage them better.

True.