Distributed Data Centralized Programming

In the early sixties, programming was distributed (it was the deck of cards you carried around), data was centralized (on large magnetic drums in the computer room), and execution was centralized (and manned 24/7 by computer operators).

In the mid-sixties, we had the brilliant idea of storing our programs on the disk drives attached to the computer. No more carrying around decks of cards! We could sit at a TTY terminal, enter our name, password, and the name of our program, and start typing line numbers and code. Then we could just type RUN. Later, if we needed to change something, the source code was already stored on the computer, so we could just log in and edit whatever lines needed changing. Programming was centralized, data was centralized, and execution was centralized.

In the early eighties, microcomputers became readily available. Programming was decentralized (not distributed, just decentralized), data was decentralized, and execution was decentralized.

This was all fine and dandy, but it brought forth tons of interdepartmental political and operational issues. For example, what if the remote office in Poughkeepsie wanted to print accounts payable (AP) checks? How were their entries going to get into the General Ledger? After a while, especially with the aid of local area networks, we found ourselves in a situation where programming was decentralized, data was distributed, and execution was decentralized.

So now I think we all agree that distributed data and decentralized execution are terrific. But what about the programming - should it remain decentralized, or should it be distributed? Now we see software being developed, for example on J2EE platforms, where the programming (and execution) is once again centralized. Then there's talk of DotNet and new developments at Sun where the programming will be distributed (in real time). When should a program be centralized, distributed, or decentralized? Are there certain applications (or constraints on applications) that would favor one methodology over another?

-- JeffChapman


Why should data be distributed? I see little value in that. If everybody has their own copy, the changes all fall out of sync. Why not just put it together to better serve OnceAndOnlyOnce? The only value I see is in temporary buffers for times when you are disconnected from the central system.

The push away from centralized data in the 1980s was mostly because mainframe tools and UIs did not keep up with the times, not because of any inherent advantage of decentralization. Now the central servers run the same software as the desktops.

Data needs to be distributed because there will never be a central machine holding all of the data. Yes, only one persistent copy of each data item should exist, but those items may be spread across multiple machines. Aggregation of the data should occur on the user's machine when requested.
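
To make the idea concrete, here is a minimal sketch of that arrangement in Python. The node names, the records they hold, and the fetch function are hypothetical stand-ins for real network calls; the point is only that each item has exactly one home, and the combined view is assembled on the requesting machine.

 # Minimal sketch: each node holds the single persistent copy of its
 # own records; the user's machine aggregates them on demand.
 # Node names, records, and fetch() are hypothetical stand-ins.
 from concurrent.futures import ThreadPoolExecutor

 NODES = {
     "headquarters": {"cash": 120_000},
     "poughkeepsie": {"ap_checks": 45_000},
     "sales_office": {"receivables": 80_000},
 }

 def fetch(node_name):
     """Stand-in for a network call to the node that owns these records."""
     return NODES[node_name]

 def aggregate(node_names):
     """Pull each node's records and combine them on the user's machine."""
     with ThreadPoolExecutor() as pool:
         partials = pool.map(fetch, node_names)
     combined = {}
     for part in partials:
         combined.update(part)  # each key lives on exactly one node
     return combined

 print(aggregate(list(NODES)))  # the "central" view exists only on request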


Maybe data should be centralized only to the extent that specific groups of people share a common need for it. For example, cookies holding identification information and personal preferences for folks visiting Web sites are distributed, because only the one visitor needs them.
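
As a small illustration, here is a sketch using Python's standard http.cookies module; the preference name and value are made up. The preference rides along with the visitor rather than sitting in a central store, precisely because nobody else shares a need for it.

 from http.cookies import SimpleCookie

 cookie = SimpleCookie()
 cookie["theme"] = "dark"            # a personal preference; nobody else needs it
 cookie["theme"]["max-age"] = 86400  # keep it on the visitor's machine for a day

 # Emitted as an HTTP response header; the data then lives on the
 # visitor's machine (distributed), not in a central database.
 print(cookie.output())              # Set-Cookie: theme=dark; Max-Age=86400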

In organizations, the PoliticsOfControl is a major factor in what data gets centralized and what data stays at divisional levels. Salespeople like to keep local, decentralized control over the contact information for the accounts where they earn a commission. Meanwhile, the corporate office likes to install SalesForceAutomation to centralize this information. A lot of the battle over centralizing or distributing information has to do with JobSecurity versus control (or ManagementsInterestInDefeatingJobSecurity).

