All Data Relates To Other Data

Apart from RandomNumbers, every datum relates in some way to something else. Most of computing history has played out on isolated systems running monolithic applications, which created DataSilos. In the post-Internet era especially, inter-linked data became possible, paving the way for a UnifiedDataModel.

The OperatingSystem must change: instead of BigDataObjects isolated from everything else and siloed in applications, we need MashUps that keep data and all of its relationships tended, like a giant WikiUniverse.

For the sake of clarity, data is defined here as something that is both non-NULL (obviously) and non-random. That being the case, it is clear that AllDataRelatesToOtherData. Other definitions are at WhatIsData.


Even RandomNumbers relate to something else. 42 isn't random unless it's either in a random sequence, or the value of a random variable.

Umm, I don't think so. Keeping in mind that machine-generated numbers are generally only pseudo-random, I have to ask you: where did you get that number? {Rolled a fair 10-sided die, twice. http://xkcd.com/221/} Heh, good one, but I'm using the term "data" to mean "non-random"; only your communication of the result is data to me. -- MarkJanssen

HitchhikersGuideToTheGalaxy. Where else? But I wasn't talking about pseudo-random numbers. The first thing I was talking about was Kolmogorov randomness (a.k.a. KolmogorovComplexity). The other is from ProbabilityTheory.
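To make the Kolmogorov point concrete, here is a minimal sketch. KolmogorovComplexity itself is uncomputable, so the sketch assumes that compressed length (via zlib) is an acceptable stand-in; it is only an upper bound. The function name `compressed_ratio` and the two sample sequences are made up for illustration.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size.

    Assumption: zlib output length is used as a crude upper bound
    on Kolmogorov complexity, which cannot be computed exactly.
    """
    return len(zlib.compress(data, 9)) / len(data)


# A highly regular sequence: "42" repeated 500 times (1000 bytes).
regular = b"42" * 500

# A pseudo-random sequence of the same length, seeded so the run is repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

regular_ratio = compressed_ratio(regular)
noisy_ratio = compressed_ratio(noisy)

print(f"regular sequence ratio: {regular_ratio:.2f}")
print(f"noisy sequence ratio:   {noisy_ratio:.2f}")
```

The regular sequence shrinks to a small fraction of its size, while the noisy one does not shrink at all. Note the catch that ties back to the pseudo-random remark above: the "noisy" bytes come from a short program plus a seed, so their true Kolmogorov complexity is low; zlib just cannot see that.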


Have you considered MetaData?

Yes, but data cannot provide that itself; that is where CrowdSourceing comes in. MetaData must stay distinct from regular Data.

{MetaData is regular data. It's just data about data.}

Now you're playing with shifting semantics. If MetaData is regular data, then why make another name for it? Why not just call it "data"?

{The semantics are consistent. We do call it data, but to identify that it's data about data, we qualify it as "meta" data. This becomes important when we build systems capable of reflection or introspection. For example, every database system based on the RelationalModel has meta data -- information about the database itself -- in the form of table/RelVar names, attribute names, stored procedure names and so on. Some database systems based on the RelationalModel also provide a catalog -- a collection of tables/RelVars that contain the meta data. Every database system based on the RelationalModel uses meta data internally, but only those with a catalog allow the user to query the meta data using exactly the same query language facilities as for domain data. Such a system is inherently more powerful, consistent, and flexible than one that doesn't provide such a facility, i.e., one that treats meta data and domain data as distinct. Similarly, every ProgrammingLanguage compiler uses meta data -- variable and function names, class members, etc. -- internally, but only languages supporting reflection or introspection make the meta data accessible at run-time via the facilities of the language. In the latter case, domain data and meta data are again treated as equivalent; in the former, they are distinct.}
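A small sketch of both halves of that argument, using SQLite as one example of a SQL system with a queryable catalog (the `sqlite_master` table) and Python's dataclass introspection as one example of run-time reflection. The `person` table, the `Person` class and the sample row are hypothetical, chosen only for illustration.

```python
import sqlite3
from dataclasses import dataclass, fields

# --- Database side: catalog queried with the same SELECT as domain data ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (surname TEXT, town TEXT)")
conn.execute("INSERT INTO person VALUES ('Smith', 'Springfield')")

# Domain data: an ordinary query against a user table.
domain_rows = conn.execute("SELECT surname FROM person").fetchall()

# Meta data: the same query language, pointed at SQLite's catalog table.
catalog_rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()


# --- Language side: meta data reachable at run-time via reflection ---
@dataclass
class Person:
    surname: str
    town: str


# Field names are meta data about the class, read with ordinary language facilities.
field_names = [f.name for f in fields(Person)]

print(domain_rows)
print(catalog_rows)
print(field_names)
conn.close()
```

In both cases the meta data arrives as ordinary values (rows, a list of strings) that can be filtered, joined or stored like any other data.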


Yes, metadata is data. If it is coupled with a set of definitions (ontology) it can help to make the data itself intelligible. That depends upon agreement about the ontology to be used. So I could expand my quick comment above to read:

Have you considered the use of MetaData in your work?

To give an example, my surname can also be a first name, a street name, a town name or something else. The context defines it here by common usage, but metadata would declare it as a surname. I hope that is more helpful. -- JohnFletcher

Here on the wiki, the ultimate meta-data is simple: everything is a 'page', so the name names a page and the description is in the name. Experience here has led to other usages, e.g. 'category', which is used to define meta-data for a page, e.g. 'home page', 'person', etc. These are not about the content of the page so much as the type of content. Using these in a systematic way enables a user to find something which is in a category, e.g. 'book', as long as someone has added the category. This has often been done by a 'wiki gnome' following on after discussion has developed.


All names are meta-data. A name allocated to a thing describes the thing in its current context. Until a thing can be given a name, it is just an undefined instance of data. Typically a name implies the set of which the data is a member. Fwiw. -- PeterLynch

Right. That's sorta the point I was trying to make; however, it should be known that a name can become another piece of data by being enclosed in another, larger "meta". The CrowdSourced web will allow everything to be organized less arbitrarily. Right now, we have DomainNameSystem for organizing the web. -- MarkJanssen

{How will a CrowdSourced naming system for the Web differ from a DNS-based naming system?}

1) less arbitrary: DomainNames are somewhat arbitrary. 2) finer granularity: No web admin labels everything on the web, but the crowd could.

{I'm still not clear how this will work. Typically, what happens today is that if I own a machine that I want visible on the Internet, I put it on the 'net and give it a name in the DNS. Because it's my machine, I can give it whatever unique name I like -- assuming the domain name I want is either accessible to me or I own it. How would that change? Let's assume I have a machine with resources (say, a Web server or other services) which I can assign a unique IP address and put on the 'net. What happens next?}

PeerToPeer networking creates a software-level network. One contacts a set of peers and all data get exchanged from there. Whatever further data you need from other peers is connected behind the scenes by the software; you don't care, because it's ContentCentricNetworking.
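One possible reading of that description, sketched as a toy: content is requested by a key derived from the content itself, and a peer that lacks it asks its neighbours behind the scenes. Everything here is an assumption for illustration (the `Peer` class, SHA-256 keys, the naive neighbour search); real content-centric designs differ in naming and routing.

```python
import hashlib
from typing import Dict, List, Optional, Set


class Peer:
    """Hypothetical peer that stores content keyed by its SHA-256 digest."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.neighbours: List["Peer"] = []

    def publish(self, content: bytes) -> str:
        """Store content locally and return its content-derived key."""
        key = hashlib.sha256(content).hexdigest()
        self.store[key] = content
        return key

    def fetch(self, key: str, seen: Optional[Set[str]] = None) -> Optional[bytes]:
        """Return the content for key, searching neighbours depth-first.

        `seen` records visited peer names so that cycles in the
        peer graph do not cause infinite recursion.
        """
        seen = set() if seen is None else seen
        if self.name in seen:
            return None
        seen.add(self.name)
        if key in self.store:
            return self.store[key]
        for peer in self.neighbours:
            found = peer.fetch(key, seen)
            if found is not None:
                return found
        return None


# Three peers in a chain: a <-> b <-> c. Only c holds the content.
a, b, c = Peer("a"), Peer("b"), Peer("c")
a.neighbours = [b]
b.neighbours = [a, c]
c.neighbours = [b]

key = c.publish(b"hello")
result = a.fetch(key)          # a never needs to know that c is the source
missing = a.fetch("0" * 64)    # a key nobody has published
```

The requester names what it wants, not where it lives. The sketch says nothing about how such a search would scale, which is exactly the question raised next.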

{I can see that working for small ad-hoc networks where you exchange data for a specific purpose, but how does it scale to the Internet? If I run a web server, how does it get known and consistently referenced?}

Billions of users making p2p links. The VotingModel ensures that valued content becomes visible.

{Who is going to build it and replace the existing systems?}


See also UnifiedDataModel.


Last edited October 11, 2014.