Data And Code Are The Same Thing

Code is data, data code,—that is all Ye know on earth, and all ye need to know.

To summarize, DataAndCodeAreTheSameThing can mean at least the following:

Data and code have the same internal representation (most microprocessors; Lisp, in a way)
Data can be used as code (sublanguages, domain specific languages) and code can be used as data (encoding booleans, numbers, lists etc. as functions, replacing data structures with behavioural objects)
Whether something (a structure, byte stream, whatever) is code or data is metadata: the something itself does not constrain this (codeness is in the eye of the beholder)
It's impossible to give a definition that would reliably and uncontroversially distinguish code from data. Think about an interpreter: it takes two inputs, the "code" and the "input", and produces one output. Now, what are the criteria which tell code from input data? Not all interpreters are even for TuringEquivalent languages. And if the "code" itself is an interpreter...
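As a concrete (and deliberately simple) illustration of that interpreter, here is a Python sketch of a runner for a language that is not TuringEquivalent: a deterministic finite automaton. The names run_dfa and EVEN_ONES are made up for this example. The runner takes two inputs, a transition table and a string, and nothing in it says which of the two is the "code".

```python
def run_dfa(table, start, accepting, text):
    """Feed text through a transition table; True if the run ends in an accepting state."""
    state = start
    for symbol in text:
        state = table[(state, symbol)]  # each input symbol selects the next transition
    return state in accepting

# Transition table for "the string contains an even number of 1s".
EVEN_ONES = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

print(run_dfa(EVEN_ONES, "even", {"even"}, "1001"))  # True
print(run_dfa(EVEN_ONES, "even", {"even"}, "1011"))  # False
```

Read one way, EVEN_ONES is the program and "1001" is its input; read the other way, the string is a sequence of instructions telling the runner which row of the table to use next.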

Data and code are the same thing to a "stored program digital computer." Other systems may make a distinction.

Most of the arguments below are unnecessary, because they are (unknowingly) about terminology. Philosophically and mathematically, it has long been known that it is hard to pin down what "same thing" means without being extremely careful about laying a formal groundwork, which is why you hear the word "isomorphic" so often instead of "identical". The term "isomorphic" can be used rigorously, and often is. The term "identical", on the other hand, is almost never used rigorously, and hence generates controversy.

It is uncontroversial to say instead that DataAndCodeAreIsomorphic and that "in some contexts, such as in a Lisp macro or a CPU simulator, something previously treated as data may reasonably be treated as code (or vice versa)."

On the other hand, you are going to do nothing but spin your wheels and generate flame wars if you argue about what "is" and "same" mean.

See the philosophical and mathematical history of what "identical" means if you have an interest in such things; it is vast.

-- DougMerritt

And it depends on what kind of "isomorphism" you mean; "isomorphic" should be viewed as only slightly less controversial than "same thing". For example, you can draw an isomorphism from Haskell programs to Turing machine (TM) programs, but you can't say that TM programs and Haskell programs are the same thing. From a programmer's perspective there's a hell of a difference between Haskell and TM. The same can be said about data and code: yes, you can encode booleans, natural numbers, tuples, relations and everything else as lambda calculus abstractions, but it ain't sweet. There's a world of difference, rather than "the same thing".

The issues have been described and formalized in the seminal paper "On the Expressive Power of Programming Languages" (http://citeseer.nj.nec.com/felleisen90expressive.html) by Matthias Felleisen.

Expressive power corresponds well to the mathematical notion of "homeomorphism", i.e. a small change in the source program must correspond to a small change in the target program.

The conclusion should be that DataAndCodeAreTheSameThing not when they are "isomorphic" (whatever, if anything at all, isomorphism means in this context), but when the formal languages used for describing data (for example, the relational model) and the formalisms used for code (like Haskell or Lisp) have the same expressive power. -- CostinCozianu

On the one hand, part of that makes some good points, but on the other hand, I'm not sure you understood me entirely. "Isomorphic" is sometimes used loosely, but it has a very exact technical meaning: identical in regard to a particular formal relationship or operation. This improves on simply saying "identical" by explicitly requiring the relevant relationship to be stated. So for instance I could say "data representing Z80 instructions is isomorphic to code under the operation of Z80 simulation", and that would be precise and uncontroversial. Whereas if I don't mention the operation/relation (and it is not unambiguously understood to be implied), then "isomorphism" is not being used technically.

So for you to say "whatever, if anything at all, isomorphism means in this context" seems to mean that either I wasn't clear enough about the context, or that you hadn't heard the technical definition before. Once one establishes clearly enough what isomorphism means in a context, then one is doing mathematics, not random opinion. See CategoryTheory for the broadest treatment.

"Isomorphism" is loose because you specify the objects (for example Haskell and TM code) and loosely specify a relation, but you don't actually specify an algebra/universe/category in which the isomorphism is defined. For example, Z and Q are "isomorphic as sets", but Z is not a field, and they are not isomorphic as rings. Now when you say "code is isomorphic with data", you don't specify which "algebraic universe" (universal algebra, sets, some specific category, some other formalism) the data and the code live in. Most likely you have an encoding in mind. You can encode "code" as data (obviously), and you can encode data as executable code:

 ; Church booleans: a boolean is a two-argument function that returns one of its arguments.
 (define _true (lambda (x y) x))
 (define _false (lambda (x y) y))

with the operations:

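 ; Named _not/_and/_or so they don't clobber Scheme's built-in not procedure or its and/or special forms.
 (define _not (lambda (x) (x _false _true)))
 (define _and (lambda (x y) (x (y _true _false) _false)))
 (define _or (lambda (x y) (x _true (y _true _false))))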

Just like you can "encode" a rational number as an integer. But the above defines an "encoding", not a morphism, because the true/false boolean algebra is not the same kind of thing as lambda calculus. Similarly, we can encode all the "data" there is as lambda abstractions, but that doesn't mean we are talking about a morphism; we are talking about encodings, aka representations, aka realizations, aka models. That's why I contend that "isomorphism" is used loosely until you specify a precise formal theory in which both data (say, all relational constructs over the common basic types) and code (say, lambda abstractions) are citizens, and you prove them isomorphic. Otherwise there's no concept of isomorphism proper; there's isomorphism of groups, isomorphism of fields, isomorphism of Banach spaces, etc. When people say code and data are "isomorphic", I tend to assume, until proven otherwise, that they are talking loosey-goosey about encoding one into the other and vice versa, not about any real isomorphism.
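For readers who don't use Scheme, here is a Python sketch of the same Church-boolean encoding (the capitalized names are illustrative). The explicit to_bool decoder is worth noticing: the functions only "are" booleans relative to an encoding, which is the point being made here.

```python
# Church booleans: a boolean is a function that picks one of two arguments.
TRUE = lambda x, y: x
FALSE = lambda x, y: y

# Boolean operations, equivalent to the Scheme definitions.
NOT = lambda p: p(FALSE, TRUE)
AND = lambda p, q: p(q, FALSE)
OR = lambda p, q: p(TRUE, q)

def to_bool(p):
    """Decode a Church boolean into an ordinary Python bool."""
    return p(True, False)

print(to_bool(AND(TRUE, NOT(FALSE))))  # True
print(to_bool(OR(FALSE, FALSE)))       # False
```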

P.S. thanks for the pointer to the expressivity paper; that sounds very interesting, and it certainly is usually the more important topic in practice, as you say.

Data is useless until it is in the context of a process.

All data was created by a process.

All processes can be viewed as interacting with data in this form:

New.Data = Process(Data)

I do not think there is an answer to the question of data or process having ascendancy. There is no separation of data and process - one process' data is the result of another process' process. -- PeterLynch

I agree. Data and process cannot be completely distinguished. The separation argument still holds, however. Data is usually more transportable than process. Also, given the OO obsession with typing, which is primarily a data issue, I think it is incorrect to characterize them as behaviour orientated. --RH

I do not think the separation argument holds for data any more than it does for comms, or arithmetic calculations or any other thing that produces a result. All processes should be separated according to domain. I think we see data as different because of the history - part of which is SQL. When using SQL to access your data, there is an obvious side-effect benefit of reportability, external control, all sorts of things. So it appears to be a QED - because our separation of data has produced so many benefits, it must be the right way.

I do not understand this part of your contribution - "I think it is incorrect to characterize them as behaviour orientated" --PL

It is common to think of "data" and "code" as being different. "Data" is the input or output of an application, and "code" is the set of instructions that specify what to do with the input. However, in most computer architectures, "code" and "data" are both just "stuff" in memory or on disk, and whether you consider any particular set of "stuff" as code or data is subjective.

As a simple example, consider a programming-language compiler. The "input data" to a compiler is what a programmer would consider to be "source code". The "output data" of a compiler is object code. That object code will then be used as the input data for a linker, whose output data is executable code.
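The same chain can be walked in a few lines of Python. This sketch assumes CPython. The built-in compile turns a string into a code object. The standard marshal module turns that code object into plain bytes and back; this is the format CPython uses inside .pyc files, and it is not stable across Python versions. Finally, exec runs the result. The variable names are illustrative.

```python
import marshal

source = "result = 6 * 7"                      # "source code", but to us just a str
code = compile(source, "<example>", "exec")    # compiler: data in, code object out
blob = marshal.dumps(code)                     # the code object serialized as plain bytes
restored = marshal.loads(blob)                 # ...and turned back into a code object

namespace = {}
exec(restored, namespace)                      # now treat it as code and run it
print(type(blob).__name__, namespace["result"])  # bytes 42
```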

Consider also a programming-language interpreter, or a VirtualMachine, or a CPU. Such an entity executes "code", but that code can be considered to be just a form of input data.

Now, say you have any program that reads some input set and processes it. You can think of the input as "data", but it is also equally valid to think of the input format as a specialized programming language (that is, as "code"), and to think of your program as an interpreter of that language.
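For example, a line such as "3 4 + 2 *" can be read as a record of numbers and symbols or as a program in a tiny reverse-Polish language. The hypothetical run_rpn below is a minimal Python sketch of its interpreter. Note that the OPERATORS table, not the input, decides which tokens are operands and which are operations.

```python
import operator

# The interpreter, not the input, decides which tokens are operations.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def run_rpn(text):
    """Interpret a whitespace-separated reverse-Polish string such as "3 4 + 2 *"."""
    stack = []
    for token in text.split():
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))  # anything else is treated as a number
    if len(stack) != 1:
        raise ValueError(f"malformed expression: {text!r}")
    return stack[0]

print(run_rpn("3 4 + 2 *"))  # 14.0
```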

So, "data" and "code" are just different ways of looking at the "stuff" that computers process. Treating them differently is often useful, but treating them identically is often useful as well.

Some of the many distinctions made between program and data are as follows ...

A program is something that does. Data is something that tells.
- By this definition, the thing traditionally called "program" in declarative languages is "data", whereas the specification of "goal" is the "code".
A geographical map is data for us, but the act of traveling is a process, a procedure. The code in this case, however, would be the directions rather than the actual process, which blurs the distinction.
Whenever my computer tries to execute data instead of code, it tends to lock up or do other horrible things, but whenever it zips up code instead of data, it does equally well.
- That's because you are using a data format in which all input is valid, such as plain text encoded in ISO 8859-1, not because of any inherent property of the concept of "data". There are programming languages which have no concept of "error", runtime or otherwise.
My computer loads data into the data segment and code into the code segment. Data does not execute.
- How do you JIT, then? And is the code in the code segment formed from data? {Which category does it load EssExpressions into?}
Code (algorithms + literals) stays constant in a given scope, and data changes.
- That's only true in the feeble languages people use today. If I re-(defun) something in my Emacs, its behavior changes.
- But big ol' databases are mostly static, with just a tiny percentage changing every day, while the programmers keep scuttling over and around the codebase ReFactoring it like little crabs. It changes much faster than the data.
A constant value is code to modules using it, and an executable is data to the linker producing it. Often, the code in a system has fewer layers of indirection than the data.
- By extension, a single piece of data can produce many different results when handled with different programs. These programs are, in a way, different inputs to the "code" of the single piece of data.
The rate of change can vary fairly widely depending on needs, and thus making distinctions based on rate-of-change is not going to be very useful.
If code is in a cache, it never gets dirty, so one never worries about coherency. Cached data needs to be checked for write back.
- Self-modifying code may have fallen so far out of favor that processors almost refuse to do it, but nevertheless it is not a law of the Universe that programs never change. And even the self-modification prohibition only applies at the machine-code level; a fully-factored object system tends to look a lot like a syntax tree in many cases, so at a higher level any modification to the "data" is also modifying the "code" the data structure represents.
  - Keep in mind that if code runs on a VirtualMachine, both the user's program (ByteCode or whatever) and the user's data are in the data cache; only the VirtualMachine itself is in the instruction cache. In other words, this distinction disappears. (A good VM will be small enough to be entirely I-cache resident; however VMs put additional burden on the D-cache).
My text editor has a "dirty" flag for each of my source-code files, and my application server has to keep checking whether my web application code has been updated and needs to be restarted.
If the tables are so static they can be replicated by simply copying once to every site, then they are effectively code. Where is the frequency threshold of the definition? Day? Week? Year?
If a program uses reflection, then the code itself becomes data. See OnReflection.
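A small Python illustration of that last point, assuming CPython (the layout of code objects is an implementation detail, and month_label is a made-up name). A string literal in a function body is stored in the code object's co_consts tuple, which can be read like any other data. CodeType.replace (Python 3.8+) then builds a new function from an edited copy of it.

```python
import types

def month_label():
    return "February"  # a literal, i.e. part of the "code"

code = month_label.__code__
print("February" in code.co_consts)  # True: the literal sits in a plain tuple

# Edit the constant table (data) to obtain a function with different behaviour (code).
patched = code.replace(
    co_consts=tuple("Februar" if c == "February" else c for c in code.co_consts))
localized = types.FunctionType(patched, month_label.__globals__)
print(localized())  # Februar
```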

I had done a project where different calculations were done on "historical data" due to a lawsuit about billing calculations for a finished project. The data was essentially fixed because it was historical. The hours worked and other items were not in dispute. The dispute was the bill rate calculation algorithms. However, I would hardly call that data "code".

Why not? In "foo = bar + 2", the "2" is part of the code (it's a literal). If I have an algorithm that works on dates, I may have a set of strings for month names, and "February" is no more data than the literal 2 is. (But if I localize that algorithm, so it needs to look up at run time what it should print, then the month names look much more like data. Ditto for "foo = bar + x", where x happens to be bound to 2)

This seems to be an argument for their interchangeability, not that one is the other. Sort of like Einstein's acceleration versus gravity view. Or matter and energy.

I don't know who wrote that, but I highlighted it because I feel it is a wonderful analogy. The theory says they are the same thing, not merely interchangeable, no? Gravity is acceleration because objects bend the fabric of space, under the theory.

I would hardly call that data "code".

I don't understand this statement taken from the first paragraph. What is being referred to? I would agree that "historical data" is not code, but I can see no justification for not calling an algorithm code. The statement itself is unclear and, in either interpretation I can see, does not seem to logically flow from the previous statements. What is meant here?

If we define code as "something on which interpretations can be done to produce results", then we can say that data and code are the same. Interpretation of code is done by your favourite compiler; interpretation of data is done by the program you just compiled.

Traditionally, in the LispLanguage, DataAndCodeAreTheSameThing, or at least have the same form.

In particular, looking at LispMacros (or SchemeMacros for that matter) may be instructive. The difference between expander and expansion code; the difference between read, compile and execute time (?) and so on. This could probably do with clarification because I am not yet a SmugLispWeenie!

in some computer architectures, "code" and "data" are both just "stuff" in memory or on disk, and whether you consider any particular set of "stuff" as code or data is subjective.

In other computer architectures, the question of whether "stuff" (a page of words) is code or data is not subjective but is defined by the memory-management hardware and the virtual memory system of the OS. Pages of memory can be read-only, read-write or executable. For reasons of performance and security, executable pages are typically not writable.

The distinction is not true for other things that are called "code" (virtual machine opcodes, interpreted language scripts, emulated execution environments, etc.). Also, it is only true when the "code" is being executed; it is not true when "code" is being generated by compilers or processed by other programs.

An "update data set" might look like:

  Oper    Entity     ID   Field  Value
  ------  ---------  ---  -----  ------------
  change  customers  23   fname  Fred
  change  customers  23   lname  Johnson
  delete  customers  150
  add     products        descr  hatch spring
  add     products        price  49.95

This could be viewed as code or data. I don't think there is any hard distinction. The program that reads the above essentially acts like a simple interpreter. If this does not convince you, then what if it was formatted as:

  change(customers,23,fname,"Fred")
  change(customers,23,lname,"Johnson")
  delete(customers,150)
  add(products,,descr, "hatch spring")
  add(products,,price, 49.95)

(The "add" operations are treated as applying to one record if they are contiguous. A "no_op" or "commit" should perhaps be added to separate adjacent adds. I don't remember how the actual specimen handled this. See also: ChangeLog)
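Here is a minimal Python sketch of "the program that reads the above", interpreting the function-call form. It assumes several details the text leaves open. db is a dict of tables mapping integer IDs to record dicts. New IDs are allocated as max + 1, and values are kept as strings. commit() is the hypothetical separator suggested above. apply_updates and CALL are illustrative names, not the original program.

```python
import csv
import re

CALL = re.compile(r"(\w+)\((.*)\)$")  # matches e.g. change(customers,23,fname,"Fred")

def apply_updates(script, db):
    """Interpret an update script against db, a dict of tables mapping id -> record dict."""
    current_add = None  # (table, id) of the record that a run of "add" lines is filling in
    for line in script.strip().splitlines():
        match = CALL.match(line.strip())
        if match is None:
            raise ValueError(f"not an operation: {line!r}")
        op, argtext = match.groups()
        if op != "add":
            current_add = None  # any other operation ends the current run of adds
        if op == "commit":  # hypothetical explicit separator between adjacent adds
            continue
        args = next(csv.reader([argtext], skipinitialspace=True))
        if op == "change":
            table, rec_id, field, value = args
            db[table][int(rec_id)][field] = value
        elif op == "delete":
            table, rec_id = args
            del db[table][int(rec_id)]
        elif op == "add":
            table, _, field, value = args  # the ID column is left empty for adds
            if current_add is None or current_add[0] != table:
                new_id = max(db[table], default=0) + 1  # naive ID allocation
                db[table][new_id] = {}
                current_add = (table, new_id)
            db[table][current_add[1]][field] = value
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return db

db = {
    "customers": {23: {"fname": "Frederick", "lname": "Jones"}, 150: {"fname": "Ann"}},
    "products": {},
}
script = """
change(customers,23,fname,"Fred")
change(customers,23,lname,"Johnson")
delete(customers,150)
add(products,,descr, "hatch spring")
add(products,,price, 49.95)
"""
apply_updates(script, db)
print(db["products"])  # {1: {'descr': 'hatch spring', 'price': '49.95'}}
```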

This data is a program. You have defined a language and written an interpreter! I've done this many times. It's great fun, but doesn't make all data the same thing as code.

Then what is the clear-cut Boolean test for whether it is one or the other?

Yes, there is data, and there is ExecutableData (code). And actually, even "ordinary" data can be executed, usually with undesirable results, but several years ago I actually made one of my programs execute a portion of the copyright message.

It happened that (on the Z80) certain alpha characters were the values of instructions that performed simple register manipulations. Given a seed value, after the "copyright" executed, I could verify that this message had not been tampered with, as the result of seed + opcodes-in-text was known.
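Here is a rough Python sketch of the idea. The real trick depended on actual Z80 opcodes. The instruction set below is invented purely for illustration: uppercase letters add, lowercase letters XOR, digits rotate, and everything else is a no-op. The seed and notice text are hypothetical.

```python
def run_text(text, seed):
    """Execute text as a program for an invented 8-bit machine; return the accumulator."""
    acc = seed & 0xFF
    for ch in text:
        b = ord(ch) & 0xFF
        if "A" <= ch <= "Z":        # invented "ADD" opcode
            acc = (acc + b) & 0xFF
        elif "a" <= ch <= "z":      # invented "XOR" opcode
            acc ^= b
        elif "0" <= ch <= "9":      # invented "rotate left" opcode
            acc = ((acc << 1) | (acc >> 7)) & 0xFF
        # every other character acts as a no-op
    return acc

SEED = 0x5A                                      # hypothetical seed
NOTICE = "Copyright (C) 1985 Example Software"   # hypothetical notice text
EXPECTED = run_text(NOTICE, SEED)                # recorded once, when the program is built

def notice_intact(text):
    """Re-run the notice as code and compare with the recorded result."""
    return run_text(text, SEED) == EXPECTED

print(notice_intact(NOTICE))  # True
```

As with the original, this is a weak tamper check. An edit that only touches no-op characters, or one whose effects happen to cancel out, goes unnoticed.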

It is not so much about them being the "same thing" as about using the same techniques or mechanisms to manipulate and manage them. If you have boat-loads of data, you generally gravitate toward a database to handle it all. Now that modern systems have boat-loads of code, ways are needed to manage that too. I say the same tools that are used for data can be used for code (with some minor enhancements). Linear or nested text in a bunch of files and/or classes is not sufficient IMO. That leads to NavigationalDatabase-like arrangements, which have (properly, IMO) fallen out of favor. If they failed for data, then they will probably fail for code too (on a larger scale). Thus, I am a TableOrientedProgramming fan.
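One way to read the TableOrientedProgramming position is as a control table: behaviour kept in rows that can be queried and edited like any other data, rather than in if/else branches. Here is a minimal sketch using Python's built-in sqlite3 module. The discount_rule table, its columns and the percentages are invented for illustration.

```python
import sqlite3

# The pricing "logic" lives in rows, where it can be queried, reported on
# and edited with the same tools as any other table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE discount_rule (min_qty INTEGER, pct INTEGER)")
con.executemany("INSERT INTO discount_rule VALUES (?, ?)",
                [(0, 0), (10, 5), (100, 12)])

def order_total_cents(qty, unit_cents):
    """Find the rule with the largest min_qty not above qty, then apply its discount."""
    (pct,) = con.execute(
        "SELECT pct FROM discount_rule WHERE min_qty <= ? "
        "ORDER BY min_qty DESC LIMIT 1", (qty,)).fetchone()
    return qty * unit_cents * (100 - pct) // 100  # integer cents avoid float rounding

print(order_total_cents(10, 200))  # 1900
```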

Is the point you are trying to make that VersionControlSystems treat source code like data? If so, why is that significant?

Too often the point is missed entirely. There's content, and then there's context. Context adds value to content. Code adds value to data.

The Argument for Different

Data is the plural of the Latin datum, which literally means a "given", a fact. Therefore data are statements of fact, as in DatabaseIsRepresenterOfFacts. That's how database theory regards data (and this is regardless of paradigm, be it OO, relational, or constraint databases). Now, I really don't understand what you mean by code such that you can claim it is the same thing as data. Of course everybody can say X==Y at any point in time about anything, especially in computing. After all, we can reduce/represent/encode anything and everything to some stupid TuringMachines, where everything is a symbol in an alphabet. But that's irrelevant.

What we are looking at here are formalisms. We have two sets of different mathematical formalisms (ultimately meaning syntax + semantics): we have formalisms for data (let's say RelationalModel) and we have formalisms for code (let's say LambdaCalculus). We couldn't care less if both those formalisms can ultimately be reduced to some stupid Turing Machine; if I, as a programmer, don't find TMs convenient, to hell with them. What we look for in a formalism is some kind of elegance and abstraction that makes it suitable for humans to reason with and to solve particular problems effectively; see BeautyIsOurBusiness. Therefore, to say X==Y because ultimately X can be reduced to Y or the other way around is meaningless. To argue differently is truly immature :)

I'll grant that there are instances when code can be seen as input (therefore a particular kind of data) to another program, also known as code transformations. As far as I can tell, the biggest effort is to allow these code transformations to be written using the same formalisms as the code itself. For example, the latest Haskell template proposal (see TemplateHaskell) of course uses Haskell to do the transformations (unlike C++ templates, for example, which use the separate sublanguage of C++ templates). That's another argument that data and code are not the same thing: nobody apparently wants to use a data model to model their code. For example, s-exprs are very suitable for representing and manipulating code. However, they're not that great for data; using them for databases would be going back to HierarchicalDatabases (don't even get me started on XML at this point).
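Python's standard ast module is another example of transformations written in the host language itself. ast.parse turns source text into a tree of ordinary objects, a NodeTransformer rewrites that tree, and compile turns it back into runnable code. This is a toy sketch; the MultToAdd transformation is made up purely to show the round trip.

```python
import ast

tree = ast.parse("price * quantity", mode="eval")  # code -> data: an ast.Expression tree

class MultToAdd(ast.NodeTransformer):
    """Toy transformation: rewrite every multiplication as an addition."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested sub-expressions first
        if isinstance(node.op, ast.Mult):
            node.op = ast.Add()
        return node

new_tree = ast.fix_missing_locations(MultToAdd().visit(tree))
code = compile(new_tree, "<transformed>", "eval")  # data -> code
print(eval(code, {"price": 3, "quantity": 4}))  # 7
```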

Bottom line is that NO, I don't see how data and code are the same thing. They may be the same thing for Tom, but then, who cares. I use different formalisms to put some order in my data, and entirely different formalisms to put some order (if any) in my code. If they were really the same thing, we'd be using the GrandUnifyingTheoryOfEverything in software systems. And this is unlikely to happen any time soon (no, SmallTalk ain't good enough for me for that purpose, thank you very much :) ). -- CostinCozianu

There is no grand unifying model because some things are more convenient for (some) humans as code and others as data. It is about viewpoints, not necessarily the underlying nature. EverythingIsRelative.

Please see pp 384-387 of StructureAndInterpretationOfComputerPrograms. -- TomStambaugh

Yes, as I suspected, you were confusing two things. There is no argument there that DataAndCodeAreTheSameThing; to quote from the book: "That the user's programs are the evaluator's data need not be a source of confusion. In fact, it is sometimes convenient to ignore this distinction, and to give the user the ability to explicitly evaluate a data object as a Lisp expression, by making eval available for use in programs." You might as well ask why Wirth titled his famous book Algorithms + Data Structures = Programs if data and code were the same thing (Programs = 2 * Data Structures, anyone?).

See, Tom, sometimes it is convenient to see programs as data, many times it is not, and conversely, most of the time it is definitely inconvenient to see data as programs. So data is different from code after all: most of the time, or at least in a significant number of cases, they are not the same thing. See my other observation with regard to mathematical formalisms, and what it means to say that X==Y. -- CostinCozianu

Hang on a minute though: it is sometimes convenient to ignore this distinction. Isn't that a clue? We can, if we wish, ignore this distinction, if it is convenient for us to so do. Well, distinctions that may be ignored at will don't seem like very strong distinctions, surely? In fact, they seem very much like conventions, rather than essential features of some domain. No? -- KeithBraithwaite

No, the distinctions cannot be ignored at will. Maybe I'm a bad English speaker, but I somehow don't see that "sometimes convenient" is the same as "always convenient" or the same as "always possible". By the way, when was the last time you represented real numbers as lambda calculus abstractions? When do you ever think it is convenient to do so for the purpose of building software systems (mind you, I acknowledge it is sometimes convenient for the purpose of proving theorems)?

I contend that the distinction between "data" and "code" in fact sometimes is not drawn, and often should not be. And that we get to choose those times. It is never essential, and I see no argument on this page to suggest that it is. It may often be of utilitarian benefit to draw the distinction, but it is never required. -- KB

If it's utilitarian, then I think we should change the default handling: make the distinction by default, and erase it only when you can draw some particular utility from doing so. For example, thinking of a tuple in a database, ("Costin", 100), as some kind of code is a waste of time, so don't do it. Thinking of a qsort implementation as data is equally a waste of time.

Is not. Qsort is a perfectly valid value! It's the result of a computation, even! Granted, it uses a lot of function literals...

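     import Data.List (partition)

     -- Fixed-point combinator; relies on lazy evaluation (cf. Data.Function.fix).
     y :: (a -> a) -> a
     y f = f (y f)

     -- qsort is an ordinary value: the fixed point of a non-recursive function.
     qsort :: (a -> a -> Bool) -> [a] -> [a]
     qsort = y (\q -> \lessThan list -> case list of
         [] -> []
         pivot : rest ->
             let (lt, ge) = partition (`lessThan` pivot) rest
             in q lessThan lt ++ [pivot] ++ q lessThan ge)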
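For comparison, here is a Python sketch of the same "qsort is a value" idea. A fixed-point combinator defined as Y f = f (Y f) relies on lazy evaluation; in a strict language like Python it would recurse forever. This sketch therefore uses the Z combinator, which delays the self-application behind a lambda. The names z, qsort and less are illustrative.

```python
def z(f):
    """Strict fixed-point combinator: Y with the self-application wrapped in a lambda."""
    return (lambda x: f(lambda *args: x(x)(*args)))(
        lambda x: f(lambda *args: x(x)(*args)))

# qsort is the fixed point of a non-recursive function of q.
qsort = z(lambda q: lambda less, xs: (
    [] if not xs else
    q(less, [x for x in xs[1:] if less(x, xs[0])])
    + [xs[0]]
    + q(less, [x for x in xs[1:] if not less(x, xs[0])])))

print(qsort(lambda a, b: a < b, [3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```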

[I'll have a go at it. AlanTuring and his self-modifying typewriter. What it wrote was both data and code. He showed that a typewriter like that could mimic any possible machine. Now each of us is sitting in front of one of them. They use an architecture where code and data share the same address space because we need to treat them as the same thing. Given enough time, they can mimic any machine you can describe. That's what makes computers valuable. -- EricHodges]

That's a non-argument. Computers are not valuable because they approximate Turing machines; they are valuable because they are effective tools in solving some problems, no more, no less. TMs are only valuable as an abstraction vehicle to prove some impossibility results; when it comes to actually solving problems, they are close to useless. Do you have any idea what a qsort would look like on a TM? Again, choosing your formalism is an important decision. When was the last time you solved a linear equation using only set theory? How about a differential equation? In theory you can do it, but only in a certain, impractical theory. You can choose a better theory to solve these problems.

Similarly, we choose very different theories to structure data and code. If they were really the same thing we'd be using the same theory, but we don't. QED.

[Computers are valuable because they can mimic any machine. Before computers there were custom machines built for controlling car ignition systems, dryer and oven timers, audio and video reproduction, typewriters, microfiche, etc. It turns out for many problems it's cheaper to build generic Turing machines and program them to mimic specific machines. Any solution you've built with a computer would have to be built as a physical machine without the computer. A qsort in a TM looks like any qsort I've ever seen because every computer I've ever used has been a Turing machine. The fact that code and data can be treated interchangeably doesn't mean we apply the same theory to each, it means we can use the machine to tell the machine what to do. -- EricHodges]

That's a non-sequitur. We choose (apparently) very different theories to handle the different kinds of equation because it is convenient to so do. Does that make the two kinds of equation essentially different? No, because as your observation about set theory shows, they could both be handled the same way. So, by your argument what can we conclude about the "sameness" of code and data, given that we typically treat them differently because it is convenient to so do? Nothing. -- KB

Keith, it seems to me we are talking at cross purposes here. We clearly have the same fundamentals in mind, but apparently we interpret them differently. You seem to treat convenience as merely a side issue. Others see the same convenience as one of the most important aspects of mathematics. That's why mathematicians keep inventing new formalisms: to make it convenient to solve problems. Convenient vs. inconvenient is now a fundamental distinction in mathematics (well, at least according to some well-reputed authors like JeanYvesGirard and EwDijkstra; I know there are divergent opinions). Take for example JeanYvesGirard's LinearLogic: as he himself puts it, everything is already expressible in classical logic, yet he invented LinearLogic and it became one of the fundamental formalisms in ComputingScience, because it makes reasoning about a certain class of problems easier. It abstracts away the nitty-gritty details needed to encode that class of problems in classical logic, and those nitty-gritties are enough to make human reasoning about those problems simply unmanageable. Or take relational calculus: from your perspective it is subsumed by TM or LambdaCalculus. Try to create databases in terms of lambda abstractions, and you'll be suddenly lost.

It doesn't matter that "in theory" you could do it; "in theory" has become a secondary aspect in modern mathematics. The quality of "how" you do it is essential. As EwDijkstra puts it: elegance is our most effective tool to manage complexity. -- CostinCozianu

[If data and code weren't the same thing, how would computers work? Where would they get their instructions?]

Perhaps we are speaking at cross purposes. Please help me to understand you. I can't interpret what's being said about code and data being distinct in any way other than that the distinction is inherent, inescapable and essential. My position is that this distinction, where drawn, is usually (tending to always) contingent, optional, and utilitarian. Do we in fact have anything to discuss? -- KB

Yes, we have something to discuss here: a grand philosophical PissingMatch on the foundations of Mathematics and ComputingScience :) What you and Tom perceive as "essential", "inherent", "inescapable", others have a cooler attitude about: so what? I can't put bread on the table with DataAndCodeAreTheSameThing. I believe you might call such people utilitarians. Speaking strictly for myself, philosophy is a waste of time; I go to church on Sundays, so I'm safely outside utilitarianism :)

For people like me a practical distinction is an essential distinction. Whatever makes a difference between solving a problem and not solving it is essential. To "look" on my tuples and relations through the colored glasses of ObjectOrientation (or pure LambdaCalculus to be less offensive) is exactly the kind of "utilitarian", "non-essential" distinction that will send my project over budget.

If it serves me no good purpose to do arithmetic or relational calculus or differential equations in pure lambda calculus, then that's enough essentiality for me: I do arithmetic in decimal notation with +, -, *, /, % and all kinds of non-essential symbols. It works rather nicely and it feels much better than pure lambda calculus, and I won't lose a minute of sleep over this.

To paraphrase a nice argument of JeanYvesGirard, even if you could calculate the lottery numbers (in theory), it would cost you an infinite amount. So who gives a damn if the universe is deterministic, if you can't compute the next move? Who gives a damn if you can represent both code and data using the same abstraction (let's say pure lambda calculus) and make them the same thing? You'll have to pay dearly for this exercise.

Well, a declarative viewpoint is valid and practical. Using your example, it has not been proven that relational or lambda calculus is objectively worse than the "traditional" approaches. Why person X is more or less comfortable with paradigm Y is hard to say. Is it training bias? Genetics? I don't know and neither do you. (Note that I don't propose 100% declarative software. Most software uses a mix of both. It is a matter of degree and style.)

Try to create databases in terms of lambda abstractions, and you'll be suddenly lost.

Speak for yourself. The observation that you get lost doesn't demonstrate the impossibility or even difficulty of the task.

From the other direction: all the data processed by computers is code in the sense of instructions. Audio data is a set of instructions for speaker cones, text data is a set of instructions for display devices, etc. Data that isn't telling a computer or person what to do is noise.

Context is the Key

Determining whether an item is data or code depends upon the context being discussed. That said, saying an item is data in one context and saying the item is code in another context does not imply that the item is data and code in either context. It would be correct to say that an executable program is data resulting from a linker in the context of a code build, but it is not code within this context. When the program is executed, it is code in this context and not data. Code is responsible for the transformation of data, and one needs to look at the context to determine whether a particular item is doing the transformation or is being transformed.

Where I come from this is called being cute. A programmer unable to draw a distinction between code and data will have a devil of a time building anything that works. -- mt

I suspect you are approaching this from the "street sense", but this is more from the philosophical sense. You are right that if a candidate cannot tell the difference between Fortran code and comma-separated-value data, they should probably not be hired. However, that is about syntax, while this discussion is more about semantics.

I think I'm talking about semantics. If data and code are the same thing, they are interchangeable--semantically indistinguishable. That is to say, there is no relevant semantic distinction between "3" and "+". That, dear heart, is insane. -- mt

Where, exactly, do you think that the semantic distinction between "3" and "+" resides? In the only place that matters; drawing a distinction is the essential act of intelligence.

I admit, it seems hard to imagine a case where the Hutchison Whampoa mobile phone network would have interchangeable semantics with the symbol on the Swiss national flag... or did you have something else in mind?

{We tend to model our programming languages after spoken language, and thus they both have verbs and nouns. We can identify what are verbs and what are nouns. However, it is possible to "flip" a sentence structure. For example, instead of "add 3 to 7", we could word it, "do math on 3 and 7 using the process of addition". EverythingIsRelative. --top}

(Refactored from LessonsFromHistoryDiscussion. I attempted to remove the "catfight" language.)

Furthermore, you are conflating models for managing code (OO) with models for managing data (the RelationalModel). These are not the same thing. There are valid arguments against using an OO model for data management, and indeed OO databases may hark back to hierarchical and network databases, but this is orthogonal to using OO to implement programs. That said, I think a programming language built on the RelationalModel as a fundamental principle (as opposed to merely implementing it, as do TutorialDee and TopsQueryLanguage) would be interesting, but I have no idea what it would look like. Languages like SETL might give us some hints, though.

{They'd look a lot like Prolog, possibly with mutable functions.}

Hmmm... Yes, perhaps. I.e., DataLog and the like.
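To make the DataLog suggestion a little more concrete, here is a minimal sketch of naive bottom-up evaluation in plain Python (not any real DataLog engine). The `parent` facts are a stored relation, the `ancestor` rules are tuples-in, tuples-out, and evaluation repeats until no new tuples appear. The relation names and data are hypothetical.

```python
# Facts: a relation stored as a set of tuples (plain data).
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

def ancestor(parent):
    """Naive bottom-up evaluation of the DataLog-style rules
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    iterated until no new tuples are derived (a fixpoint)."""
    derived = set(parent)                      # first rule
    while True:
        new = {(x, z)
               for (x, y) in parent
               for (y2, z) in derived
               if y == y2} - derived           # second rule, keep only new tuples
        if not new:
            return derived
        derived |= new

for row in sorted(ancestor(parent)):
    print(row)
```

The result is itself just another relation, which could be stored, joined or queried like any table.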

Re: "These are not the same thing." - I consider DataAndCodeAreTheSameThing. They are just different views on info. --top

{Data and code tend to have different properties in practice, but I share the opinion that, ultimately, DataAndCodeAreTheSameThing. It's made most visible in LogicProgramming. E.g., consider Prolog: from one viewpoint, each predicate is a function over arguments to a logic-value. From another viewpoint, each predicate is a (potentially infinite) relation over tuples of size one or more. There is no fundamental difference between mutating a function and mutating data (though there are practical variations for optimizations and such). At least ideally, DBMS systems should be capable of utilizing a Turing-complete language to describe relations. Further, code and data should have the same normalization rules.} [unknown author]
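The claim that a predicate is at once a function to a truth value and a relation over tuples can be sketched in a few lines. This sketch assumes a finite domain, whereas Prolog relations may be infinite. `as_relation` and `as_predicate` are hypothetical helper names.

```python
from itertools import product

def less_than(x, y):
    # "code" view: a function from arguments to a truth value
    return x < y

def as_relation(pred, domain):
    # "data" view: the set of tuples for which the predicate holds
    return {(x, y) for x, y in product(domain, repeat=2) if pred(x, y)}

def as_predicate(relation):
    # and back again: membership in the stored relation is a predicate
    return lambda x, y: (x, y) in relation

domain = range(4)
table = as_relation(less_than, domain)
lt_again = as_predicate(table)

print(sorted(table))
print(all(lt_again(x, y) == less_than(x, y) for x in domain for y in domain))  # True
```

Because `lt_again` closes over the same set object, adding a tuple to `table` changes what `lt_again` answers. Mutating the data and mutating the function are the same act here.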

{That said, communication and code aren't the same thing, and data and process aren't the same thing. It's an important distinction. Mere examination of data doesn't admit to side-effects, whereas a visit to a procedure almost invariably has side-effects. "[Y]ou (top) are conflating models for managing [process and communication] (OO) with models for managing data (the RelationalModel)" is probably a more accurate expression of the above speaker's intent.}

As the speaker, I can confirm it does express my intent.

I don't see how side-effects are a difference. A "to-do" list with "kick dog at 3:20pm" is merely a list in one sense, but also (potential) behavior that does change the outside world. I will agree that in practice data and code have different "flavors" to them, but it's mostly a matter of degree. This is because as humans we find some things more convenient represented as textual language and others as data. For example, Boolean expressions can be represented as a data structure (AbstractSyntaxTree), but most people prefer them as text. A robot or an alien from the planet Structar may want it different.
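The Boolean-expression example can be shown with Python's standard `ast` module: the same expression exists as text and as a tree that an ordinary function walks. The evaluator below handles only `and`, `or`, `not` and variable names. That restriction, and the `evaluate` function itself, are assumptions of this sketch.

```python
import ast

source = "a and (b or not c)"

# Text -> data structure (an abstract syntax tree)
tree = ast.parse(source, mode="eval").body
print(ast.dump(tree))

def evaluate(node, env):
    """Walk the tree; only and/or/not and plain names are supported.
    All operands are evaluated (no short-circuiting)."""
    if isinstance(node, ast.BoolOp):
        values = [evaluate(v, env) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not evaluate(node.operand, env)
    if isinstance(node, ast.Name):
        return env[node.id]
    raise ValueError(f"unsupported node: {type(node).__name__}")

print(evaluate(tree, {"a": True, "b": False, "c": False}))  # True
```

The tree can be inspected or rewritten like any other data before anything evaluates it.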

{A "to-do" list describes behavior to an entity capable of understanding it. It is not, however, actual behavior (or even 'potential' behavior). To get behavior, you need to have an actor. An actor might look at the list and decide to kick the dog, of course.}

To the CPU chip, your EXE is just data. Your EXE is not an "actor" in a strict sense. The only real actor is the physical chip, not your program.

{Correct. The EXE is code and the EXE is data (DataAndCodeAreTheSameThing). The CPU is the 'real actor'. Or perhaps the universe, acting upon the CPU through laws of physics, is the 'real actor'. Actors are conceptual entities, not physical entities. Since actors are conceptual entities, one can also treat processes as actors (operating systems, PiCalculus, etc.), 'actors' as actors (in ActorModel), threads as actors (Procedural), even massive multi-agent systems (e.g. whole societies) as actors.}

So there's no real disagreement?

{I'm not sure there was one. Do you understand how side-effects are a relevant difference between data/code (values that ultimately just 'persist', doing nothing) and process/communication/behavior? Can you apply this understanding towards distinguishing models for managing process/communication/behavior from models for managing data (or raw code)?}

The existence of data by itself says nothing either way about "side effects". If the hourly rate next to my name in a payroll database affects my paycheck amount, then it has a "side effect" on reality. Although it is not directly a "payToppie" command/function, it essentially has the same effect. And an explicit "payToppie" function is merely data from the CPU's or interpreter's perspective. The "final actor" is the hardware, not software. It's as if a robot tells another robot to tell yet another robot to pull the trigger. Data may be further up the chain, but that is merely a matter of degree.

{[You] mention one robot telling another robot to tell yet another robot. Perhaps you can explain to me, top, how the RelationalModel is designed and intended to aid in describing and managing this sort of communication?}

The robot analogy was meant to illustrate definitional issues and not meant to be a literal development model. I would still like a clearer description of how "side effect" makes a distinction between data and code.

{How it makes a distinction between "data and code"? You misunderstand. DataAndCodeAreTheSameThing. Their existence has no side-effects (excepting some computational costs for storage and maintenance). Process, Communication, and Behavior are NOT code and are NOT data. The presence of 'side effect' is one illustrative difference between the set of {Data and Code things} and the set of {Processes, Communications, and Behaviors}. {Data and Code} can certainly influence {Behavior and Process and Communication}, but you shouldn't conflate models for describing and managing {Process and Communication} (like ObjectModel, ActorModel, PiCalculus, CSP) with models for describing and managing {Data} (like RelationalModel).}

We are getting into the brain-land of mental models here. "Data" is a human mental model. Any data can be viewed as behavior and any behavior can be viewed as data if one allows themselves to see them from different perspectives. Evidence that "side effects" has found an exception to this rule has not been presented. Thus, the rule stands. Whether "process" is code or data is perhaps outside the scope of this discussion. I doubt these words have the math-like precision needed to say for sure. They are notions, not math. See TreatingLanguageLikeMathFails.

{I believe you are making an unreasonable leap in logic - if you wish to insist that data and behavior are merely different based on how you 'view' them, I request a formal description of your reasoning. In my own understanding, code and data can describe or influence behavior, but code and data are not the behavior so described or influenced... and I can think of no means of reconciling the differences simply by changing 'viewpoints'. There is something of a fundamental gap between thinking about doing things and actually doing them. If 'code (or data)' and 'behavior' are the same, then this gap must not exist. Oh, and I wouldn't consider "process" outside this discussion; as I recall, it started with: "[Y]ou (top) are conflating models for managing [process and communication] (OO) with models for managing data (the RelationalModel)".}

Again, I doubt "formal" is possible with mere English. The "gap" is in how you view them. Plus, I've yet to find anything significant that can't be viewed as both. Examples tested already:

A "to-do" list
- Data if viewed as a "list"
- Behavior if viewed as a sequence of steps or items to execute.
- {This example already failed. A "to-do" list is a description of behavior, but is not, itself, behavior. There is no behavior before you 'execute' it.}
- But that's true of ANY AND ALL code because only the CPU is the final actor. Dammit, we've been over this already. I thought it was finished and shut.
- {Okay, I'm having a serious problem understanding how you can say that what I just said is true of ANY AND ALL CODE (which makes it a rather useful truth statement), further recognize that we have been over this, and still say something contradictory. If you know that a "to-do" list is not behavior (and that "ANY AND ALL code" is not behavior) then why do you say the opposite?}
- Sigh. Whether something is data or behavior is a matter of perspective. It is a classic Toppie EverythingIsRelative argument here.
- {Any argument that starts with a premise that EverythingIsRelative has already contradicted itself.}
- If so, please take it up at that topic, not here.
- {See EverythingIsRelativeStrangeLoop. If "Everything is Relative" is true, that's an absolute statement, and thus "Everything is Relative" is false. Thus EverythingIsRelative is a contradiction. From a contradiction, you can prove anything (and its negation). Frankly, you have already committed grave fallacy for every argument you make that relies upon 'EverythingIsRelative'. If you consider it a 'classic Toppie argument', then 'classic Toppie' is defective.}
Hourly-rate in payroll database
- Data in the "traditional" sense
- It affects the results, so it ultimately is a component of "behavior"
- Also, one can represent anything in a database as code, such as with setters and getters.
- {You make an irrational leap between "<X> affects <Y>" and "<X> is a <Y>". The rising and setting sun affects my daily life, but I am not a component of the rising and setting sun.}
- Let's turn off the sun and see how long you're not a component of it. As already described, no code "is" behavior if we take an absolute viewpoint.
A command like "print"
- This is a code command in the traditional sense
- It is data to a CPU or interpreter
- {Correct. The command is code and data. The code describes the intended behavior. But there is no actual behavior until the CPU reads (input) and executes it (producing output) - which is a process.}
- If that were true on an absolute basis, then NO code would be "behavior".
- {Correct. NO code is "behavior". NO data is "behavior". DataAndCodeAreTheSameThing, and ThatThingIsNotCommunicationProcessOrBehavior.}
- I assumed you were equating "code" with "behavior". If that is not the case, then we're gonna need a tight definition of "code" first.
- {We already have a tight definition for data (WhatIsData). If you agree that DataAndCodeAreTheSameThing, then we also have a tight definition for code. But we can go into it further if you wish. I suggest: ThereAreExactlyThreeParadigms for a description of four types of code that encompass every program known to man.}
- I don't find "data" clear there. How does one tell the difference between data and code? Are code (function) parameters data, code, or both? And, I'm not prepared to buy into your suggested triplet just yet.
- {DataAndCodeAreTheSameThing. Why would you need to tell the difference? That's even true of the paradigms I mentioned - they certainly aren't offering a clean break between "data" and "code". They just provide a high-level overview of what "code" can be. Among other things, code can be data (and data can be code). What you need, perhaps, is a difference between "behavior" and "code". But you said that "we're gonna need a tight definition of 'code' first", so I'm focusing on that.}
- I'm not claiming there is a hard difference. If you agree, then I guess the debate is done. (I will agree there is a "notion" difference in most heads, but it is not objectively isolatable, barring a precise working definition.)
- {I agree only that DataAndCodeAreTheSameThing.}
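The exchange above keeps returning to the gap between a description of behavior and the behavior itself. A minimal sketch, assuming a made-up three-command language (`set`, `add`, `print`) and a hypothetical `run` function, shows where that gap sits: the program is an ordinary list that can be inspected without anything happening, and effects occur only when an interpreter, playing the part of the actor, walks it.

```python
# A "to-do list": nothing but a list of tuples. As data it can be
# inspected, filtered, or saved; by itself it does nothing.
todo = [
    ("set", "rate", 20),
    ("add", "rate", 5),
    ("print", "rate"),
]

def run(program):
    """The actor: interpret each command in order, producing effects.
    "print" appends to a returned log instead of writing to stdout."""
    env, output = {}, []
    for op, *args in program:
        if op == "set":
            name, value = args
            env[name] = value
        elif op == "add":
            name, amount = args
            env[name] += amount
        elif op == "print":
            (name,) = args
            output.append(f"{name} = {env[name]}")
        else:
            raise ValueError(f"unknown command: {op}")
    return output

# Data view: list the command names without executing any of them.
print([op for op, *_ in todo])   # ['set', 'add', 'print']
# Code view: hand the same list to the actor.
print(run(todo))                 # ['rate = 25']
```

Whether one calls `todo` code or data, nothing in it changes between the two `print` lines; only the presence of `run` does.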

Also, any sequence of info or commands can be viewed as a predicate. For example, in "print(string_X);", "print" can be viewed as part of the predicate just as string_X is part of it. (This may resemble the ProLog kind of view.) It is true that some have immediate actions and some may not, but that is a matter of degree. Maybe your computer will ask you to select a printer (driver) or prompt for a page range before printing. This is merely because one predicate may trigger another. Thus immediacy does not appear to be a fully distinguishing attribute. Maybe the concept of immediacy can be worked into a new definition somehow in order to distinguish them, but I am skeptical. I will agree that "print" is more "immediate" than the hourly rate (above example), but again it's a matter of degree.

I am also not clear on why you present managing communication versus managing data as a key issue. They may not be mutually exclusive anyhow. Are you saying that commands are communication but data is not? The outcome of a program may not really depend on or care about whether it got its input from data or from communication. The same inputs would normally result in the same output regardless of the delivery mechanism.

{I'm talking about actual communication, actual process, actual behavior. You seem to think that a command to perform communication is the same as actual communication. But a command is not communication; a command is (by definition) something that can be communicated as part of a command language. A command is code.}

If that's the case, then code is NEVER behavior because it relies on physical mechanisms.

{Correct - that is the case, and code is NEVER behavior. Since DataAndCodeAreTheSameThing, Data, also, is NEVER behavior. And, since data is NEVER behavior, it is ultimately a mistake to conflate 'models for describing or managing behavior' with 'models for describing or managing data'.}

[Excuse me for poking a nose in, but as the writer who kicked off this threadlet with "[f]urthermore, you are conflating models for managing code (OO) with models for managing data (the RelationalModel)," I'd like to confirm that the above paragraph is precisely what I had in mind, but better and more generally expressed.]

May I suggest you try a different way to say or illustrate your point? It seems like an apples-and-oranges comparison to me. I don't see why you put stock in that.

But I don't agree with the premise. What it "is" is a matter of viewpoint. I was just pointing out what appears to be a problem with your reasoning.

{You have NOT made ANY reasonable argument that this is a matter of viewpoint. All the arguments you've made thus far are unreasonable.}

Nobody's shown a case where something cannot be viewed as both. That's not proof, but it is good evidence.

{Nobody's proven that the great Spaghetti monster isn't controlling your mind, top. That's not proof, but it is equal to the fallacious trash you just now provided as "evidence". Tell me, top, do you view 'false' and 'true' as the same? It would explain a lot, and it does follow from believing that EverythingIsRelative.}

{When you unequivocally prove that "to-do" list that describes a list of behaviors is THE SAME THING as actually performing the behaviors (i.e. isomorphic - merely a perspective change), then perhaps your claim will earn some respect. But to do so, you need to describe the perspectives and prove that they exist. Merely saying that there is one (because Top said so) is insufficient. So find two perspectives: one in which a "to do" list with "kick the dog" written upon it is merely code (that one's easy), and one in which a "to do" list with "kick the dog" written upon it is the actual behavior of already kicking the dog.}

Why am I burdened to prove "is the actual process"? Process-ness is not the issue that I can see anywhere. --top

{Prove it's the 'behavior' then. As far as not being able to see, that seems typical coming from you, but if you look UP a few pages worth of argument, you'll find that 'process and communication' were the ORIGINAL issues, not behavior. You've chosen to focus on behavior, and that suits me well enough (code and data are not behavior, either) but you were the person who chose to ADD that issue. The others don't go away. You probably don't know the difference between 'process' and 'behavior' anyway.}

As described above, the issue is "data" versus "code", not "data" versus "behavior". If you wish to establish that code == behavior, be my guest...

{Do you suffer from long-term memory loss? (I can't remember...). Top, this argument started when I agreed that DataAndCodeAreTheSameThing, but then further pointed out that process and communication are not code or data. You objected to this. Thus, THAT is the issue, NOT "data vs code". And I'm quite confident that 'code != behavior'. That 'code == behavior' is what YOU were arguing. Sigh. We might as well drop this discussion considering we're now arguing about what we're arguing about.}

I admit that I unfortunately didn't take care to distinguish between the words "behavior" and "code" and may have used them interchangeably. If this is the reason for the confusion, I apologize. ("Are data and behavior interchangeable?" makes for an interesting (new) question, though.)

{Apology accepted. I agree that the question is interesting... or at least it is if you can provide some decent examples as to where data and behavior seem to be interchangeable.}

I would argue that CodeIsMetadata?: data *about* data that suggests (with varying degrees of absoluteness) how a processing system can infer new values from given ones. And of course, MetadataIsData?. We have a number of semi-standardised approaches to describing such metadata, some of which have standardised 'coding languages' which imply certain standardised 'machines' or 'workflows' or 'evaluation strategies'. C++, Lisp, XSLT and Prolog are all 'coding languages' and some of them are more readily recognised as 'code' vs 'data' to the human eye. But the question of whether *in a given context* a piece of information is 'data' or 'metadata' (and hence 'code') is entirely a matter of position in a system - is it 'inside the processing box' or 'outside' it?

Code that produces communications or initiates activity rather than produces new values wouldn't qualify so well under your logic above. And it is certainly not the case that all metadata is code. I.e. if I know that I know 21 ways to skin a cat, that isn't code. In any case, could you please be careful to avoid confusing 'value' with 'information' or 'data'? Values aren't information without context.

In my view, processes (communications and activity) are merely structured *temporal* patterns of information, while what we often think about as 'data structures' are structured *spatial* (for want of a better term) patterns.

Data can be represented as a signal in time, space, or both (as per the proverbial laser bouncing between Earth and Mars). Not all signals, however, need to represent data. The computational view of 'time' is simply a partial-ordering of signals (potentially, but not necessarily, including signals from a clock or a physical object). Interestingly, this computational view of time seems to be perfectly consistent with the current (relativistic) understanding of time we possess through physics, and, when combined with the often strange and unintuitive 'observation' properties of the quantum model, it has led several intelligent and open-minded scientists to the conclusion that information - the bit or something close to it - might be the most primitive thing in this universe, even more primitive than atoms or strings or matter or light. (There was a SciAm issue dedicated to this, but I can't remember which.)

That said, while verbs constitute patterns of percept or action over the temporal dimension (e.g. riding a bicycle), processes themselves don't need to carry structured information or anything meaningful to us. Your view ought to account also for processes that essentially produce white noise.

The spatial or temporal element is not actually that important: both of these can be produced from what we think of as 'algorithms' or 'data': for example, the Mandelbrot Set is a spatial pattern that can be produced from an algorithm, and an audio file playback exists as an experience in time, but can be represented as a block of frequency/amplitude values. And there is much crossover between the two: a sufficiently compressed 'data file' itself resembles an algorithm. Loops and recursion and tests/conditionals can be viewed as means of removing redundant data items from a description of a sequential process, an interaction, or a decision tree.
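The remark that a compressed data file resembles an algorithm, and that loops remove redundant data, can be illustrated with run-length encoding. In this sketch the encoded form is a list of `(count, value)` pairs that reads naturally as a little program of "repeat" instructions, and decoding is just executing it. Plain RLE is chosen here for brevity; it is not the specific scheme of any of the research mentioned on this page.

```python
from itertools import groupby

def encode(data):
    # Replace each run of identical items with one (count, value) instruction.
    return [(len(list(run)), value) for value, run in groupby(data)]

def decode(program):
    # "Execute" the instructions: each one is a loop emitting value count times.
    out = []
    for count, value in program:
        out.extend([value] * count)
    return out

signal = list("aaaabbbcca")
program = encode(signal)
print(program)                    # [(4, 'a'), (3, 'b'), (2, 'c'), (1, 'a')]
print(decode(program) == signal)  # True
```

The more repetitive the input, the more the encoded form looks like loops and the less it looks like a copy of the data.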

The compression structures we tend to consider 'code' tend to describe potentially infinitely-sized, recursive and conditional patterns, but that doesn't make them any the less 'data' unless we choose to arbitrarily limit our view of data to the finite. This is most obvious in lazy functional languages like HaskellLanguage which allow the definition of infinite and recursive *data structures*. Even two Lisp lists that share internal structure have blurred the distinction between code and data; a Lisp circular list even more so (recursion but no conditional). Whether you view a structured pattern of information as a 'process', an 'interaction', or a 'memory' depends on what *perspective* or context we examine it from. And if we choose to describe or annotate that context in a machine-processable structure itself, that's metadata.
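Python is not lazy by default, but a generator gives a rough analogue of the infinite data structures mentioned for HaskellLanguage, and a node that points back into its own list gives the circular-list case. This is a sketch under those substitutions; `naturals`, `Node` and `take` are illustrative names.

```python
from itertools import islice

def naturals():
    # A potentially infinite sequence: the "data" exists only as a rule.
    n = 0
    while True:
        yield n
        n += 1

squares = (n * n for n in naturals())     # still infinite, still unevaluated
print(list(islice(squares, 5)))           # [0, 1, 4, 9, 16]

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

# A circular list: recursion in the structure itself, no conditional in it.
a, b = Node("tick"), Node("tock")
a.next, b.next = b, a

def take(node, k):
    # Follow k links, collecting values; the structure never runs out.
    out = []
    for _ in range(k):
        out.append(node.value)
        node = node.next
    return out

print(take(a, 5))                         # ['tick', 'tock', 'tick', 'tock', 'tick']
```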

Code itself cannot constitute a whole description of any particular thing or data when its behavior relies upon runtime percept or signals. If you limit yourself to purely functional code, I'd agree that it can readily be utilized to represent infinite data. Once communications are involved, the code itself can possibly represent how to acquire data, but could possibly represent any number of other actions that possess no particular meaning and describe or acquire no information.

Some of these ideas are developed further in JGerardWolff's ComputingAsCompression research. Others in TedNelson's XanaduProject and his concept of HyperStructure?.

The reason I think this is important is that I think increasingly in a parallel, web-distributed world we need to move away from visualising data processing as constructing 'sequences of operations' (the traditional idea of programs or 'code') and toward storing, retrieving, and inferring new values from marked-up pieces of data - moving toward an information architecture or knowledge management point of view. It's the data that is the reason why we have computers, after all.

I think video games and pr0n are the reasons most humans have computers. But information management and analysis are important too.

If you liked, you could say that 'code' (in the sense often used by programmers as opposed to librarians or information architects) usually implies one particular, fairly narrow, sense of metadata - a sense that tends to orient toward clustering data into hierarchical 'types' or 'classes' and defining sequential, often stateful, operations (functions or methods) on that data. Neither of which are necessarily the only possible or even best ways of thinking about how to 'infer new values', and a narrow focus on these as the only forms of 'coding' is going to be increasingly unhelpful in the future world of massive parallelisation and personal data remix.

My two cents. -- NateCull?

For opposite views: DataEqualsCodeDependsOnContext, DataAndCodeAreNotTheSameThing, SeparationOfDataAndCode, DataCodeEquivocationConsideredHarmful

A PennyThought? -- "Data and Code are the same only in that they are a series of bits each of which is set at 1 or 0, it is how they are presented and how we perceive them that causes us to make their names different." -- DonaldNoyes.20110310

NovemberZeroSeven

CategorySubjectivityAndRelativism