All Abstractions Lie

From CritiqueOfUseCases: "The conclusion here is that AllAbstractionsLie."


ErnstMach? held that scientific laws are summaries of experimental events, constructed for the purpose of human comprehension of complex data. Thus scientific laws have more to do with the mind than with reality as it exists apart from the mind. He realized that AllAbstractionsLie.

He also said, "The goal which it [physical science] has set itself is the simplest and most economical abstract expression of facts." I think we could say the same thing about modeling software: we are looking for the simplest and most economical abstract expression of facts that could possibly be enough to solve a problem... or to describe a solution for a problem... I wonder if we could get some ideas to improve (or better understand the nature of) software engineering from Mach's philosophy of science...


"All abstractions lie -- including this one."


This naturally follows from TheMapIsNotTheTerritory or TheRepresentationIsNotTheRealThing, but it obscures the fact that the abstraction also reveals truth -- usually more so than it "lies." No model is 100% reality -- but if it were, you wouldn't need a model; you would already have the reality, the real thing (and there are naturally better and worse models). No two models are identical. Each abstracts what its abstracter finds to be the important, relevant details. There is often (if not, indeed, usually) significant overlap between models, but they differ to a greater or lesser extent. It is of course this difference (both between model and reality, and between independent models of reality) that often causes us problems.


We can't program without abstractions - it sure gets hard thinking about those electrons on your chip when you're coding. We just need to remember the assumptions needed to make an abstraction valid.


True.

It's what makes UseCases useful.

But not useful enough to make broad sweeping decisions unless you can prove that the stuff you elided doesn't break anything. Easy to do in mathematics, difficult to do in software -- unless you bother to actually implement and test it.


Here is the quote from CritiqueOfUseCases, needed to follow the abstract argument raging below:

Consider a system to be a volume in a discrete space. A discrete space is built out of indivisible quantized partitions, like pixels on your screen. It's discrete because your computer is.

If you consider that, then each feature, or UserStory, is like a point inside this volume. A UseCase is like a path through the volume. A diagram is like a surface inside this volume.

All actual changes happen at the point level. Many paths and surfaces go through any given point. To keep your static copies of these paths and surfaces (specific UseCases and diagrams) accurate, you have to update each element of this possibly large set. However, there is only the one point and the one volume, so there aren't many copies of these that need to be updated (probably approaching zero).

Naturally, if that one point is the root of a large coupling graph, there will be many changes to be made anyway. But still many paths and surfaces intersect each and every point in that graph, and those will need to be updated as well.

On the other hand, if you reduce the entire system to a set of unordered points, you've lost information on how those points are connected to each other. Thus, you need higher order slices of the overall volume (which we presume is too large to be understood) to provide insight. Hence, UseCases and diagrams.

What's ironic is that the paths and surfaces are usually too complex to be understood as well, so we "abstract" them to simpler paths and surfaces. That is, we remove information and provide an approximation. The conclusion here is that AllAbstractionsLie.

Anyway, since most systems are dynamic, especially during development, the cost of having all this spaghetti tied to each point is very large. After all, the UseCases and diagrams act as opposing forces to change, thus requiring the plucky programmer to expend more energy (and more cost) to change a given point. It's friction.

The obvious solution is to not hand generate the higher order slices, but to have them derived from the volume itself. -- SunirShah


I see that...

And, on a different track...


Selection of inappropriate abstractions (in BigDesignUpFront) can get you into lots of trouble.


I read the context from CritiqueOfUseCases, but I still don't understand the significance of the claim: AllAbstractionsLie. What advice results from this insight? Abstraction in description is inevitable, so all descriptions are also lies, but who is deceived by this, and why does it matter? --WaldenMathews

You accumulate noise in the design every time you make an abstraction. When doing BigDesignUpFront, the noise quickly exceeds tolerable levels and the design falls down. You can reduce the noise by doing experiments (run the code!), and you can then abstract again. I doubt that more diagrams (or whatever) reduce noise as fast as just running the code. -- SunirShah

Sunir, I'd really like to see an example of this. Does abstraction really introduce noise, or does it just introduce silence? Is it abstraction you're having a problem with, or is it metaphor? In one case (abstraction), you come away with less than the whole truth, while in the other (metaphor), you come away with mistranslated truths, i.e., real lies. As systems experts, we should be very good at working with less than the whole picture, as there simply is no alternative. Thanks, Walden

What I was complaining about on CritiqueOfUseCases was the insistence of certain methodologists on designing your system using a series of diagrams, one after another, top down. As we drilled down, it became obvious that the interface between the high-level components was not as clean as the abstraction led us to believe. The abstraction "lied" to us in that it suggested this was a workable architecture. In truth, there wasn't enough information at those levels of abstraction to make a decision. The lack of information is what I was referring to as "noise." Writing code, writing tests, running code fill in the missing pieces so you can make an informed decision. -- SunirShah

So then are you saying that out of this lack of information assumptions are made, code is written against those assumptions, and the only way to dispel the assumptions is to run the code? I'm still wishing for the concreteness of an example to understand this. -- Walden

Ok, here's my favourite "gotcha" design experience. Once, when building a rendering pipeline, we had components to load graphics, transform them, blit them, and transition the results to the rendering hardware.

The obvious solution--from bubbles on a whiteboard--was to be linear:

  1. Load a graphic.
  2. Transform the graphic.
  3. Blit the graphic to an in-memory buffer.
  4. Transition the in-memory buffer to the rendering hardware.
  5. Start again with the next graphic.

However, this "obvious" (it was to us) solution was wrong. Here, the content manager was in charge. We found it was much more useful to put the transition in charge.

  1. Load transition.
  2. Transition requests content.
  3. Load graphic.
  4. Transform graphic.
  5. Blit graphic to an in-memory buffer.
  6. Signal transition to go.
  7. Transition.
  8. Start again with the next transition.

This architecture was completely inverted from the original: instead of the transition being at the end of the conveyor belt, it was the puppeteer.

This solution was non-obvious from nice bubbles on whiteboards, and required experience with the hardware, OS, graphic library and the natural flow of the system before we ended up with the (more) optimal solution. -- SunirShah
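In code, the inversion might have looked something like this -- a hypothetical Java sketch with invented names, not the project's actual code:

  interface Graphic {}
  interface Buffer {}

  interface Transition {
      Graphic requestContent();   // only used in the second design
      void go(Buffer buffer);     // push the buffer to the hardware
  }

  class PipelineSketch {
      // First design: a linear conveyor belt; content drives, transition is last.
      void conveyorBelt(Iterable<Graphic> graphics, Transition transition) {
          for (Graphic g : graphics) {
              Buffer buffer = blit(transform(g));
              transition.go(buffer);
          }
      }

      // Second design: the transition is the puppeteer and pulls its content.
      void transitionDriven(Iterable<Transition> transitions) {
          for (Transition t : transitions) {
              Buffer buffer = blit(transform(t.requestContent()));
              t.go(buffer);   // signal the transition to go
          }
      }

      Graphic transform(Graphic g) { return g; }           // stub
      Buffer blit(Graphic g) { return new Buffer() {}; }   // stub
  }

The point of the sketch is only that the owner of the loop flips; nothing at the whiteboard level forces one shape over the other.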

Sunir, I do have an inkling of the problem you are describing, but could you please be very specific as to what it was about the first approach that didn't work? Above, when describing the second version, you said "we found it more useful", almost as though it was just a stylistic preference. Thanks, Walden

Basically, when we attempted to build a design from only our abstract notions of the system, we could not adequately determine a solution because we didn't have enough information. As you gathered, we made assumptions that were false. Only when we had more experience with the system could we build a working solution. An abstraction "lies" because it removes information. The lying is actually done by the human as the mind attempts to fill in details with assumptions. Or at least that's what I've found, which is why I distrust methodologies that insist that it takes only one more diagramming technique and the system will work. -- SunirShah

Okay, I realize I've been pressing you for details on something that probably happened some time ago. I was hoping you remembered something from the code level that was awkward or that just couldn't work. A synchronization problem, perhaps, or a module too difficult to implement (but what was so difficult about it?).

It seems to me that your initial abstraction "lied", whereas your later, more informed abstraction did not, and I want to chew on that idea before saying any more on this subject. -- WaldenMathews

What was missing was a perfect understanding of the requirements. The initial design was good for a cyclical progression of disconnected frames (think "flip book"), whereas the second design was good for both that and a cyclical progression of the same frame (i.e., marquee scrolling). Furthermore, the underlying hardware was buggy and slow in important cases. Even later on, the above design changed to support video chaining. -- SunirShah


There is no mystery here, nothing that you need to unravel. You think about a problem, let's say one with vertices and links for the sake of the discussion. In your first way of thinking about it, you describe it in a way that puts the vertices at the center of the thinking. Vertices own links, and so Vertex is an object (just to carry the "analysis" thinking into the pre-design domain), whereas Link is just a collected attribute for Vertex.

You start implementing, and somewhere along the line, it becomes clear to you that inverting the story suddenly makes the problem more tractable. Suddenly your entire thinking about the problem gets inverted (not just the design, your entire thinking about it). You discover that if you put Link at the center of the thought pattern, then Vertices are almost innocuous, and traversal, computation, whatever, is (for instance) linear time instead of cubic or worse. Whatever.

Now, in any problem, some thinking takes the spot I labeled Vertex in those paragraphs and some takes the spot I labeled Link. It could be domain concepts, it could be the overall architectural pattern, it could be algorithms.

All Sunir is saying is that a person's way of thinking about a problem changes over time, and that the first way is not always the best way, and that if you think that there is a way of thinking at the beginning that will protect you from such reversals, then you are wrong, dangerously wrong, and it will cost you.

(I first ran into this scenario long ago when studying top-down design, which presupposed there was a way to name the topmost pieces that would be stable over the course of the project. I next ran into it when it passed under the guise of Object Oriented Analysis, discovering the true domain objects that would transfer directly into a stable OO Design. The proposed concepts shifted, the net effects didn't).

Hope I read your point right, Sunir, cuz this took a lot of typing, and I'd sure hate for it to be all pointed in the wrong direction. -- AlistairCockburn


All abstract models we use to build systems are fundamentally "wrong," but we need them to make problems tractable:

Let's deal with some examples, by looking at some attributes:

OK, a person has the attribute "sex," which may be "male" or "female." Well, in the real world we'd like to think things are that straightforward and simple, but they really are not. Like, some people are born XY, but lack the receptors for Testosterone, and so they develop female genitalia and are indistinguishable from "XX" women by anything short of chemical or DNA testing. (This has been an issue in recent Olympics testing.)

OK, age... Except that age changes constantly as time progresses.

OK, birth date. Surely there can be no question about that! ...except that birth is not an instantaneous process; it's something that will certainly take a few seconds, making it possible for the "time spent physically giving birth to a person" to cross midnight -- so they have a "birth date" that really spans two days.

I could go on and on. But the point is not that you can't ever "get it right." Rather, that the goal is to "correctly" implement a model that is useful -- one that reasonably represents the attributes of the real world system that are important to you:

If you wish to grant a discount to seniors, then the difference of a few seconds in birth times is probably not important to you. If you wish to compute some statistics of "salary and raises by sex," to prove/disprove sexual discrimination claims, then the apparent sex (from casual examination) is probably good enough for your needs -- genetic testing would be excessive.
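For instance, a model for the senior-discount case might deliberately keep only day-level precision -- a minimal Java sketch (Java 16+ records, hypothetical names):

  import java.time.LocalDate;
  import java.time.Period;

  // The birth *time* is abstracted away on purpose; for a discount rule,
  // a whole-day birth date is a useful (if strictly "lying") model.
  record Customer(String name, LocalDate birthDate) {
      boolean qualifiesForSeniorDiscount(LocalDate today) {
          return Period.between(birthDate, today).getYears() >= 65;
      }
  }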


Certainly nothing I haven't experienced myself, many times over, this pattern of starting with an abstraction and having to backtrack as details fill in and have their selfish way with my perfect design. But it seems that the result of that is not that AllAbstractionsLie, but rather that all initial abstractions do. Maybe not even all initial abstractions, as I seem to recall getting it right at the start once in a while. I'm more likely to do that when I've spent the time to become intimate with the stuff in the problem context, and I would even go so far as to say that screwing up initial abstractions teaches a sort of sixth sense about what to look for on a new problem, although this could never be perfect.

I was curious to know whether it was a problem context detail or a language detail or some other kind of detail that derailed Sunir's initial attempt at a high level separation. I suppose it's not important, as in a sense they are all part of the problem context (including the toolset you have at hand). But I was curious. In any event, Sunir succeeded in staying abstract (with me).

If I were going to drop some words of wisdom, I'd say "before you can abstract, you must see the detail", which is the same as saying "before you can withdraw, you must stand close". Here comes the etymology again, sorry, but abstracting is "drawing away", not just "being away". Maybe that's significant. I'm sure it is, because we always get the abstract layers of our program perfect after we've finished all the detail work. Another way of saying this is that when you refactor, you are abstracting, but with the details as your proof.

There's danger in the message AllAbstractionsLie, as it will often be interpreted to mean "don't abstract". Programming without abstraction is the road to spaghetti. Maybe the message should be: abstract when you know enough detail to make abstracting meaningful, and when you must abstract earlier, do so with proportionately modest expectations.


Also, yes, all abstractions do lie, but so do words in general and thinking in general, and yes, we have learned to live with it, and as programmers, work with it.

I think Sunir was reacting with a diatribe to some of the "stuff" that gets taught in the OO design classrooms, where the teacher purports that somehow thinking for a few minutes and doing some dervish incantations drawing joined named rectangles and triangles on a piece of paper will produce the "true good objects" that will be stable through the life of the system across design and maintenance. The novice OO "analyst" is going to abstract away from the problem and discover the truth. That is a lie, and, as he says, a costly (dangerous) one. I find myself having to exert quite some energy debunking that lie in OO classes at a distressingly regular pace.

For someone with a few years experience programming OO systems, the lie of the abstractions is not a major problem, it is just something to work with, much as walking requires countering gravity. It is to the rest that I read Sunir's original discharge as addressing.


One way that abstractions lie, or at least mislead our thoughts, is when we prioritize the perceived (and often unreal) needs of a collective over the needs of individual people. One example is racism, where the abstract collective of a race is said to be threatened and this claim is used as an excuse for actually threatening individuals. Another example is the medieval crusades, where the abstract collective idea of Christianity was said to be threatened and this was used as an excuse for religious wars. A third example is the communist or marxist-leninist dictatorship in the Soviet Union and other countries, where the betterment of the working class (an abstract class, uh-uh) was used as an excuse for not allowing freedom for individuals. A fourth example is the idea of a TragedyOfTheCommons, where a fairy-tale example of an overgrazed village commons is used as an argument for privatization of property, when in reality village commons have worked just fine for centuries in Europe; the real tragedy here is that the argued-for privatization could easily lead to underuse of available resources and famine. As a fifth and similar example, the abstract threat of overpopulation has allowed the leaders of communist China to punish individual families that have more than one child, without having to face protests from western democracies. As a sixth example, some have complained that the actual and concrete usage of this wiki's system for categories (WikiCategories) isn't in line with the original and more abstract intentions (the intentions were to have categories and topics; topics never caught on and have since been deleted).

My personal conclusion is that while abstractions are often useful (and usefulness is a high goal), it is always right (and often warranted) to question an abstraction and return to an analysis of the concrete underlying entities. This brings back the detail that was excluded in the abstraction. In my opinion this is in line with ExtremeProgramming, ReFactoring, and avoiding the abstract BigDesignUpFront. -- LarsAronsson (18 May 2001)

There are abstractions, there are liars, and there are liars who use abstractions. There are also benign uses for abstractions.

Yes, there are liars, but abstractions in themselves are attractive because borrowing someone else's abstraction saves the time and energy it takes to study all the underlying details. This makes us gullible, because LifesJustTooShort to check all the facts. One more example of abstractions is the process of industrial standardization in organizations like ANSI, OSI, CCITT/ITU-T, and the InternetEngineeringTaskForce. In the ideal case, we will seek standardization when it solves a real problem. Having standardized 110 VAC 60 Hz (or 230 VAC 50 Hz in Europe) saves a lot of detailed thinking when planning your electricity needs. But sometimes standardization precedes implementation, and solves problems that nobody has, just like some BigDesignUpFront. Traditionally, the InternetEngineeringTaskForce has excelled in a pragmatic process driven by need and implementation while the classic failure of CCITT/ITU-T/ISO standardization of X.400 e-mail and OSI networking is held up as an AntiPattern. Lately, however, the IETF's standardization of IPv6 as a solution to the scarcity of IPv4 addresses is flopping because there is a more convenient solution to the same problem: Network Address Translation (NAT). Reality is lagging behind all plans and predictions for how fast IPv6 would be adopted. Of course there are benign uses for abstractions. But they are powerful, and should be handled with care.

Well, I can't speak for that book or its brain, but I think the most important function of my brain is its role in keeping my body alive, and beyond that I'm not sure if knowing its most important function does anything for me, so I'll leave it at that. At one level, abstraction is not a choice. When we take in our surroundings, there is information lost, and I'm sorry but there's nothing you can do about that. At a different level, abstraction is a decision to ignore some of what's known. This choice can be used in the service of malicious deception (the "lie"), or it can be used in the service of freeing capability. Abstraction from available information is a key part of sanity. If all the serotonin in your brain suddenly went south, all remembered experience would flood your cortex and render you unimaginably stupid. So stupid, in fact, that any increase in concreteness or practicality would be lost. So stick that in your pipe and smoke it, but watch what the smoke does to your brain chemistry, okay? -- WaldenMathews


Abstractions are a PrematureOptimization. It's why top-down waterfall methodologies don't work. The real world always has a gotcha to getcha.

Firstly, I have yet to see a reasonable size project that didn't require an iteration of the specification process in response to physical-architectural issues. So the principle of top down tends to be broken. Once this has happened a few times you get very suspicious of "forward only" methodologies. Can it work? Of course it can. Sometimes you get lucky.

Top down "metaphorical" definitions tend to purely logical abstractions. A considerable amount of further subdivision occurs until at some point code is built and the rubber hits the road. Since we cannot easily test abstractions against the real world, we inevitably face an ImpedanceMismatch to some extent. Fast machines and functional O/S services can mask some of these things, but the fact remains that architectural refactoring should be anticipated, sometimes requiring a significant adjustment of semantics. EJB's are an example where the metaphor/architecture ill suits the physical environment of RDBMS's and distributed concurrency.

So, (logical) abstractions constrain a system architecture in unpredictable ways. Thus all abstractions can be considered premature optimization, or premature constraint if you prefer, since they remain only theoretically appropriate until tested. Successful systems have a real problem with this. -- RichardHenderson

Regarding "forward only" methodologies, they seem to be theories only, as they are never practiced. That's what makes them "methodologies" and not methods. However, there are "somewhat forward" methods that are somewhat "top-down" that are somewhat successful sometimes. Their abstractions are hypotheses that need to be explored. Without them, exploration may be much less effective. The point is that some constraints are always helpful when we are inventing. Popular myths to the contrary abound.

Architectures are abstractions, and their purpose is to constrain just as you have described, but not with absolute authority, and especially not with authority over real experience.

Premature constraint I understand, but how this kind of constraint acts as an optimization (or as any kind of optimization at all) still eludes me. Typically, when proceeding from the abstract to the concrete, performance requirements are the last to be addressed. If it's not performance that's being optimized, then what? -- WaldenMathews


I don't like painting with so broad a brush. Rather, I'd state it this way: what abstractions really do is omit details. Abstractions are useful and true when the details they omit are non-essential. But how do you know what is essential and what isn't? My rule is to never develop an abstract interface or base class until you have at least two implementations or descendants clearly in mind. That way you know your abstraction "works."
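As a hypothetical Java illustration of that rule (invented names, ahead of the war story below): the abstract Shape only earns its keep because two concrete descendants already exist to exercise it.

  abstract class Shape {
      abstract double area();
  }

  class Circle extends Shape {
      private final double radius;
      Circle(double radius) { this.radius = radius; }
      double area() { return Math.PI * radius * radius; }
  }

  class Rectangle extends Shape {
      private final double width, height;
      Rectangle(double width, double height) { this.width = width; this.height = height; }
      double area() { return width * height; }
  }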

Once I had to write a Vector class in C++ that represented a magnitude and direction as three Cartesian components. At first I wanted to make it a template, so that it would take the type of its components (int, double, Rational, etc.) as a parameter. However, because it had functions like magnitude that needed the sqrt function, it wouldn't work with Rationals or with any other type that didn't have sqrt. When I was porting it to Java, I found that I would either have to leave that member function out or make different interfaces for each type. The commonalities weren't there -- functions had to know whether they were working with Rational Vectors or double Vectors. It was better to write separate classes. So I wrote a regular class that took Rationals only.

Nevertheless, abstractions have been very useful with digital signal processing; all my processes descend from a single abstract class. That makes it possible to build up a chain of digital signal processing stages with the identities of the particular processes determined at run-time.
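A minimal sketch of that arrangement, assuming Java and invented names -- one abstract stage, with the concrete chain assembled at run time:

  import java.util.List;

  abstract class DspStage {
      abstract double[] process(double[] samples);

      // The chain's membership can be decided at run time, e.g. from a config.
      static double[] run(List<DspStage> chain, double[] input) {
          double[] signal = input;
          for (DspStage stage : chain) signal = stage.process(signal);
          return signal;
      }
  }

  class Gain extends DspStage {
      private final double factor;
      Gain(double factor) { this.factor = factor; }
      double[] process(double[] s) {
          double[] out = s.clone();
          for (int i = 0; i < out.length; i++) out[i] *= factor;
          return out;
      }
  }

  class Clip extends DspStage {
      double[] process(double[] s) {
          double[] out = s.clone();
          for (int i = 0; i < out.length; i++)
              out[i] = Math.max(-1.0, Math.min(1.0, out[i]));
          return out;
      }
  }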

The use of abstraction is essential to the discovery of elegant solutions. The trick is to relegate as much as possible to the "leave it out" category and still deliver a great solution. That's another statement of the K-I-S-S principle. Effective abstracting is an iterative process. "What if we just look at these three things...No, that's no good...What if we look at these...?". Abstracting as you have described above (design of "abstract" classes), when it's based on your experience with concrete, working stuff, is a little bit different from the process of abstracting as exploration. Or is it? -- WaldenMathews


I agree that AllAbstractionsLie. Or more specifically, AbstractionsAreRelative?. There are no absolute abstractions. A detail for one person may be a key thing for another, and vice versa. A "tree of abstractions" may be fine for training purposes, but it is too simplistic and rigid for most software abstractions. An accountant's view of car parts will be different from an engineer's. One view is not "more right" or "higher" than another in an absolute sense.


AllAbstractionsLie in the same sense that "The Tao that can be named is not the real Tao" (see TaoTeChing). Names are not the real thing, they are just names (identifiers pointing to the real things), and therefore every name, every thought is misguided. But we can achieve almost anything in this mess if every concept is just "close enough".

The FirstLawOfSoftwareDevelopment is that every successful system will have a next version. This happens because the clients are happy with it and want more of the same. If you ever design a successful system, your design will get more complex with each version, until it is so complex that nobody can really understand it, not even you, and the system becomes undevelopable -- the next version never ships because the software is way too complex.

That is why short proofs of concept (SpikeSolutions? or Prototypes) are necessary to keep complexity under control. And mockups and short release cycles are needed to get customer feedback sooner rather than later.

An abstraction that has no concrete use is as useless as an abstract class that has no concrete subclass. Nobody would do that in their right mind, but some people think abstractions are good per se, even if they can't be put to good use. Abstractions should be just an afterthought: create them and use them exactly when you need them; do not create them because you think you are going to need them. It is a waste of time to create abstractions before you can create the concrete examples, because you can't test the abstractions; you will get them wrong and have no way of knowing.

Besides, everything in programming is abstract, even in wrong ways. For example, you write int meaning integer, but the numbers the program can actually represent run only from -32768 to 32767 (or some wider machine-dependent interval). Most computer abstractions are deceiving, like 0.1, which is not really 0.1 in binary floating point.
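Both deceptions are easy to demonstrate, for example in Java:

  public class LeakyNumbers {
      public static void main(String[] args) {
          System.out.println(Integer.MAX_VALUE + 1); // wraps to -2147483648
          System.out.println(0.1 + 0.2);             // prints 0.30000000000000004
      }
  }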

There is not only one way to abstract a system, so the single-rooted/single-inheritance approach is fundamentally wrong. I don't know exactly how it should be, but most OO practitioners seem to agree that inheritance should be a dynamic property instead of a static one. No language I know of permits this, so in the meantime we can use instance variables instead of inheritance, but that is not as good as inheritance because we lose the ability to override methods (polymorphism); maybe delegation would solve that problem?

-- GuillermoSchwarz
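A sketch of the delegation workaround Guillermo describes, in Java with hypothetical names -- the object's "class" is just a replaceable delegate:

  interface Role {
      String describe();
  }

  class Employee {
      private Role role;                            // stands in for a dynamic superclass
      Employee(Role initial) { this.role = initial; }
      void become(Role next) { this.role = next; }  // reclassify at run time
      String describe() { return role.describe(); } // forwarded, not inherited
  }

As noted above, what is lost is polymorphic overriding: the delegate has no reference back to the Employee, so it cannot override the host's other methods the way a subclass could.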

Re: but most OO practitioners seem to agree that inheritance should be a dynamic property instead of a static one. No language I know of permits this

There sort of is: databases. They're one of the most tried-and-true dynamic classification engines (and not just trees). --top


As established above, abstractions hide information (details). Models also hide information, as they are inherently abstract. Hiding information is not the same as supplying false information, though.

The fundamental lie or problem in software engineering (and other sciences) is the idea that you can derive designs (solutions) by working with a model and then apply your solution in the real world. That is very often not the case. The reason is that the model hides information that conflicts with the solution derived in the model. When you do implement your solution you may not even encounter the conflicts in your tests. Eventually the details do hit the solution, and then it will fail.

Therefore my experience is that moving anything that was derived in a model to the real world is going to cause trouble, eventually. The answer is to model anyway, since you cannot work with all the details all the time, but also to implement small increments all the time, since you cannot anticipate real-world problems using a model / abstraction.

I think that says it very well.

-- MikkelBrahm


Would BadAbstractionsAreLies? be true?

There is something about AllAbstractionsLie that really gets up my nose. At one level there is nothing which we conceive that is not an abstraction, and in that sense abstractions lying fits in with Zen - reality is as you decide to perceive it. But my reality is not your reality - one of us has a lying abstraction. That said, what is the innate lie in something as abstract as an SQL statement? Select attributes from entities where conditions.

If I am designing a system, and I know I will want to serve out Invoices to my customers, I will eventually define a process "Print Invoice", which is an abstraction. The fact that this may eventually send a copy of an invoice to an email address, or actually send a copy to a printer, makes no difference to me at design time. The abstraction serves its purpose of producing an invoice, regardless of implementation technique. It is maintaining the WhatNotHow perspective at design time by the use of abstraction.
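That WhatNotHow split is easy to sketch -- hypothetical Java names, assuming nothing about a real billing system:

  interface InvoiceDelivery {
      void deliver(String invoiceText);
  }

  class EmailDelivery implements InvoiceDelivery {
      public void deliver(String invoiceText) {
          System.out.println("emailing: " + invoiceText);  // stand-in for a mail API
      }
  }

  class PrinterDelivery implements InvoiceDelivery {
      public void deliver(String invoiceText) {
          System.out.println("printing: " + invoiceText);  // stand-in for a print spool
      }
  }

  class Billing {
      void printInvoice(InvoiceDelivery delivery, String invoice) {
          delivery.deliver(invoice);  // the What; the delivery object is the How
      }
  }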

It is encapsulation. All encapsulation is abstraction. Are all encapsulations lies too?


How about AllAbstractionsAreLossy?

Technically more fitting, but less recognizable. Newbies may not recognize "lossy". Thus, I vote against change. --top


I agree. We should eliminate all abstractions and only allow programmers to have a binary keyboard with the keys 1 and 0.

But 1 and 0 are abstractions of the value of an analog electric current. Programmers should just have a moving coil meter.


See also: UsefulLie, MentalModel, EverythingIsRelative, LimitsOfHierarchies, LeakyAbstraction, EightyTwentyRule, TheMapIsNotTheTerritory


CategoryExtremeProgramming CategoryAbstraction

