Atom Cell Quantity Holy War

Many of the HolyWars on this wiki seem to be in part about how many "cells" are to be in our "atoms". Some of the quantities in question include:

Then we have the likes of LISP, which offer a kind of roll-your-own atom composed of lists of trees or trees of lists. This can result in great flexibility, but also a sense that it allows so much personal creativity that cross-developer code reading becomes difficult.

Food for thought --top

Top, I think these, er, curious views about cells, atoms, records and tables may say something about why there are HolyWars between you and some others, but I can't agree that they're the basis of academic arguments re DynamicTyping vs StaticTyping (is that what your first and second points are about?) or relational vs. OO approaches (presumably the latter two points) in general. -- DaveVoorhis

I didn't say that it was the *only* basis to compare (see "in part" at start). It's just one of the main ones. --top

I didn't say it was the *only* basis to compare, either. My point still stands. -- DV


Top, IIRC, 'Atom' is (by definition) something indivisible. A 'cell' is something mutable. A cell can carry an atom (or, presumably, more than one), not vice versa.

Okay, maybe "atom" is the wrong word. But I haven't found a sufficient replacement yet. Isn't AcId also using it wrong? --top

[No, AcId is using it correctly. An "atom" and an "atomic operation" are two different things, though in certain contexts (see below) they may be related.]


What I call an atom is something for which there are atomic operations on the CPU, and which can therefore be operated on by more than one thread at the same time without locking. An atom is required for making a semaphore. On a single-CPU architecture, CLI/STI-type instructions can make a semaphore, but multi-CPU systems need a real one.
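(As a rough sketch of that notion, here is what building a lock out of such a hardware-supported 'atom' might look like in C++; the class and names are illustrative only, not from anyone's actual library:)

 #include <atomic>

 // A spinlock built on a single hardware-supported "atom":
 // std::atomic_flag is guaranteed lock-free, so test_and_set maps to an
 // atomic read-modify-write instruction on the CPU.
 class SpinLock {
     std::atomic_flag flag = ATOMIC_FLAG_INIT;
 public:
     void lock() {
         while (flag.test_and_set(std::memory_order_acquire)) {
             // spin until the previous holder clears the flag
         }
     }
     void unlock() {
         flag.clear(std::memory_order_release);
     }
 };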

That certainly isn't the usual definition. There is very little on a CPU that can be considered atomic relative to multiple threads, especially when dealing with true multi-core CPUs, shared memory, DMA, and caches. Setting a word that doesn't cross alignment boundaries can count, so long as you don't read it before you set it, but as you say: most CPUs require something special (like CAS) for anything more. Nonetheless, in the above context the issue of multi-CPU access and synchronization is irrelevant.

[It's not that far off. They are atoms in the sense that they are indivisible (or at least they can be modeled as if they were indivisible) in time.]

It doesn't take support for atomic operations on the CPU to model something as though it were indivisible in time; any ACID transactional model allows that much, as does pessimistic mutual exclusion support, and these can be provided by the language over individual cells no matter how large they are (and doing so is only relevant in the face of multi-threading or potential failures/exceptions after a partial write... which, depending on language and context, is not always the case). Focusing upon strictly hardware-supported atomic actions in the context of a discussion over 'atoms' in languages is, indeed, "that far off". If you wish to consider temporal indivisibility, you need to allow for a much broader view of it when defining 'atom' (e.g. "any component for which the language guarantees logically atomic updates with regards to mutable state"). This would differ from cell-structural atoms because the language may or may not support atomic updates of sub-cell components (e.g. via use of logically atomic transform functions over the surrounding cell).
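(To make that concrete: a library can give you logically atomic updates over a cell of any size with no hardware 'atom' involved at all. A minimal C++ sketch, with made-up names:)

 #include <mutex>
 #include <utility>

 // A mutable "cell" of arbitrary size whose updates are logically atomic:
 // the transform function runs under the cell's lock, so no reader ever
 // observes a partially written value, whatever the CPU happens to offer.
 template <typename T>
 class Cell {
     T value;
     mutable std::mutex m;
 public:
     explicit Cell(T v) : value(std::move(v)) {}

     T read() const {
         std::lock_guard<std::mutex> g(m);
         return value;
     }

     template <typename F>
     void update(F transform) {   // logically atomic transform of the whole cell
         std::lock_guard<std::mutex> g(m);
         value = transform(value);
     }
 };

A call like cell.update([](std::string s){ return s + "!"; }) is then atomic with respect to every other read or update of that cell, however large the string.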

OK, this page was called HolyWar for a reason. (Not necessarily a good one.)

On some ACID databases, if you set your transaction to SERIALIZABLE and someone else sets theirs to READ UNCOMMITTED, you lose (they can see your uncommitted stuff sometimes). In my work in multi-threading, I have to care about what requires semaphores and what doesn't. Hence, my definition.

It's just a matter of guarantees. If someone is sniffing the memory bus and you use CAS/TAS/LLSC operations, they're also free to read uncommitted data. "You lose", too - and for essentially the same reason (because someone chooses to read uncommitted data). As an argument, it doesn't really hold water (and it qualifies as a straw man) because you haven't really lost - you, as the programmer, generally control both the operation doing the write and the operation doing the read, and you only really need guarantees for your own code and the legitimate modules. It isn't your concern whether someone runs your program in a virtual machine (so they can sniff out every memory access) or chooses to read inconsistent state from the database. If you're willing to consider 'atoms' those cells possessing hardware-level engineered support for atomic operations, you can't reasonably argue that higher-level 'cells' possessing software-level engineered support for atomic operations are any less 'atoms'. Any counter-argument is subject to the slippery-slope fallacy regarding the question of where hardware stops and software starts, especially when considering virtual machines and field-programmable gate arrays.

Consider: if you strictly use locks for cells (no matter how small - whether it be one bit or a variable string or a complex datatype representing a whole XML file) and pass to them (as a single action) immutable representations of the values they are to contain, you can guarantee that there is no deadlock and you get the exact logical equivalent of hardware-level 'atoms' at the software level - and the data accessible as 'atoms' is simply a great deal more flexible than fixed-length, fixed-alignment bitvectors. And you can even do so efficiently, using garbage-collection or reference-counting, perhaps using mutexes across such structures to trade mutation-speed for space and construction-speed.

I am hardly ignorant on this subject; the vast majority of my labors at work, too, utilize the inadequate and poorly designed (but oh-so-easily-implemented) multi-threaded shared-memory model instead of something sane like ActorModel, and I, too, know the associated patterns and concerns. And, knowing this subject well enough to teach it, I still find ludicrous the way some people will blatantly violate other software engineering concerns (e.g. the ZeroOneInfinity rule, and especially OptimizeLater) in a great fit of PrematureOptimization. No, no, no - you don't "have to care" what requires a semaphore and what doesn't - not unless you "have to be" concerned about gaining the last few micro-cycles of speed and "have to" optimize by exactly that means. Who or what is forcing your hand? Chances are, like most people focused on likely PrematureOptimizations, you'll miss the larger optimizations and feature-benefits that become available when the system is kept simpler by NOT distinguishing between 'cells' of various types when it comes to multi-threading (such as cheap copies of complex data, lazy evaluation, observer patterns, versioning, and cross-cell transaction support).
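(A sketch of that pattern, again in C++ and again with illustrative names: one lock per cell, and the value handed to or from the cell is an immutable snapshot, so no operation ever needs to hold two locks at once:)

 #include <memory>
 #include <mutex>
 #include <utility>

 // One lock per cell; the stored value is an immutable snapshot.
 // Readers and writers each perform a single pointer swap under a single
 // lock, so no operation ever holds two locks and deadlock cannot occur.
 template <typename T>
 class SnapshotCell {
     std::shared_ptr<const T> snapshot;
     mutable std::mutex m;
 public:
     explicit SnapshotCell(T v)
         : snapshot(std::make_shared<const T>(std::move(v))) {}

     std::shared_ptr<const T> get() const {   // cheap copy: only a refcount bump
         std::lock_guard<std::mutex> g(m);
         return snapshot;
     }

     void set(T v) {   // single action: build the value outside the lock, then swap
         auto next = std::make_shared<const T>(std::move(v));
         std::lock_guard<std::mutex> g(m);
         snapshot = std::move(next);
     }
 };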

The only hardware-implemented atomic operation that would be greatly beneficial for avoiding semaphores is DCAS, which would allow many data structures to be manipulated wait-free and avoids the potential need to hold two locks simultaneously on two cells. But, surprise of surprises, nothing but some old and long-outdated Motorola chips provides DCAS.
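(For contrast, here is what a single-word CAS already buys you, using C++'s std::atomic: a lock-free stack push. Structures needing two independent words updated together - both ends of a deque, say - are where a hardware DCAS would remove the remaining locks. The node type and function are illustrative only:)

 #include <atomic>

 struct Node {
     int   value;
     Node* next;
 };

 // Lock-free push using a single-word compare-and-swap.
 void push(std::atomic<Node*>& head, Node* n) {
     n->next = head.load(std::memory_order_relaxed);
     // On failure, compare_exchange_weak reloads the current head into
     // n->next, so we simply retry until no other thread slipped in.
     while (!head.compare_exchange_weak(n->next, n,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
         // retry
     }
     // (A matching lock-free pop needs more care, e.g. against ABA.)
 }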

Hmmm. I'm used to atomic operations locking the bus.

There's no such thing as 'locking the bus' - not in any real sense of the word, anyway. There are low-level signals telling other parts of the system what is going on so they can play nice with one another. In most systems with more than one CPU, on-die caches need to listen to the bus in order to know which cells to invalidate or update even in the case of atomic operations; further, such CPUs in properly designed systems can continue to perform operations. They can even update the 'atomic' cell in question (which is part of why TAS and CAS and LL/SC all have failure conditions - the other part being context switches). From this, one can conclude that 'locking the bus' is metaphorical at best. I wonder: for which architectures have you utilized atomic operations directly? Though I'll admit it has been a while, I, myself, have written two mutex and semaphore libraries for different 'toy' operating systems at university.

I have used x86, which has several atomic instructions. x86 has an opcode prefix that does indeed lock the bus so no other writes to memory can occur until it finishes (the bus lock is held for one instruction). My favorites are lock inc [pointer] and lock dec [pointer] for atomic reference counting.
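(A portable sketch of that same idiom, assuming C++ atomics; on x86 the increment and decrement below typically compile down to lock-prefixed instructions. The type and function names are made up for illustration:)

 #include <atomic>

 struct RefCounted {
     std::atomic<int> refs{1};   // starts owned by its creator
 };

 void acquire(RefCounted* p) {
     p->refs.fetch_add(1, std::memory_order_relaxed);
 }

 void release(RefCounted* p) {
     // The returned previous value tells us whether we were the last holder;
     // on x86 this is typically a single lock-prefixed instruction.
     if (p->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
         delete p;
     }
 }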

The x86 lock prefix talks to the CPUs and tells them to play nice (via setting of a hidden register). It isn't as though they aren't still on the system bus, listening for a signal to go again and invalidating or updating cache locations based on signals received; they, instead, just pause any pipeline or other work when it comes time to send signals to the bus. As I said above, it isn't locking in any 'real' sense of the word. It doesn't prevent sniffing the bus or putting signals onto it if you wire in a processor that isn't willing to cooperate... much like a software entity that 'reads uncommitted' isn't cooperating, and much like a software entity that reads or writes mutex-protected memory without first trying to grab the mutex isn't cooperating. The way any of these locks work is the same: you get all the players to agree upon and obey the rules. Admittedly, installing an uncooperative processor is far more difficult and far less likely to happen by accident than installing uncooperative software, but it can be and has been done.

That aside, I'll note that a mere 'ADD' can (in hardware) relatively easily be done atomically without even telling other CPUs to wait on the bus for a moment: given certain memory architectures, you can send an 'add' (or 'xor' or 'rot', etc.) instruction directly to memory and get back the previous value or result. When I interned for a summer at a RAID controller company, they did exactly that - they had (depending on the card) one or two gigabytes of memory (which was a LOT at the time) on a board with a dedicated CPU (just for talking to the HDDs, deciding what to cache or prefetch, and taking commands from the system CPU), and the memory on each card had its own (limited, but very fast) controller for performing vector operations (e.g. XORs, ROTs, ADDs, and SETs over and between ranges or with registers). The controller itself would delay its response if asked for a value from a range that was mid-transform. It was pretty cool stuff, designed for rapid RAID ops and a little bit of simple shared-key encryption. OTOH, I've gained what I consider a very healthy distaste for auditing hardware and device drivers. But I digress.

If you have a dedicated controller for the memory that can understand instructions more complicated than 'get' and 'set', a great number of operations can be made atomic very easily and efficiently. You mentioned your favorite. If I were to point to my favorites, they would be the use of a dedicated and slightly-more-intelligent memory controller with these sorts of complex and vector operations, plus (on more traditional architectures) CAS (though I'd be clasping my hands in glee were I to have access to DCAS, which would easily allow wait-free, unsynchronized, safe shared mutable lists, queues, and stacks, and whole books of other data structures). I hate test-and-set with a passion.

Nice! Believe me, if I had those primitives I'd use them too.


MarchZeroEight

