Runtime Upgradeable Core

While considering the idea for a WikiIde, DaveVoorhis expressed an opinion that the WikiIde kernel should be as small as possible, with the goal of moving into a purely Wiki-based editing environment ASAP, yet be fully upgradeable from within the Wiki-based editing environment. Specifically, his words were: "if adequately designed, I believe it should be possible, over time, to evolve the [SpikeSolution] into the system described above entirely via the Wiki interface. [...] make the initial kernel, if it can be called that, as minimal as possible, with the goal of moving into a purely Wiki-based editing environment as quickly as possible."

This is a goal that I've latched onto; it appeals very heavily to my affection for elegant design and simpleness without simplicity. However, I've been banging my brain against it for a while now, and I'm a bit stuck on how to do it in a manner that is both elegant and practical.

First, a few definitions for this discussion:

The real goal is to provide a safely RuntimeUpgradeableCore, such that one can literally upgrade core services, like the webserver or the page database/indexer, while in the midst of using the current webserver and page database/indexer, all without that horrible sense of dread or fear of things going horribly wrong and forcing you to rescue your machine (or panoply of distributed machines) by hand.

So far I've thought of two fundamental approaches to obtaining safety (and I cannot help but think both ought to be implemented, for the sake of completeness):

(a) The ability to run a to-be-core service in parallel for an arbitrary period of time before switching to it, plus some mechanism of transition. This would be akin to running the 'new' Linux Kernel atop the 'old' Linux Kernel, but having some magic button that says "switch!" and makes it so the old Linux Kernel is now running atop the new one. Implementing this "magic button" will undoubtedly take a great deal of thought itself (especially testing and ensuring it survives the transition, such that the property of being a RuntimeUpgradeableCore is not lost in the upgrade). The ability to run the proposed service as a regular project on the WikiIde is ideal for a great number of reasons: it means all projects are treated equally, and it allows one to run a variety of tests and perform arbitrary debugging from within the WikiIde (which is intended to provide these services), so you can feel comfortable that the changeover is unlikely to break things.

(b) Some sort of rescue service, continuously available, that runs with a set of core services completely independent of the primary set. E.g. if the webserver runs on port 80, one might have the rescue webserver running on port 2280 (memorable as: "port 80 after catch 22"); one could reserve the whole 22xx block just for rescue services. These would share very few of the core services, but there would need to be some overlap: the primary process scheduler and primary memory manager (which would leave upgrading those riskier than the others and more likely to require a rescue). This is intended to be redundant, as a safe harbor before resorting to a bootup disk with rescue software. These catch-22 backup services also need to be upgradeable, since you might change file-system formats or upgrade the language, but one can do so by choosing a backup service that is known-stable from among the recent builds.
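As a rough illustration of approach (b), here is a minimal Python sketch of the "port 80 after catch 22" convention and of picking a known-stable build for the rescue service. Everything named here (rescue_port, Build, pick_rescue_build, the 'recent' window of five builds) is hypothetical and only for illustration; nothing above fixes how builds are recorded or how "known-stable" gets decided.

```python
from dataclasses import dataclass

RESCUE_BASE = 2200  # the "22xx" block reserved for rescue services


def rescue_port(primary_port: int) -> int:
    """Map a primary port to its rescue counterpart, e.g. 80 -> 2280."""
    if not 0 <= primary_port <= 99:
        # Assumption: only two-digit ports fit the 22xx naming scheme.
        raise ValueError("only ports 0-99 fit in the 22xx block")
    return RESCUE_BASE + primary_port


@dataclass(frozen=True)
class Build:
    name: str           # hypothetical build identifier
    sequence: int       # higher means more recent
    known_stable: bool  # set by whatever process declares a build stable


def pick_rescue_build(builds, recent=5):
    """Return the newest known-stable build among the `recent` latest, or None."""
    latest = sorted(builds, key=lambda b: b.sequence, reverse=True)[:recent]
    for build in latest:
        if build.known_stable:
            return build
    return None
```

The point of the sketch is only that the rescue side never runs the very newest build by default; it trails behind on something already trusted, so a bad upgrade of the primary cannot take the rescue service down with it.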

Also, it wouldn't hurt for the WikiIde to be capable of automatically packaging up the core service components as they exist and providing them as part of a machine bootup disk. (I do intend, ultimately, for WikiIde to run on bare metal; there isn't much reason to deal with interference from another kernel with other agendas if one does not need to do so. But the ability to provide the services even atop, say, a Linux boot disk would also do the job... albeit far less elegantly IMO.) That way, when the going gets really tough, a smart administrator can have a library of old bootup disks to fall back upon to rescue the system. Because each disk is packaged from the core as it existed at the time, it will carry whatever versioning support, filesystem formats, code language, etc. might be critical for reading and rescuing the system. This packaging of the core will be aided by having the WikiIde be quite reflective and fully bootstrapped (the code for the WikiIde will be on WikiWord pages available through the WikiIde), so I imagine it is readily possible.

Anyhow, the above may provide safety, but still leaves that darned transition hanging around: the "magic button" mentioned above to initiate the transfer of a core service to its upgraded version that is running in the WikiIde, plus maintenance of the RuntimeUpgradeableCore property across upgrades. The latter, I'm afraid, will simply need to be obviously marked so it won't be removed by accident (ultimately, I imagine that part of being extensible and flexible is sometimes the ability to give up the ability to be extensible and flexible) - one still has the 22 service for rolling back if desired.

As far as transition goes, forwarding between services doesn't count: that would bog down the system after just a few upgrades, and I'm not keen on one webserver knowing about another (or at least needing to know about another). But if services in the WikiIde are identified by their WikiWord page names (and that's how I plan it), then perhaps making the changeover can be as simple as either modifying one central file (WikiHeart, maybe, or WikiCore if we're feeling a little less Disney-esque) that has a list of core services - the ones to which information incoming on particular ports is directed (e.g. it might have '2280 - RescueWikiWebServer') - or having each of those destinations be a 'redirect' page (so RescueWikiWebServer is a ServiceRedirect:CoreWebService_xxxxx). Thus, the 'core' services are fully defined very simply as those that receive the top-level 'port' inputs, whereas non-core services get inputs directed through the code language itself (often SharedMemory MessagePassing) and only redirect through the primary ports when doing reflection on the Wiki.
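To make the changeover idea a bit more concrete, here is a minimal Python sketch of a central table that maps ports to service page names, with the "magic button" reduced to repointing one entry. The CoreTable class, its method names, and the page names CoreWebService_v1 and CoreWebService_v2 are all assumptions made up for this sketch; pages are modelled as a plain dict whose values are either a handler function or a 'ServiceRedirect:' string.

```python
import threading


class CoreTable:
    """Hypothetical stand-in for a central 'WikiCore' page: port -> page name."""

    REDIRECT_PREFIX = "ServiceRedirect:"

    def __init__(self, pages):
        self._pages = pages            # page name -> handler or redirect string
        self._ports = {}               # port -> page name
        self._lock = threading.Lock()  # keeps each table update atomic

    def bind(self, port, page_name):
        with self._lock:
            self._ports[port] = page_name

    def switch(self, port, new_page_name):
        """The 'magic button': repoint a port and return the old target."""
        with self._lock:
            old = self._ports[port]
            self._ports[port] = new_page_name
            return old

    def resolve(self, port):
        """Find the service page for a port, following at most one redirect."""
        with self._lock:
            name = self._ports[port]
        content = self._pages.get(name)
        if isinstance(content, str) and content.startswith(self.REDIRECT_PREFIX):
            name = content[len(self.REDIRECT_PREFIX):]
        return name

    def dispatch(self, port, request):
        """Hand the request straight to the current service; no chained forwarding."""
        handler = self._pages[self.resolve(port)]
        return handler(request)


pages = {
    "CoreWebService_v1": lambda req: "v1:" + req,
    "CoreWebService_v2": lambda req: "v2:" + req,
    "RescueWikiWebServer": "ServiceRedirect:CoreWebService_v1",
}
table = CoreTable(pages)
table.bind(80, "CoreWebService_v1")
table.bind(2280, "RescueWikiWebServer")
table.switch(80, "CoreWebService_v2")  # port 80 now goes to v2; 2280 still reaches v1
```

Since the lookup happens at dispatch time, the old service never has to forward to the new one or even know it exists, and rolling back is just another switch. The value returned by switch is the previous target, which is the thing one would want to remember for that rollback.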

Just ThinkingOutLoud as though I had an audience that cared. (It helps me, if not you.)


On WikiIde, DaveVoorhis suggests the use of dual heartbeat monitors (described below). These would also provide a great deal of stability absent human intervention (for everything but the scheduler and the network stack stuff, which aren't helped by much else, either). They can be used in addition to the above approaches, and rely on both a source-control system and the ability to manipulate the execution to swap out an old service in favor of a newer one.

Primary heartbeat monitor.

Secondary heartbeat monitor. The following constraints must be observed:


CategoryWiki


Last edited November 7, 2014.