In many systems, the response to an exception or error is to ReportAndDie. One style is to display a dialog that says something like "An error has occurred, program terminating" and a single "Ok" button that finishes the deed. Another style is to dump a logfile or something similar and then exit. Sometimes a stacktrace or its equivalent is displayed.
All these share common "choreography" - the user is told that something terrible happened, and the program or operation is then terminated.
This page is crying out for advocation of an alternate approach: In ReincarnationProgramming?, ReportAndDie is replaced with ReportAndDieAndLiveAgain?.
I use a product called TreePad, written by a cool Dutch guy, HenkHagedoorn?, using Delphi. When it does something horrible, like an access violation, it throws the exception, issues the complaint, waits for the user to click OK, and recovers from the error without data loss.
It's the only Windows program I've ever used that accomplishes this.
I would suggest that whatever he's doing, or whatever Borland is doing in Delphi that makes this possible should be adopted as a standard.
The only technical features that come to mind that allow ReincarnationProgramming? are checkpointing and rollback and such, which are rarely pervasive anywhere except a database server. Are there others?
People often claim this with regard to Delphi. As a user of many Delphi applications, and a sometime Delphi programmer, it's an illusion - the application isn't actually recovering from an error, it's simply declining to crash, leaving internal data in an unstable state. This may accidentally be recovered from or it may not. Sometimes it's just obnoxious, like when a Delphi app decides to iterate over a non-existent array, popping up dozens of access violation boxes but refusing to die. People don't like applications that crash, but a crash is the *only* reasonable answer to an exception you aren't sure you can recover from. If you are sure you can recover from it, there's no reason to alert the user. The delphi framework installs an exception handler that pops up that message box, then continues as if nothing had happened. There is no magic involved whatsoever. -- ChrisMellon?
Smalltalk applications written against VisualWorks or IbmSmalltalk (I don't know about Squeak) can gracefully recover. Sometimes they don't notify the user, but instead log the incident in some way.
Recover how? Cosmic rays destroy the contents of RAM. A redundancy check reveals a fatal error: important data is smashed. And Smalltalk just...recovers. I think more specifics are in order.
Ok, ok, ok. What I mean is that in the Smalltalk environment, the context of the expression that raised the exception is kept around, and is pointed at by the exception. When an exception is caught, the handler can (some of this varies by vendor):
So when your user is deep in a stack of nested modal dialogs, is trying to save and exit, and accidently clicks the (empty) A:\ button instead of the C:\\ button that she wanted, you can handle the no-volume-mounted exception and return back to the nest, no harm done. As opposed to making the user start at the top all over again.
Or, to quote an old apocryphal myth, when your missile guidance system pulls a segmentation fault in the middle of a routine in a math package calculating the inbound target's velocity, you can tell your code to:
lmao. I didn't know Microsoft made aircraft instruments. I wonder if the displays spanked the user for improperly shutting them down on their way back up.
The AbiWord word processor is rather less stable than I would like, but its pattern is "save the current document to a .CRASHED file and die". Thus, although it crashes too often, I've never lost work. (Aside: I'm not using the very latest version; it may be getting more stable.)
The ability to "retry" (restart) in a smart way after an error was discovered is an important factor in software quality - knowing that bugfree software is hard to find. The critical part for a successful recovery seems to be the ability of the program to access a non-corrupted state to continue the execution from there. Is there a universal strategy for this? (A WordProcessor seems to work fine with autosaves, but is this always applicable? What are other approaches?)
See also LimpVersusDie, FailFast
Related to SegmentationFault, CoreDump