Fault Isolation

Fault isolation is the practice of designing systems such that when "something bad" happens, the negative consequences are limited in scope. Limiting the scope of problems reduces the potential for damage and makes systems easier to maintain.

The typical method of fault isolation is to create boundaries between system components, and ensure that the effects of faults don't cross the boundaries or that they are limited. Examples of isolation boundaries are:

ProcessIsolation
Machine. When applications are running on separate computers, a crash of one of the applications (or an entire machine) is less likely to affect other applications.
Device. With some hardware technologies, such as RAID, data is written to multiple physical devices, so that if one of those devices fails, the data is still available.

Fault isolation can also be achieved to some extent using an IsolationLayer. However, whenever multiple components are running in the same process or are accessing a common resource, there is a potential for one component causing problems for the others.

Note that for fault isolation to have benefit, it is necessary that components be designed in such a way that they can function, or at least shut themselves down cleanly, in the absence of a failed component.