This page exists to refute the idea that XP can be used for anything. Consider ExtremeSurgery (aka EmergencySurgery?).
The Extreme practices tend to be "plan as you go", specifically avoiding BigDesignUpFront. BigDesignUpFront is, of course, a major value in surgery, since design time is much less expensive (and, more to the point, less dangerous) than time in the OR. Imagine performing surgery using UnitTests, RefactorMercilessly, and YouArentGonnaNeedIt.
Let the programmers program, and the surgeons do surgery.
Just as XP would do surgery poorly, it does certain types of programming poorly. Consider aircraft fly-by-wire code, or other LifeCritical applications. These require higher quality than ExtremeProgramming, or any commercial software methodology, can produce. They are produced using stricter methodologies, a sort of surgical programming. The error rate is incredibly low, but the cost per line of code is incredibly high.
You've asserted that no commercial software methodology can produce quality sufficient for life-critical software. So just what methodology do you use to produce life-critical software?
XP is inherently limited. Refactoring is, by definition, an adjustment of factors within their context, that is, at one conceptual level. It therefore does not reach other conceptual levels effectively, and is not efficient for developing complex systems.
Better methods would involve establishing a SoftwareArchitecture before a rapid design/coding process. 'Rapid' approaches could be applied to architecture, but in this domain code is not the expressive language and gung-ho coding cannot be a relevant behaviour... :-)
It is harder to think than to type.
Life-critical software can be produced with *any* methodology which covers the entire stack from customer requirements to general software requirements (in architecture) to design to implementation to testing; establishes both pre- and post-condition testing at every level; and is verified throughout its use to ensure that the pre- and post-condition tests propagate reliably down the stack.
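To make the "pre- and post-condition testing at every level" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the contract helper, the ContractError type, the elevator example, and the 25-degree limit are all hypothetical, and real life-critical code would not be written this way or in this language. It only shows a requirement-level condition being restated one level down, so that a violation at either level is caught.

```python
from functools import wraps


class ContractError(AssertionError):
    """Raised when a pre- or post-condition fails (hypothetical type)."""


def contract(pre, post):
    """Wrap a function with a pre-condition on its arguments and a
    post-condition on its result. Illustrative helper, not a library API."""
    def decorate(fn):
        @wraps(fn)
        def checked(*args):
            if not pre(*args):
                raise ContractError(f"pre-condition failed: {fn.__name__}{args}")
            result = fn(*args)
            if not post(result, *args):
                raise ContractError(f"post-condition failed: {fn.__name__}{args}")
            return result
        return checked
    return decorate


# Implementation level: clamp a value into [lo, hi].
@contract(pre=lambda x, lo, hi: lo <= hi,
          post=lambda r, x, lo, hi: lo <= r <= hi)
def clamp(x, lo, hi):
    return max(lo, min(x, hi))


# Design level: an assumed requirement that the commanded deflection
# stays within +/-25 degrees for a stick input in [-1, 1].
@contract(pre=lambda stick: -1.0 <= stick <= 1.0,
          post=lambda r, stick: -25.0 <= r <= 25.0)
def elevator_deflection(stick):
    # The raw gain of 30 degrees would exceed the limit; clamp enforces it.
    return clamp(stick * 30.0, -25.0, 25.0)
```

Calling elevator_deflection with a stick value outside [-1, 1] raises ContractError before any computation happens, and a bug in clamp that let a value escape its range would be caught twice: once by clamp's own post-condition and once by the caller's. Whether checks like these are "verified over the entire methodology's use" is a process question that no code snippet answers.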