Ariane Five

The inquiry board's report on the failure of the ArianeFive's maiden flight is available from:

It's a most informative read.

Briefly:

It's not clear where to put the point of failure, because the chain could have been broken in several places. The overflowing variable, related to horizontal velocity, belonged to the SRI's alignment function, which served no purpose after lift-off; it could have been shut down. The conversion from a 64-bit float to a 16-bit signed integer is one that is normally range-checked, so that an out-of-range value raises an exception (the software was written in the AdaLanguage), but the programmer had decided that an overflow couldn't happen, and so had left that conversion unprotected; he could have handled it properly. The unhandled exception shut down the SRI (Inertial Reference System); it could have had a recovery strategy, like restarting if something 'impossible' happened. The hot backup SRI had already failed in the same way, because it was an exact duplicate running identical software; the backups could have been running different software (?). The SRI-OBC (On-Board Computer) communication channel (bus + protocol + software at each end) allowed the OBC to become confused; it could have signalled that the SRI's output was a diagnostic bit pattern rather than navigational data. The OBC treated the bizarre SRI data as gospel; it could have sanity-checked it. Any one of those 'could's (especially the earlier ones) could have saved the mission, ArianeSpace's face, and the 3 billion franc Cluster mission it was carrying.
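
The difference between an unprotected and a protected conversion, plus a receiver-side sanity check, can be sketched in a few lines. This is a minimal Python illustration, not the flight code (which was Ada); the function names, the choice to saturate rather than fail, the NaN policy and the `limit` envelope are all hypothetical, chosen only to show the shape of the 'could's above.

```python
import math

# 16-bit signed integer range (two's complement).
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(x: float) -> int:
    """Unprotected-style conversion: out-of-range input raises an exception.

    Truncation toward zero is a simplification; the point here is only
    what happens when the value does not fit.
    """
    if math.isnan(x):
        raise ValueError("NaN cannot be converted to int16")
    n = int(x)  # int() itself raises OverflowError for +/-inf
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in int16")
    return n


def to_int16_saturating(x: float) -> int:
    """Protected conversion: clamp out-of-range values instead of failing."""
    if math.isnan(x):
        return 0  # hypothetical policy; a real system needs a deliberate choice
    if x >= INT16_MAX:
        return INT16_MAX
    if x <= INT16_MIN:
        return INT16_MIN
    return int(x)


def plausible(x: float, limit: float) -> bool:
    """Receiver-side sanity check: reject values outside a known envelope.

    `limit` is a hypothetical bound the receiver would have to be given.
    """
    return math.isfinite(x) and abs(x) <= limit


if __name__ == "__main__":
    print(to_int16_saturating(40_000.0))  # 32767
    try:
        to_int16_checked(40_000.0)
    except OverflowError as exc:
        print("unprotected conversion failed:", exc)
```

Saturating is only one possible handling strategy; whether a clamped value is safer than a reset or a flagged fault depends on what the downstream consumer does with it.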

Incidentally, the software was reused from ArianeFour. ArianeFour has a different early trajectory, with much lower horizontal velocity, so even though the potential existed for this bug to manifest itself, the conditions were never right. The correct operation of the software on ArianeFour seems to have led the ArianeFive engineers to think that they didn't need to test the software as heavily as if it had been new.
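
The reuse trap is that a range assumption is a property of the flight envelope, not of the code. A minimal sketch of re-checking that assumption when the envelope changes, assuming you have sampled values of the converted variable for each vehicle. The profile numbers below are made up for illustration and are NOT Ariane data.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def out_of_range(samples):
    """Return the samples that would not survive a float-to-int16 conversion.

    Uses truncation toward zero, a simplified model of the conversion.
    """
    return [x for x in samples if not INT16_MIN <= int(x) <= INT16_MAX]


# Hypothetical values of the converted variable along two flight profiles.
# These numbers are invented for illustration only.
old_vehicle_profile = [0.0, 8_000.0, 21_000.0, 30_500.0]
new_vehicle_profile = [0.0, 12_000.0, 33_000.0, 47_000.0]

print(out_of_range(old_vehicle_profile))  # []
print(out_of_range(new_vehicle_profile))  # [33000.0, 47000.0]
```

The same unchanged code passes on one envelope and fails on the other, which is exactly why "it worked on the old vehicle" says little about the new one.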

The really interesting questions are "What could they have done differently to prevent this happening?", "What other totally unforeseen problems are there?", "How can they prevent those?" and "Am I in any better a position than them?". Perhaps the best quote in the report is "The Board is in favour of the [...] view, that software should be assumed to be faulty until applying the currently accepted best practice methods can demonstrate that it is correct."!

Apparently, the double-to-short conversion wasn't checked because the check was thought to be too computationally expensive; the SRI had a tight CPU budget. However, there is no indication that this decision was made on the basis of profiling; it looks like PrematureOptimization.
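
Measuring the cost of a range check is cheap compared with guessing it. A minimal sketch using Python's `timeit`; the timings it prints say nothing about the SRI's actual processor or Ada code, and the function names are hypothetical. The point is only the method: measure the check's overhead per call, then compare it against the real CPU budget.

```python
import timeit

INT16_MIN, INT16_MAX = -32768, 32767


def convert_unchecked(x: float) -> int:
    """Conversion with no range check."""
    return int(x)


def convert_checked(x: float) -> int:
    """Same conversion plus an explicit int16 range check."""
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in int16")
    return n


def per_call_seconds(fn, value: float = 1234.5, number: int = 100_000) -> float:
    """Average wall-clock time per call of fn(value), measured with timeit."""
    total = timeit.timeit(lambda: fn(value), number=number)
    return total / number


if __name__ == "__main__":
    unchecked = per_call_seconds(convert_unchecked)
    checked = per_call_seconds(convert_checked)
    print(f"unchecked: {unchecked * 1e9:.1f} ns/call")
    print(f"checked:   {checked * 1e9:.1f} ns/call")
    print(f"overhead:  {(checked - unchecked) * 1e9:.1f} ns/call")
```

Multiplying the measured overhead by the number of such conversions per control cycle gives a number that can actually be weighed against the budget.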

I believe the conversion had been proven never to overflow. However, the proof was valid only for Ariane IV. Ariane V was more powerful and had a flatter trajectory, so the proof no longer held.

There was no proof:

n) During design of the software of the inertial reference system used for Ariane 4 and Ariane 5, a decision was taken that it was not necessary to protect the inertial system computer from being made inoperative by an excessive value of the variable related to the horizontal velocity, a protection which was provided for several other variables of the alignment software. When taking this design decision, it was not analysed or fully understood which values this particular variable might assume when the alignment software was allowed to operate after lift-off.

o) In Ariane 4 flights using the same type of inertial reference system there has been no such failure because the trajectory during the first 40 seconds of flight is such that the particular variable related to horizontal velocity cannot reach, with an adequate operational margin, a value beyond the limit present in the software.

The analysis that showed that the horizontal velocity could not have overflowed in Ariane IV was done after the Ariane V failure, not during the Ariane IV development.


See Also: FixedQuantityOverflowBug, TheCaseOfTheKillerRobot.


CategoryBug CategoryHardware


Last edited June 6, 2009.