While researching a forthcoming MIRI blog post, I came across the University of York’s Safety Critical Mailing List, which in 2000 hosted an interesting discussion on the use of AI in safety-critical applications. The first post in the thread, from Ken Firth, reads:
...several of [Vega’s] clients seek to use varying degrees of machine intelligence—from KBS to neural nets—and have come for advice on how to implement them in safety related systems… So far we have usually resorted to the stock answer “you don’t — at least not for safety critical functions”, but this becomes increasingly difficult to enforce, even if your legal and moral ground is sound. Customers are increasingly pleading the need for additional functionality and for utility to have precedence over safety (!!)
The thought of having to apply formal proofs to intelligent systems leaves me cold. How do you provide satisfactory assurance for something that has the ability to change itself during a continuous learning process? I can only assume that one would resort to black box testing, with all its inherent shortcomings and uncertainties—in particular, a black-box test would only apply to the version tested, and not to subsequent evolutions...
My fear is that the longer we ignore this problem, the more likely that users will simply ignore the safety community and press on regardless (precedents from US naval combat systems and commercial operating systems??). Can anyone offer pragmatic advice to customers who are likely to use IKBS anyway? Personally I think that I prefer… [to avoid] putting unvalidatable systems in the safety-critical firing line. But for how long can we continue to achieve this?
I encountered this thread via a technical report that is itself worth reading, Harper (2000).
That report also offers a handy case study in the challenges of designing intelligent control systems that operate “correctly” in the complexities of the real world:
If it really is the case that hazardous behaviour is dependent on factor(s) purely related to environmental disturbances, then there are limits to how much the risks of a system can ever be reduced, even if the system design contains no decision errors. One notable example of a real accident that may be in this category is that of an Airbus A320, which landed at Warsaw Airport on 14th September 1993 under severe weather conditions, and overran the end of the runway. During the event, the aircraft’s braking system did not begin to apply braking force to the landing gear wheels, because the weight of the aircraft was distributed unevenly across the wheels. In addition, load sensors on the wheels, which measure the weight, act as enabling signals for the ‘Ground Spoiler’ function, which will deploy the spoilers to act as an airbrake when the aircraft has settled onto the ground. Since the weight was not evenly applied to the wheels, neither the wheel brakes nor the spoilers deployed for a considerable part of the aircraft’s travel along the runway. By the time these systems did deploy correctly, there was insufficient runway distance remaining for the aircraft to avoid overrunning the end of the runway.
There was some initial speculation that this might be a design flaw in the logic of the brakes/spoiler systems. However, it is also the case that uneven braking force applied to the wheels can cause an aircraft to veer off the side of a runway, and at high speeds this is every bit as severe as overrun at the end. Hence, if wheel braking had been applied during the event in question, another hazard (veering off the side of the runway) could well have occurred. Since one possible escape from this situation is for the aircraft to take off again and attempt to land for a second time, the correct decision for the braking system to take is not to apply the brakes until weight is established evenly across the wheels. Therefore, in my opinion, there was no flaw in the system design.
The reason why this event is so pertinent to this discussion is the reason why the aircraft weight was unevenly distributed in the first place, and the interactions between the aircraft and its environment that led to this condition. The reason for the uneven weight distribution was that the relative airspeed of the aircraft was unusually high. This meant that there was considerable residual aerodynamic lift being generated from the aircraft wings, even though the aircraft was touching the ground. This meant that the aircraft had not settled properly onto its wheels, and thus was not enabling the braking system in the intended manner. This is an example of environmental feedback (the residual aerodynamic lift) generating a disturbance to the system state (a modification to the load measured by the wheels’ weight sensors) that exceeded some bound, and even though the braking system made the correct decision at the time (not to apply the brakes until weight distribution is even), a hazard occurred. The principal cause of this event was an error made by Air Traffic Control, in which the aircraft pilots were misinformed about the prevailing wind speed and direction at the airport. As a result they configured the aircraft’s speed and bank angle incorrectly for the true conditions on the ground. Consequently, the ground speed at landing exceeded the maximum landing speed for which the aircraft was certified, and the bank angle led to the aircraft not settling evenly onto its wheels. This led to events unfolding as described above.
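To make the interlock the report describes a bit more concrete, here is a minimal sketch of a weight-on-wheels enabling condition. The names, thresholds, and structure are my own illustrative assumptions, not the actual A320 braking/spoiler logic; the point is only to show the decision rule the report defends, namely that neither the wheel brakes nor the ground spoilers should be enabled until weight is established evenly across the wheels.

```python
# Hypothetical, highly simplified sketch of a weight-on-wheels interlock of the
# kind described in the report. Names and thresholds are illustrative
# assumptions, not Airbus's actual braking/spoiler logic.

from dataclasses import dataclass


@dataclass
class GearLoads:
    """Load (in newtons) measured by the sensors on each main landing gear."""
    left_main: float
    right_main: float


def weight_on_wheels(loads: GearLoads, min_load: float, max_imbalance: float) -> bool:
    """Return True only if the aircraft has settled evenly onto its wheels.

    min_load: minimum load each gear must carry before braking is enabled.
    max_imbalance: maximum allowed left/right load difference, as a fraction
                   of total measured load.
    """
    total = loads.left_main + loads.right_main
    if total <= 0:
        return False
    evenly_loaded = abs(loads.left_main - loads.right_main) / total <= max_imbalance
    both_loaded = loads.left_main >= min_load and loads.right_main >= min_load
    return evenly_loaded and both_loaded


def braking_and_spoilers_enabled(loads: GearLoads) -> bool:
    # The rule the report defends: do not enable wheel braking or the ground
    # spoilers until weight is established evenly across the wheels, because
    # braking on an unevenly loaded gear risks veering off the side of the
    # runway. Thresholds here are arbitrary placeholders.
    return weight_on_wheels(loads, min_load=50_000.0, max_imbalance=0.3)


if __name__ == "__main__":
    # Residual aerodynamic lift at high airspeed keeps the measured loads low
    # and uneven, so the interlock (correctly, per the report's argument) holds
    # braking off -- the condition that preceded the Warsaw overrun.
    print(braking_and_spoilers_enabled(GearLoads(left_main=20_000.0, right_main=70_000.0)))   # False
    print(braking_and_spoilers_enabled(GearLoads(left_main=180_000.0, right_main=175_000.0)))  # True
```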