Is every undesired behavior an AI system exhibits “misalignment”, regardless of the cause?
Concretely, let’s consider the following hypothetical incident report.
Hypothetical Incident Report: Interacting bugs and features in navigation app lead to 14-mile traffic jam
Background
We offer a GPS navigation app that provides real-time traffic updates and routing information based on user-contributed data. We recently released updates that made four significant changes:
Tweaked the routing algorithm to have a slightly stronger preference for routes with fewer turns
Updated our traffic model to include collisions reported on social media and in the app
Routed users more aggressively away from places where our traffic model predicts congestion
Reduced the number of alternative routes shown to users to cut clutter and cognitive load
Our internal evaluations based on historical and simulated traffic data looked good, and A/B tests with our users indicated that most users liked these changes individually.
A few users complained about the routes we suggested, but that happens on every update.
We had monitoring metrics for the total number of vehicles diverted by a single collision, and checks to ensure that the capacity of the road we were diverting users onto was sufficient to accommodate that many extra vehicles. However, we had no metric monitoring the total extra traffic flow from all diversions combined.
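For concreteness, here is a minimal sketch of what a per-collision check like ours might have looked like; the report doesn't reproduce our implementation, so every name and data structure below is illustrative:

```python
# Hypothetical sketch of the per-collision capacity check described above.
# All names and structures are assumptions, not our actual system.

road_capacity_vph = {"county_road": 400}  # listed capacity, vehicles per hour
diverted_vph = {}                          # (collision_id, road_id) -> vph

def check_diversion(collision_id: str, road_id: str, extra_vph: int) -> bool:
    """Per-collision check: does *this collision's* diverted flow still fit
    on the target road?"""
    current = diverted_vph.get((collision_id, road_id), 0)
    return current + extra_vph <= road_capacity_vph[road_id]

# The gap: nothing ever compared the *combined* flow onto a road, i.e.
#     sum(v for (_, road), v in diverted_vph.items() if road == road_id)
# against road_capacity_vph[road_id].
```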
Incident
On January 14, there was an icy section of road leading away from a major ski resort. There were 7 separate collisions within a 30-minute period on that section of road. Users were pushed to alternate routes to avoid these collisions. Over a 2-hour period, 5,000 vehicles were diverted onto a weather-affected county road with limited winter maintenance, leading to a 14-mile traffic jam and many subsequent breakdowns on that road, stranding hundreds of people in the snow overnight.
Root cause
The route via the weather-affected county road was approximately 19 miles shorter than the next best route away from the ski resort, so our system tried to divert vehicles onto that road until it was projected to be at capacity.
The county road was listed as having the capacity to carry 400 vehicles per hour.
Each time the system diverted a user to avoid the collisions, it attributed that diversion to one specific collision. When a single segment of road had multiple collisions, the attribution logic chose among them in a way that depended on the origin and destination the user had selected. In this event, attributions were spread almost uniformly across the 7 collisions.
This led to each of the 7 collisions independently diverting up to 400 vehicles per hour onto the county road.
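Back-of-the-envelope, using only the numbers above (a sketch of the arithmetic, not output from our system):

```python
# Worked arithmetic for the failure mode, using the figures in this report.
collisions = 7                    # separate collisions on the icy segment
per_collision_cap_vph = 400       # each attribution bucket fills to the cap
county_road_capacity_vph = 400    # the county road's listed capacity

combined_vph = collisions * per_collision_cap_vph
print(combined_vph)                              # 2800 vehicles per hour
print(combined_vph / county_road_capacity_vph)   # 7.0x the listed capacity
```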
Would you say that the traffic jam happened because our software system was “misaligned”?
So would you say that the hypothetical incident happened because our org had a poor alignment posture with regard to the software we were shipping?