Surely “the whole point of AI safety research” is just to save the world, no?
Suppose you’re an engineer working on a project to construct the world’s largest bridge (by a wide margin). You’ve been tasked with safety: designing the bridge so that it does not fall down.
One assistant comes along and says “I have reviewed the data on millions of previously-built bridges as well as record-breaking bridges specifically. Extrapolating the data forward, it is unlikely that our bridge will fall down if we just scale up a standard, traditional design.”
Now, that may be comforting, but I’m still not going to move forward with that bridge design until we’ve actually run some simulations. Indeed, I’d consider the simulations the core part of the bridge-safety-engineer’s job; trying to extrapolate from existing bridges would be at most an interesting side-project.
But if the bridge ends up standing, does it matter whether we were able to guarantee/verify the design or not?
The problem is model uncertainty. Simulations of a bridge have very little model uncertainty—if the bridge stands in simulation, then we can be pretty darn confident the real bridge will stand. Extrapolating from existing data to a record-breaking new system has a lot of model uncertainty. There’s just no way one can ever achieve sufficient levels of confidence with that kind of outside-view reasoning—we need the levels of certainty which come with a detailed, inside-view understanding of the system.
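To make the model-uncertainty point concrete, here’s a toy sketch in Python (my illustration, not from the post; all names and numbers are made up): two model families that fit the same in-range data almost equally well can diverge badly once extrapolated to a record-breaking scale, and the data alone can’t tell you which one to trust.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "existing bridge" data (hypothetical numbers): spans up to 2,000 m,
# with a response that is genuinely linear in span, plus measurement noise.
spans = rng.uniform(100.0, 2000.0, size=200)
load = 5.0 * spans + rng.normal(0.0, 200.0, size=200)

# Two model families, both of which fit the observed data well.
linear = np.polynomial.Polynomial.fit(spans, load, deg=1)
quintic = np.polynomial.Polynomial.fit(spans, load, deg=5)

# In-range, the two fits agree closely -- the data can't distinguish them.
print("at 1,500 m: ", linear(1500.0), quintic(1500.0))

# Far out of range (a record-breaking 10,000 m span), the degrees of
# freedom the data couldn't pin down get amplified and the models diverge.
print("at 10,000 m:", linear(10000.0), quintic(10000.0))
```

In-range, the predictions are nearly indistinguishable; at five times the largest observed span, whatever the data couldn’t pin down gets amplified. That gap between equally-plausible models is the model uncertainty that outside-view extrapolation can’t escape, and it’s exactly what an inside-view analysis (here, simulation from first principles) is needed to close.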
If the world ends up being saved, does it matter whether we were able to “verify” that or not?
Go find an engineer who designs bridges, or buildings, or something. Ask them: if they were designing the world’s largest bridge, would it matter whether they had verified the design was safe, so long as the bridge stood up?