That may be the crux. I’m generally of the mindset that “can’t guarantee/verify” implies “completely useless for AI safety”. Verifying that it’s safe is the whole point of AI safety research. If we were hoping to make something that just happened to be safe even though we couldn’t guarantee it beforehand or double-check afterwards, that would just be called “AI”.
Surely “the whole point of AI safety research” is just to save the world, no? If the world ends up being saved, does it matter whether we were able to “verify” that or not? From my perspective, as a utilitarian, it seems to me that the only relevant question is how some particular intervention/research/etc. affects the probability of AI being good for humanity (or the EV, to be precise). It certainly seems quite useful to be able to verify lots of stuff to achieve that goal, but I think it’s worth being clear that verification is an instrumental goal not a terminal one—and that there might be other possible ways to achieve that terminal goal (understanding empirical questions, for example, as Rohin wanted to do in this thread). At the very least, I certainly wouldn’t go around saying that verification is “the whole point of AI safety research.”
Surely “the whole point of AI safety research” is just to save the world, no?
Suppose you’re an engineer working on a project to construct the world’s largest bridge (by a wide margin). You’ve been tasked with safety: designing the bridge so that it does not fall down.
One assistant comes along and says “I have reviewed the data on millions of previously built bridges as well as record-breaking bridges specifically. Extrapolating the data forward, it is unlikely that our bridge will fall down if we just scale up a standard, traditional design.”
Now, that may be comforting, but I’m still not going to move forward with that bridge design until we’ve actually run some simulations. Indeed, I’d consider the simulations the core part of the bridge-safety-engineer’s job; trying to extrapolate from existing bridges would be at most an interesting side-project.
But if the bridge ends up standing, does it matter whether we were able to guarantee/verify the design or not?
The problem is model uncertainty. Simulations of a bridge have very little model uncertainty—if the simulation stands, then we can be pretty darn confident the bridge will stand. Extrapolating from existing data to a record-breaking new system has a lot of model uncertainty. There’s just no way one can ever achieve sufficient levels of confidence with that kind of outside-view reasoning—we need the levels of certainty which come with a detailed, inside-view understanding of the system.
If the world ends up being saved, does it matter whether we were able to “verify” that or not?
Go find an engineer who designs bridges, or buildings, or something. Ask them: if they were designing the world’s largest bridge, would it matter whether they had verified the design was safe, so long as the bridge stood up?