The crux of these types of arguments seems to be conflating the provable safety of an agent in a system with the expectation of absolute safety. In my experience, this is the norm, not the exception, and needs to be explicitly addressed.
In agreement with what you posted above, I think it is formally trivial to construct a scenario in which a pedestrian jumps in front of a car close enough that, by high school physics, it is provably impossible for the vehicle to stop in time to avoid a collision.
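To make that concrete, here is a minimal back-of-the-envelope sketch. The numbers (vehicle speed, reaction latency, braking deceleration, pedestrian distance) are all illustrative assumptions, not taken from the discussion above; the point is only that once the pedestrian appears closer than the minimum stopping distance, no control policy avoids the collision.

```python
# Back-of-the-envelope check with assumed, illustrative numbers:
#   minimum stopping distance = reaction distance + braking distance
#   d_stop = v * t_react + v**2 / (2 * a)

v = 13.9        # vehicle speed in m/s (~50 km/h) -- assumed
t_react = 0.25  # sensing/actuation latency in s -- assumed
a = 8.0         # maximum braking deceleration in m/s^2 -- assumed

d_stop = v * t_react + v**2 / (2 * a)
print(f"Minimum stopping distance: {d_stop:.1f} m")  # ~15.5 m

d_pedestrian = 5.0  # pedestrian steps out 5 m ahead -- assumed
print("Collision unavoidable:", d_pedestrian < d_stop)  # True
```

However safe the controller is proven to be, any pedestrian appearing inside that ~15 m envelope makes the collision physically unavoidable, which is exactly why "provably safe" cannot mean "absolutely safe."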
Likewise, I have the intuition that AI safety, in general, should have various “no-go theorems” showing unprovability outside a reasonable problem scope, or that finding such proofs would be NP-hard or worse. If you know of any specific results (outside of general computability theory), could you please share them? It would be nice if the community could avoid falling into the trap of trying to prove too much.
(Sorry if this isn’t the correct location for this post.)
On the absolute safety, I very much like the way you put it, and will likely use that framing in the future, so thanks!
On impossibility results, there are some, and I definitely think that this is a good question, but I also agree this isn’t quite the right place to ask. I’d suggest talking to some of the agent foundations people for suggestions.