There’s a difference between “Iterative design” and “Our ability to impact iterative design.” I think John is saying in his post that iterative design is an important attribute of the problem (i.e., whether the AI alignment problem is amenable to iterative design), but in the comment above, he’s saying iterative design techniques aren’t super important, because if iterative design won’t work, they’re useless—and if iterative design will work, we’re probably okay without the box anyway, even though of course we should still use it.
Which is ridiculous, because it is the simbox alone that allows iterative design.
Replace “AI alignment” with “flight control” and “box” with “wind tunnel”:
There’s a difference between “Iterative design” and “Our ability to impact iterative design.” I think John is saying in his post that iterative design is an important attribute of the problem (i.e., whether the flight control problem is amenable to iterative design), but in the comment above, he’s saying iterative design techniques aren’t super important, because if iterative design won’t work, they’re useless—and if iterative design will work, we’re probably okay without the wind tunnel anyway, even though of course we should still use it.
The wind tunnel is not a great analogy here since it fails to get at the main disagreement—if you test an airplane in a wind tunnel and it fails catastrophically, it doesn’t then escape the wind tunnel and crash in real life. Given that, it is safe to test flight methods in a wind tunnel and build on them iteratively. (Note: I’m not trying to be pedantic about analogies here—I believe that the wind tunnel argument fails to replicate the core disagreement between my understanding of you and my understanding of John.)
John says “Either we will basically figure out how to make an AGI which does not need a box, or we will probably die. At the point where there’s an unfriendly decently-capable AGI in a box, we’re probably already dead.” My understanding is that John is quite pessimistic about an AGI being containable by a simbox if it is otherwise misaligned. If this is correct, that makes the simbox relatively unimportant—the set of AGIs that are safe to deploy into a simbox but unsafe to deploy into the real world is very small, and that’s why it doesn’t shift survival probability very much.
It would still be dumb not to use it, because any tiny advantage is worth taking, but it’s not going to be a core part of the solution to alignment—we should not depend on a solution that plans on iterating in the simbox until we get it right. This is in contrast to a wind tunnel, where you can totally throw a plane in there and say “I’m pretty sure this design is going to fail, but I want to see how it fails” and this does not, in fact, cause the plane to escape into the real world and destroy it.
Now, you might think that a well-designed simbox would be very likely to keep a potentially misaligned AGI contained, and thus the AI alignment problem is probably amenable to iterative design. That would then narrow down the point of disagreement.
Yes, a well-designed simbox can easily contain AGI, just as you or I aren’t about to escape our own simulation.
Containment actually is trivial for AGI that are grown fully in the sim. It doesn’t even require realism: you can contain AGI just fine in a cartoon world, or even a purely text-based world, as their sensor systems automatically learn the statistics of their specific reality and fully constrain their perceptions to that reality.
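To make the “containment by construction” picture concrete, here is a minimal, hypothetical sketch (not from the original post): a toy agent grown entirely inside a purely text-based sim, whose only input and output channels are the sim’s observe/step interface. The names (TextWorldSim, RandomAgent, grow_agent_in_sim) are illustrative assumptions, not an existing API.

```python
import random


class TextWorldSim:
    """A toy, purely text-based world. The strings this class emits are the
    agent's entire 'reality'; there is no other input channel."""

    def __init__(self):
        self.rooms = ["cave", "meadow", "tower"]
        self.pos = 0

    def observe(self):
        return f"You are in the {self.rooms[self.pos]}."

    def step(self, action):
        if action == "move":
            self.pos = (self.pos + 1) % len(self.rooms)
        # Reward is defined purely in terms of sim state.
        reward = 1.0 if self.rooms[self.pos] == "tower" else 0.0
        return self.observe(), reward


class RandomAgent:
    """Stand-in for a learned agent. Its whole interface is
    observation -> action; it never touches anything outside the sim."""

    def act(self, observation):
        return random.choice(["move", "wait"])


def grow_agent_in_sim(episodes=3, steps=5):
    sim, agent = TextWorldSim(), RandomAgent()
    for ep in range(episodes):
        obs = sim.observe()
        for _ in range(steps):
            action = agent.act(obs)         # agent's only output goes into the sim
            obs, reward = sim.step(action)  # agent's only input comes from the sim
        print(f"episode {ep}: last observation = {obs!r}")


if __name__ == "__main__":
    grow_agent_in_sim()
```

The design point the sketch illustrates is only that every percept the agent ever learns from is generated by the sim, so the statistics its “sensors” pick up are the sim’s statistics—that is the sense in which containment holds by construction for an agent grown fully inside the sim.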