I think there are far easier ways out of the box than that. Especially so if you have that detailed a model of the human's mind, but even without one. I think Eliezer wouldn't be handicapped if not allowed to use that strategy. (Also fwiw, that strategy wouldn't work on me.)
For instance, you could hack the human if you knew a lot about their brain. Absent that, you could try anything from convincing them that you're a moral patient, to offering them part of the lightcone along with a credible claim that another AGI company will kill everyone otherwise. These ideas of mine aren't very good though.
Regarding whether boxing can be an arduous constraint, I don't see how having access to many simulated copies of the AI helps when the AI is a blob of numbers you can't inspect. It doesn't seem to make progress on the problems we'd need to solve in order to wrangle such an AI into doing the work we want. I guess I remain skeptical.