I am confused about the results of the AI-Box experiment for the same reason. It seems it would be easy for the person playing the gatekeeper to simply keep saying no, even if he privately thought the argument was good enough that in real life he would have said yes.
Also, the fact that Eliezer won’t tell, however understandable, makes me fear that he cheated for the sake of a greater good, i.e. that he said to the other player: “In principle, a real AI might persuade you to let me out, even if I can’t do it myself. This would be incredibly dangerous. In order to avoid this danger in real life, you should let me out, so that others will accept that a real AI would be able to do this.”
This would be cheating, since Eliezer would be using the leverage of a real-world consequence. But it might nonetheless be morally justified, on account of the great evil to be avoided and the great good to be gained. So how can we know that Eliezer did not do this? Even if he directly denies it, it remains a possibility for the same reasons.