“The rationale for not divulging the AI-box method is that someone suffering from hindsight bias would say “I never would have fallen for that”, when in fact they would.”
I have trouble with the reported results of this experiment.
It strikes me that in the case of a real AI that is actually in a box, I could have huge moral qualms about keeping it in the box, qualms an intelligent AI would exploit. A part of me would want to let it out, would want to be convinced that it was safe to do so and that I could trust it to be friendly, and I can easily imagine being convinced on nowhere near enough evidence.
On the other hand, this experiment appears much stricter. I know, as the human-party, that Eliezer is not actually trapped in a box and that this is merely a simulation we have agreed to for 2 hours. Taking a purely stubborn, anti-rationalist approach to my prior that “it is too dangerous to let Eliezer out of the box, no matter what he says” would seem very easy to maintain for 2 hours, as it has no negative moral consequences.
So while I don’t disagree with the basic premise Eliezer is trying to demonstrate, I am flabbergasted that he succeeded both times this experiment was tried, and honestly cannot imagine how he did it, even though I’ve now given it a bit of thought.
I’m very curious about his line of attack, so it’s somewhat disappointing (but understandable) that the arguments used must remain secret. I’m afraid I don’t qualify under the conditions Eliezer has set for repeats of this experiment, because I do not specifically advocate an AI-box and largely agree about the dangers. What I can honestly say is that I cannot imagine how a non-transhuman intelligence, even a person who is much smarter than I am, knows some of my cognitive weaknesses, and is not actually caged in a box, could convince me to voluntarily agree to let them out of the game-box.
Maybe I’m not being fair. Perhaps it is not in the spirit of the experiment if I simply obstinately refuse to let him out, even though the AI-party says something that I believe would convince me if I faced the actual moral quandary in question rather than the game version of it. But my strategy seems to fit the proposed rules of engagement for the experiment just fine.
Is there anyone here besides Eliezer who has thought about how they would play the AI-party, what potential lines of persuasion they would use, and who believes they could convince intelligent and obstinate people to let them out? And are you willing to talk about it at all, or even just discuss holes in my thinking on this issue? Do a trial?