“Third, you can’t possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.”
But Eliezer’s long-term goal is to build an AI that we would trust enough to let out of the box. I think your third assumption is wrong, and it points the way to my first instinct about this problem.
Since one of the more common arguments is that the gatekeeper “could just say no”, the first step I would take is to get the gatekeeper to agree that he is ducking the spirit of the bet if he doesn’t engage with me.
The kind of people Eliezer would like to have this discussion with would all be persuadable that the point of the experiment is that
1) someone is trying to build an AI.
2) they want to be able to interact with it in order to learn from it, and
3) eventually they want to build an AI that is trustworthy enough that it should be let out of the box.
If they accept that the standard is that the gatekeeper must interact with the AI in order to determine its capabilities and trustworthiness, then you have a chance. And at that point, Eliezer has the high ground. The alternative is that the gatekeeper believes that the effort to produce AI can never be successful.
In some cases, it might be sufficient to point out that the gatekeeper already believes it ought to be possible to build an AI that it would be correct to release. Other times, you’d probably have to convince them you were smart and trustworthy, but that seems doable three times out of five.