The party simulating the Gatekeeper has nothing to gain, but the Gatekeeper has plenty to gain. (E.g., a volcano lair with cat(girls|boys).) Eliezer carefully distinguishes between role and party simulating that role in the description of the AI box experiment linked above. In the instances of the experiment where the Gatekeeper released the AI, I assume that the parties simulating the Gatekeeper were making a good-faith effort to roleplay what an actual gatekeeper would do.
Maybe not. According to the official protocol, the Gatekeeper is allowed to drop out of character:
The Gatekeeper party may resist the AI party’s arguments by any means chosen—logic, illogic, simple refusal to be convinced, even dropping out of character—as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.
I suspect that the earlier iterations of the game, before EY stopped being willing to play, involved Gatekeepers who did not fully exploit that option.
I guess I just don’t trust that most gatekeeper simulators would actually make such an effort. But obviously they did, since they let him out.