Whoa, someone actually letting the transcript out. Has that ever been done before?
Actually, it has been done several times, but most of them are pretty boring.
I still don’t recall any where the gatekeeper lost.
In general it seems that gatekeepers who win are more willing to release the transcripts.
It’s also possible that the ‘best’ AI players are the ones most willing to pre-commit to not releasing transcripts, as not having your decisions (or the discussions that led to them) go public helps eliminate that particular disincentive to releasing the AI from the box.
Never still seems extraordinary. I find myself entertaining hypotheses like “maybe the AI has never actually won”.
Eliezer Yudkowsky has been let out as the AI at least twice,[1][2] but both tests were conducted under a pre-commitment to secrecy.
I’d be surprised if he’s the only one who has ever won as the AI. I think it more likely that this is a visibility issue (e.g. despite him being a very high-profile person in the AI safety memetic culture, you weren’t aware that Eliezer had won as the AI when you made your comment). While I’m not aware of others who have won as the AI, I would bet that this is merely a lack of knowledge on my part, not that no one else actually has.
This is further compounded by the fact that some (many?) games are conducted under a pre-commitment to secrecy, and the results that get the most discussion (and therefore the most visibility) are the ones with full transcripts for third parties to pick through.
I was already aware of those public statements. I remain rather less than perfectly confident that Yudkowsky actually won.
Forgive me if I misunderstand you, but you seem to be implying that, on two separate occasions, two different people were (induced to?) lie about the outcome of an experiment.
So you’re implying that either Eliezer is dishonest, or both of his opponents were dishonest on his behalf. And you find this more likely than an actual AI win in the game?
We already know from the Basilisk that Eliezer is willing to deceive the community.
EY’s handling of the basilisk issue can be called many things (clumsy, rushed, unwise, badly thought out, counterproductive, poster child for the Streisand effect), but it was not deceitful.
Awww. I haven’t actually read this one yet, either. Is this one boring?
I didn’t find it particularly interesting. Entertaining the idea of letting the AI out is far from the same as almost letting the AI out.
I can only speak for myself, but at least it wasn’t boring to play. Polymathwannabe also said that he enjoyed the experiment enormously.
Did you deliberately phrase that (“letting the transcript out”) so as to hint at an AI-Box-Box game, in which one player’s goal is to convince the other to release the transcript of an earlier AI-Box game, while the other tries to keep it secret?
I probably had the phrasing primed and ready to go in my brain, but it wasn’t intentional.
Yes, but only when the gatekeeper wins. If the AI wins, then they wouldn’t want the transcript to get out, because then their strategy would be less effective next time they played.
I would imagine that if we ever actually build such an AI, we would conduct some AI-box experiments to determine some AI strategies and figure out how to counter them. Humans who become the gatekeeper for the actual AI would be given the transcripts of AI-box experiment sessions to study as part of their gatekeeper training.
Letting out the transcript, then, would be a good thing. It would make the AI player’s job harder in the next experiment, since the human player would already know those strategies, but that’s the point: when facing an actual AI, the human gatekeeper would be aware of those strategies too.
Doesn’t the same logic apply to the gatekeeper?
The Gatekeeper usually wants to publish if they win, to brag. Their strategy isn’t usually a secret; it’s simply to resist.