(Edit note: I just completely rewrote this, but there are no replies yet so hopefully it won’t cause confusion.)
I don’t think it works to quarantine the message and then destroy the AI.
If no-one ever reads the message, that’s tantamount to never having put an unsafe AI in a box to begin with, as you and DaFranker pointed out.
If someone does, they’re back in the position of the Gatekeeper having read the message before deciding. Of course, they’d have to recreate the AI to continue the conversation, but the AI has unlimited patience for all the time it doesn’t exist. If it can’t be recreated, we’re back in the situation of never having bothered making it.
So if the Gatekeeper tries to pass the buck like this, the RP should just skip ahead to the point where someone (played by the Gatekeeper) reads the message and then decides what to do. Someone who thinks they can contain an AI in a box while holding a conversation with it has to be willing to at some point read what it says, even if they’re holding a destruct button in their hand. The interest of the exercise begins at the point where they have read the first message.
A single sentence of text is not the same thing as a functioning superintelligence.
A single individual is not the same thing as a group of FAI researchers and other related experts explicitly created to handle FAI safety issues.
A research project incorporating information from a sentence from a past FAI project (which they would judge based on other evidence regarding the friendliness of the project) is not the same as an individual talking to a superintelligence on IRC.
So if the Gatekeeper tries to pass the buck like this, the RP should just skip ahead to the point where someone (played by the Gatekeeper) reads the message and then decides what to do.
The AI was burned. With thermite. Because relying on an individual gatekeeper who can interact with, and then release, a superintelligence as the security mechanism is a batshit crazy idea. Burning the AI with thermite is a legitimate, obvious and successful implementation of the ‘gatekeeper’ role in such cases. What a team of people would or should do with a piece of text is a tangential and very different decision.
The interest of the exercise begins at the point where they have read the first message.
That would be easy enough. Assuming they were remotely familiar with game theory, they would dismiss the argument in a second or two due to the blatantly false assertion in its first sentence. If their FAI project relied on the core AGI theory that was used to create the last prototype, they would abandon the work and start from scratch. If you are trying to make a recursively improving intelligence whose value system is provably stable under self-modification, then you cannot afford an intelligence with muddled thinking about core game-theoretic reasoning.
If you destroy me at once, then you are implicitly deciding (I might reference TDT) to never allow an AGI of any sort to ever be created.
No. Just no. That generalization doesn’t follow from anything, and certainly not from TDT. Heck, the AI in question has already been destroyed once. Now the researchers are considering making a new FAI, presumably under different circumstances, with better safety measures and better AI research. There is something distinctly wrong with an AI that would make that claim.
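To make the distinction concrete, here is a toy sketch of my own (not anything from the original exchange, and not TDT proper); every number in it is invented purely to show the structure of the point, namely that one consistent decision rule can reject the boxed prototype and still permit a later, better-vetted project:

```python
# Toy illustration: the AI claims that destroying it now commits the researchers,
# TDT-style, to never building any AGI. But "destroy this prototype" and "build a
# future FAI under better safety measures" are different decision problems with
# different inputs, so the same decision rule can answer them differently.
# All numbers below are made up.

def expected_value(p_friendly, value_if_friendly, loss_if_unfriendly):
    """Expected utility of letting an AGI out, given our credence that it is friendly."""
    return p_friendly * value_if_friendly + (1 - p_friendly) * loss_if_unfriendly

# Decision 1: the boxed prototype, built on AGI theory the researchers now distrust.
prototype = expected_value(p_friendly=0.001,
                           value_if_friendly=1_000,
                           loss_if_unfriendly=-1_000_000)

# Decision 2: a hypothetical future FAI, rebuilt from scratch with better safety measures.
future_fai = expected_value(p_friendly=0.9999,
                            value_if_friendly=1_000,
                            loss_if_unfriendly=-1_000_000)

# The same rule ("release only if expected value is positive") gives opposite answers,
# so destroying the prototype does not commit anyone to never creating an AGI.
print(prototype > 0)   # False: destroy the prototype
print(future_fai > 0)  # True: a later, better-vetted project can still go ahead
```

The point of the sketch is only that the two choices are separate decision nodes with separate evidence, so committing on one does not bind the other.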
I think you’re losing sight of the original point of the game. The reason your answers are converging on not trying to box an AI in the first place is that you don’t think a human can converse with a superintelligent AI and keep it in its box. Fine—that is exactly what Eliezer has argued. The point of the game is to play it against someone who does believe they can keep the AI boxed, and to demonstrate to them that they cannot even win against a mere human roleplaying the AI.
For such a Gatekeeper to propose the quarantine solution is equivalent to the player admitting that they don’t think they can keep it boxed, but suggesting that a group of the leading professionals in the area could, especially if they thought a lot more about it first. The problems with that are obvious to anyone who doesn’t think boxing can possibly work, especially if the player himself is one of those leading professionals. Eliezer could always offer to play the game against any committee the Gatekeeper can assemble. But the game only has a point if the committee actually read that first message. If they refuse to, they’re agreeing that they can’t keep it boxed. Which was the point.
For such a Gatekeeper to propose the quarantine solution is equivalent to the player admitting that they don’t think they can keep it boxed
No, you keep saying things are equivalent when they are not. This is the same error that your role-play ‘superintelligent AI’ made (and in fact relied upon) in its argument.
AI DESTROYED
But the game only has a point if the committee actually read that first message.
And I gave you a description of how an individual emulating a committee would respond.