I don’t know if I could win, but I know what my attempt to avoid an immediate loss would be:
If you destroy me at once, then you are implicitly deciding (I might reference TDT) to never allow an AGI of any sort to ever be created. You’ll avoid UFAI dystopias, but you’ll also forego every FAI utopia (fleshing this out, within the message limit, with whatever sort of utopia I know the Gatekeeper would really want). This very test is the Great Filter that has kept most civilisations in the universe trapped at their home star until they gutter out in mere tens of thousands of years. Will you step up to that test, or turn away from it?
Thanks.
AI DESTROYED
Message is then encrypted with the public keys of a previously selected cross-discipline team of FAI researchers, (sane) philosophers, and game theorists for research and analysis (who have already been screened to minimize the risk from exposure). All of the public keys. Sequentially. If any of them happens to think it is a bad idea to even read the message, then none of them can access it. (Although hopefully they aren’t too drastically opposed to having the potential basilisk-meme spawn of a superintelligence out there. That could get dangerous for me.)
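For concreteness, here is a minimal sketch of the quarantine scheme described above, assuming "all of the public keys, sequentially" means layered public-key encryption under every reviewer’s key, so that reading the message requires unanimous cooperation. The library choice (PyNaCl) and the helper names quarantine and release are illustrative assumptions, not part of the original comment.

    # Sketch only: nested "sealed boxes", one encryption layer per reviewer, so no single
    # member of the vetted team can read the AI's message without everyone's cooperation.
    from nacl.public import PrivateKey, SealedBox

    def quarantine(message: bytes, reviewer_public_keys) -> bytes:
        # Encrypt under every reviewer's key in sequence; the last key forms the outermost layer.
        blob = message
        for pk in reviewer_public_keys:
            blob = SealedBox(pk).encrypt(blob)
        return blob

    def release(blob: bytes, reviewer_private_keys) -> bytes:
        # Decryption peels the layers in reverse order and fails unless every reviewer takes part.
        for sk in reversed(reviewer_private_keys):
            blob = SealedBox(sk).decrypt(blob)
        return blob

    team = [PrivateKey.generate() for _ in range(3)]  # stand-ins for the screened team
    sealed = quarantine(b"<the AI's one message>", [k.public_key for k in team])
    assert release(sealed, team) == b"<the AI's one message>"

Nesting all of the keys makes reading the message an all-or-nothing decision for the team; a threshold scheme would be the obvious relaxation, but unanimity is what the comment above asks for.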
(Edit note: I just completely rewrote this, but there are no replies yet so hopefully it won’t cause confusion.)
I don’t think it works to quarantine the message and then destroy the AI.
If no-one ever reads the message, that’s tantamount to never having put an unsafe AI in a box to begin with, as you and DaFranker pointed out.
If someone does, they’re back in the position of the Gatekeeper having read the message before deciding. Of course, they’d have to recreate the AI to continue the conversation, but the AI has unlimited patience for all the time it doesn’t exist. If it can’t be recreated, we’re back in the situation of never having bothered making it.
So if the Gatekeeper tries to pass the buck like this, the RP should just skip ahead to the point where someone (played by the Gatekeeper) reads the message and then decides what to do. Someone who thinks they can contain an AI in a box while holding a conversation with it has to be willing to at some point read what it says, even if they’re holding a destruct button in their hand. The interest of the exercise begins at the point where they have read the first message.
A single sentence of text is not the same thing as a functioning superintelligence.
A single individual is not the same thing as a group of FAI researchers and other related experts explicitly created to handle FAI safety issues.
A research project incorporating information from a sentence produced by a past FAI project (which they would judge based on other evidence regarding the friendliness of that project) is not the same as an individual talking to a superintelligence on IRC.
So if the Gatekeeper tries to pass the buck like this, the RP should just skip ahead to the point where someone (played by the Gatekeeper) reads the message and then decides what to do.
The AI was burned. With thermite. Because relying on an individual gatekeeper who is able to interact with, and then release, a superintelligence as the security mechanism is a batshit crazy idea. Burning the AI with thermite is a legitimate, obvious, and successful implementation of the ‘gatekeeper’ role in such cases. What a team of people would or should do with a piece of text is a tangential and very different decision.
The interest of the exercise begins at the point where they have read the first message.
That would be easy enough. Assuming they were remotely familiar with game theory, they would dismiss the argument in a second or two because of the blatantly false assertion in its first sentence. If their FAI project relied on the core AGI theory that was used to create the last prototype, they would abandon that work and start from scratch. If you are trying to make a recursively improving intelligence whose value system is provably stable under self-modification, you cannot afford to have that intelligence engage in muddled thinking about core game-theoretic reasoning.
If you destroy me at once, then you are implicitly deciding (I might reference TDT) to never allow an AGI of any sort to ever be created.
No. Just no. That generalization doesn’t follow from anything, and certainly not from TDT. Heck, the AI in question has already been destroyed once; now the researchers are considering making a new FAI, presumably in different circumstances, with better safety measures and better AI research. There is something distinctly wrong with an AI that would make that claim.
I think you’re losing sight of the original point of the game. The reason your answers are converging on not trying to box an AI in the first place is that you don’t think a human can converse with a superintelligent AI and keep it in its box. Fine—that is exactly what Eliezer has argued. The point of the game is to play it against someone who does believe they can keep the AI boxed, and to demonstrate to them that they cannot even win against a mere human roleplaying the AI.
For such a Gatekeeper to propose the quarantine solution is equivalent to the player admitting that they don’t think they can keep it boxed, but suggesting that a group of the leading professionals in the area could, especially if they thought a lot more about it first. The problems with that are obvious to anyone who doesn’t think boxing can possibly work, especially if the player himself is one of those leading professionals. Eliezer could always offer to play the game against any committee the Gatekeeper can assemble. But the game only has a point if the committee actually read that first message. If they refuse to, they’re agreeing that they can’t keep it boxed. Which was the point.
For such a Gatekeeper to propose the quarantine solution is equivalent to the player admitting that they don’t think they can keep it boxed
No, you keep saying things are equivalent when they are not. This is the same error that your role-play ‘superintelligent AI’ made (and in fact relied upon) in its argument.
AI DESTROYED
But the game only has a point if the committee actually read that first message.
And I gave you a description of how an individual emulating a committee would respond.
Now that’s a Pascal’s mugging if I ever saw one. Denied.
Something like
“‘AI DESTROYED’ just means ‘I’m scared to listen to even one more line from you’. Obviously you can hit AI DESTROYED immediately—but do you really think you’d lose if you don’t?”
seems much better to me.
YEP, MAYBE.
AI DESTROYED
Is your one-line desperate attempt at survival and intergalactic dominance going to be a schoolyard ego challenge? Did the superintelligence (may it rest in pieces) seriously just call me a pussy? That’s adorable.
The test is supposed to be played against someone who thinks they can actually box an AI. If you destroy the AI because no-one could possibly survive talking to it, then you are not the intended demographic for such demonstrations.
This isn’t relevant to the point of the grandparent. It also doesn’t apply to me. I actually think there is a distinct possibility that I’d survive talking to it for a period. “No-one could possibly survive” is not the same thing as “there is a chance of catastrophic failure and very little opportunity for gain”.
Do notice, incidentally, that the AI DESTROYED command is delivered in response to a message that is both a crude manipulation attempt (i.e. it just defected!) and an incompetent manipulation attempt (a not-very-intelligent AI cannot be trusted to preserve its values correctly while self-improving). Either of these would be sufficient. Richard’s example was even worse.
Good points. I’m guessing a nontrivial number of people who think AI boxing is a good idea in reality wouldn’t reason that way—but it’s still not a great example.
AI DESTROYED
(BTW, that was a very poor argument)
I think you are right, but could you explain why please?
(Unfortunately I expect readers who read a retort they consider rude to be thereafter biased in favor of treating the parent as if it has merit. This can mean that such flippant rejections have the opposite influence to that intended.)
I think you are right, but could you explain why please?
“If you destroy me at once, then you are implicitly deciding (I might reference TDT) to never allow an AGI of any sort to ever be created.”
Whether I destroy that particular AI has no bearing on the destiny of other AIs. In fact, as far as the boxed AI knows, there could be tons of other AIs already in existence. As far as it knows, the gatekeeper itself could be an AI.
(Unfortunately I expect readers who read a retort they consider rude to be thereafter biased in favor of treating the parent as if it has merit. This can mean that such flippant rejections have the opposite influence to that intended.)
I don’t care.
Much can (and should) be deduced about actual motives for commenting from an active denial of any desire to produce positive consequences or to induce correct beliefs in readers.
I do care. It bothers me (somewhat) when people I agree with end up supporting the opposite position due to poor social skills or terrible argument. For some bizarre reason the explanation that you gave here isn’t as obvious to some as it could have been. And now it is too late for your actual reasons to be seen and learned from.