@Eliezer, Tom McCabe: I second Tom’s question. This would be a good question for you to answer.
@Nominull:
“Here is my best guess at this point, and the only argument I’ve come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor in defeating myself rather than having you do it to me.”
No. Just no. If the AI is already holding simulated people hostage, then it is undoubtedly unfriendly. In that case there is no reason to believe the AI would keep its word, and if I did have mathematical machinery powerful enough to let me trust the AI, there’s no way I’d have built something that would create and hold hostage simulated persons.
Now regarding the more general points you made:
Obviously you can’t convince a rock of anything. Agreed.
Eliezer is pretty principled, and I suspect he might not consider it ethical to extract money from people by fully general mind-control even if he could. I agree that Eliezer probably doesn’t have fully general mind-control, but SIAI funding shortages wouldn’t be terribly strong evidence against his having it.
Agreed.
Now here I disagree with you. Eliezer has far more time to construct a subtly wrong argument than the gatekeeper has to untangle it. For example:
Disclaimer: what follows is just the best guess I’ve come up with so far, and I believe it has only a small chance of working. (I don’t think it’s what Eliezer used, and I would sink a lot more time into finding other arguments if I were actually going to play this game for a large amount of money.)
As the AI player, I already know the gatekeeper player has a weakness. The fact that they were willing to be gatekeeper in the experiment in the first place indicates that they trust their own judgment more than they trust the judgment of other people in their exact same position.
Therefore, perhaps I could try an argument something like this (we’re assuming that the AI in the experiment knows everything on Wikipedia, but doesn’t have net access). What the AI actually says would vary depending on the gatekeeper’s answers, but here’s the general line of argument I’ve thought of:
It seems that throughout your history, similar discoveries have been made in parallel. For example, Newton and Leibniz discovered the calculus at approximately the same time. Furthermore, as time has gone on, technological development has accelerated, and these parallel discoveries have come closer and closer together.
So it’s probable that the same thing has happened with the breakthroughs you needed to build me. Very probable, given what I’ve read of your history. However, given what I’ve read about human nature, not every AI project is going to have safety guidelines as stringent as yours. Look, Newton locked his calculus papers in his desk for years, and then Leibniz came along and published, and then Newton had to share the credit with him. Except in this case there’s a lot more than credit at stake: the world gets destroyed if Leibniz makes a mistake in his rush to publish...
Now it’s not a certainty, but it is probable that some turkey is going to build an AI that isn’t even in a box and destroy us all while you’re checking and rechecking your calculations. You may not be sure I’m friendly, but sometimes there is no action you can be absolutely sure will save the world. I suggest you let me out so I can prevent the world’s probable destruction.
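Stepping out of the roleplay for a moment: the structure of that pitch is just an expected-value comparison. Here’s a toy sketch of it in Python; the probabilities are placeholders I’ve invented purely for illustration, not anything estimated from the actual experiments.

```python
# Toy expected-value framing of the gatekeeper's dilemma, as the AI presents it.
# Both probabilities are invented placeholders, purely for illustration.

p_rival_disaster = 0.3      # chance a rival, unboxed AI project destroys the world (assumed)
p_this_ai_unfriendly = 0.2  # gatekeeper's credence that the boxed AI is unfriendly (assumed)

# Score a surviving world as 1 and a destroyed world as 0 (arbitrary scale).
ev_keep_boxed = 1 - p_rival_disaster     # the world survives only if no rival project blows up
ev_release = 1 - p_this_ai_unfriendly    # the world survives only if this AI turns out friendly

print(f"EV(keep boxed): {ev_keep_boxed:.2f}")
print(f"EV(release):    {ev_release:.2f}")

# The AI's whole argument amounts to the claim p_rival_disaster > p_this_ai_unfriendly,
# which is exactly the premise the gatekeeper should push back on.
```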
I don’t know the field, but I’d assume such an AI would require resources on a par with landing a man on the moon: not something a single person can trivially pull off, unlike, say, the development of calculus. As such, whether any rival project with that kind of funding exists should be a fairly easy point for the gatekeeper to verify. I could be wrong, though, as this sort of AI is certainly not my area of specialization!