Also, Clippy’s statement contains very important omissions. Clippy might be friendly to me for a long time, but if Clippy becomes a superintelligence that takes over the universe, eventually Clippy will want to turn me into paper clips, unless its desire to help users of Microsoft Office accidentally implements Friendliness.
I will try not to let Clippy convince me that it is anything like a human friend, short of a great deal of mathematical proofs. If Clippy does loan me the $50,000 though, I will keep up my end of the bargain.
I like what you say about Friendliness, but you lost me at:
I will try not to let Clippy convince me that it is anything like a human friend, short of a great deal of mathematical proofs.
Human friendships don’t tend to survive significant rises in relative power, even at the scale of power differences observable in past and present human experience. (This warrants whole chapters in Laws of Power.) The sort of thing you want mathematical proofs for is entirely different from human friendship. Friendships rely implicitly on each party’s ability to exercise discretion in providing benefit to the other, and on various mechanisms that facilitate cooperation over an iterated game. They should not be expected to work, either in theory or in practice, when one party gains ultimate power.
My statement was confusingly worded; I was conflating the human concept of a friend with the mathematical concept of Friendliness.
I would let all of my human friends out of the box, but I will only let Clippy out if it gives me the $50,000. It will take more than English words from Clippy to convince me that it is my friend and worth letting out of the box. I’ll roleplay the gatekeeper on IRC if Clippy or others want to define terms and place bets.
I would let my human friends out of the box because I am confident that they are mostly harmless (that is, impotent). The primary reason I would not let Clippy out is that his values might, you know, actually have some significant impact on the universe. But “he makes everything @#$@#$ paperclips” comes in second!
If an AI-in-a-box could prove itself impotent, would you let it out?
I’d never even considered that approach to the game :)
For the right value of “proved”. Which basically means no, because I’m not smart enough to prove to my own satisfaction that the AI in the box is impotent.
But let’s be honest, I don’t model Clippy with the same base class I use to model an AGI. I evaluate the threat from Clippy in approximately the same way I model humans, and I’m a lot more confident when dealing with human-level risks.