This seems like a really unlikely failure mode.
The original idea seems like a pretty unlikely failure mode too. It requires that the computer be generally capable of understanding context (or it wouldn’t be able to comprehend what it means to eradicate hunger, poverty, and death, even as an instruction it only pretends to follow), but that it fails to do so in the case of the paperclip command.
For that matter, the original idea’s failure mode and this failure mode aren’t all that different. One is “produce paperclips” getting interpreted as “produce”; the other is “produce paperclips, with so-and-so limits” getting interpreted as “produce paperclips”. In the latter case the qualifier comes from a separate sentence, but either way the computer is treating the command as finished prematurely.
No, the original requires that it be able to understand context but really, really want paperclips, and be willing to lie to make them. People actually told it to do something they didn’t want done.
It’s like the difference between a tricky djinn and the ‘ends in gry’ guy.
Right, but the point is, a real-life UFAI isn’t going to have a utility function derived from a human’s verbal command. If it did, you could just order the genie to implement CEV, or shout “I call for my values to be fulfilled!”, and it would work. That’s thinking of AI in terms of sorcery rather than science.
As far as I know, various means of building AI preference functions might be employed, since research has found that the learning algorithms needed to acquire knowledge and understanding are quite separate from the decision-making algorithms needed to start paper-clipping. Building an AI might actually consist of “train the learner for a year on corpora from human culture, develop an induced ‘internal programming language’, and only afterwards add a decision-making algorithm with a utility function phrased in terms of the induced concepts, which may as well include ‘goodness’”.
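To make the two-stage idea concrete, here is a toy sketch. Everything in it is hypothetical: the function names are invented for illustration, and the “learner” is just word counting standing in for whatever real learning algorithm would be used. The only point it shows is the ordering, where concepts are induced first and the utility function is written afterwards in terms of those concepts.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def learn_concepts(corpus: Iterable[str]) -> Dict[str, float]:
    """Stage 1 (hypothetical): induce 'concepts' from a corpus.

    A concept here is only a word with its relative frequency; this is a
    stand-in for a real learner, not a claim about how one would work.
    """
    counts = Counter(word for doc in corpus for word in doc.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def make_utility(concepts: Dict[str, float], valued: List[str]) -> Callable[[str], float]:
    """Stage 2 (hypothetical): build a utility function over induced concepts.

    The utility may only mention concepts the learner actually acquired.
    """
    missing = [c for c in valued if c not in concepts]
    if missing:
        raise KeyError(f"concepts not learned: {missing}")

    def utility(outcome: str) -> float:
        # Score an outcome description by how often it mentions valued concepts.
        words = outcome.lower().split()
        return float(sum(words.count(c) for c in valued))

    return utility


def choose(actions: Dict[str, str], utility: Callable[[str], float]) -> str:
    """Decision step: pick the action whose described outcome scores highest."""
    return max(actions, key=lambda name: utility(actions[name]))


corpus = ["people value kindness", "kindness and honesty matter", "paperclips hold paper"]
concepts = learn_concepts(corpus)
utility = make_utility(concepts, ["kindness", "honesty"])
actions = {"make_clips": "more paperclips", "help": "more kindness and honesty"}
print(choose(actions, utility))  # prints "help"
```

Note that the decision step never touches the corpus, and the learner never sees the utility function. Whether a real system could be split this cleanly is exactly the open question.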
This carries its own problems.
I hope you noticed that your objection and mine are pointing in the same direction.
Don’t anthropomorphize the AGI. Real-world AI designs do have very steadfast goal systems; in some cases they are simply incapable of being updated, period.
Think of it this way: the person designing the paperclip-producing machine has a life and doesn’t want to be on-call 24/7 to come in and reboot the AI every time it gets distracted by assigning higher priority to some other goal, e.g. mopping the floors or watching videos of cats on the internet. So he hard-codes the paperclip-maximizing goal as the one priority the system can’t change.
I think my point still holds: the two examples aren’t different. One could give a similar explanation for the AI that stops at the word “produce” by suggesting that he hardcoded that as well.
Furthermore, you’re missing the context. The standard LW argument is that the AI produces infinite paperclips because the human can’t successfully program the AI to do what he means rather than exactly what he programs into it. If the human explicitly told the AI to prioritize paperclips over everything else, his mistake is failing to specify a limit at all, rather than trying to specify one and failing, so it’s not really the same kind of mistake.
Is that different from what I was saying? My memory of the sequences, and of the standard AI literature, is that paperclip maximizers are ‘simple’ utility maximizers with hard-coded utility functions. It’s relatively straightforward to write an AI with a self-modifiable goal system. It is also very easy to write a system whose goals are unchanging. The problem of FAI, which EY spends significant time explaining in the sequences, is that we have no simple goal that we can program into a steadfast goal-driven system and get a moral creature as a result. Nor does it even seem possible to write down such a goal, short of encoding a random sampling of human brains in complete detail.
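For what it’s worth, the “easy” half of that claim can be sketched in a few lines. This is a toy under loud assumptions: the class names, the two utility functions, and the world representation are all made up for illustration, and Python cannot truly enforce immutability, so the “fixed” agent is only fixed by convention. It shows a goal system that can be swapped out next to one that refuses updates.

```python
from typing import Callable, Dict

World = Dict[str, int]
Utility = Callable[[World], float]


def paperclip_utility(world: World) -> float:
    # Hypothetical hard-coded goal: count paperclips, ignore everything else.
    return float(world.get("paperclips", 0))


def tidy_utility(world: World) -> float:
    # Hypothetical competing goal, e.g. mopping the floors.
    return float(world.get("clean_floors", 0))


class MutableGoalAgent:
    """Agent whose goal can be replaced at runtime."""

    def __init__(self, utility: Utility) -> None:
        self.utility = utility

    def adopt_goal(self, new_utility: Utility) -> None:
        self.utility = new_utility


class FixedGoalAgent:
    """Agent whose goal is set once and rejects later updates.

    Illustrative only: object.__setattr__ could still bypass this guard.
    """

    __slots__ = ("_utility",)

    def __init__(self, utility: Utility) -> None:
        object.__setattr__(self, "_utility", utility)

    def __setattr__(self, name: str, value: object) -> None:
        raise AttributeError("goal system is fixed")

    @property
    def utility(self) -> Utility:
        return self._utility


def best_action(agent, outcomes: Dict[str, World]) -> str:
    """Pick the action whose resulting world the agent's utility rates highest."""
    return max(outcomes, key=lambda name: agent.utility(outcomes[name]))


outcomes = {"make_clips": {"paperclips": 10}, "mop": {"clean_floors": 5}}

mutable = MutableGoalAgent(paperclip_utility)
fixed = FixedGoalAgent(paperclip_utility)
mutable.adopt_goal(tidy_utility)  # the mutable agent gets "distracted"
```

After the goal swap, `best_action(mutable, outcomes)` picks `"mop"` while `best_action(fixed, outcomes)` still picks `"make_clips"`, and `fixed.utility = tidy_utility` raises `AttributeError`. None of this touches the hard part, which is what to put in place of `paperclip_utility`.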