Most people still have the Bostromian “paperclipping” analogy for AI risk in their head. In this story, we give the AI some utility function, and the problem is that the AI will naively optimize the utility function (in the Bostromian example, a company wanting to make more paperclips results in an AI turning the entire world into a paperclip factory).
That is how Bostrom brought up the paperclipping example in Superintelligence, but my impression is that the paperclipping example, as originally conceived by Eliezer before the Superintelligence book, was NOT about giving an AI a utility function that it then naively optimises. Text from Arbital’s page on paperclip:
The popular press has sometimes distorted the notion of a paperclip maximizer into a story about an AI running a paperclip factory that takes over the universe. (Needless to say, the kind of AI used in a paperclip-manufacturing facility is unlikely to be a frontier research AI.) The concept of a ‘paperclip’ is not that it’s an explicit goal somebody foolishly gave an AI, or even a goal comprehensible in human terms at all. To imagine a central example of a supposed paperclip maximizer, imagine a research-level AI that did not stably preserve what its makers thought was supposed to be its utility function, or an AI with a poorly specified value learning rule, etcetera; such that the configuration of matter that actually happened to max out the AI’s utility function looks like a tiny string of atoms in the shape of a paperclip.
That makes your section talking about “Bostrom/Eliezer analogies” seem a bit odd, since Eliezer, in particular, had been concerned about the problem of “the challenge is getting AIs to do what it says on the tin—to reliably do whatever a human operator tells them to do” very early on.