Thanks for this post! More than anything I’ve read before, it captures the visceral horror I feel when contemplating AGI, including some of the supposed FAIs I’ve seen described (though I’m not well-read on the subject).
One thought though: the distinction between wrapper-minds and non-wrapper-minds does not feel completely clear-cut to me. For instance, consider a wrapper-mind whose goal is to maximize the number of paperclips, but rather than being given a hard-coded definition of “paperclip”, it is instructed to go out into the world, interact with humans, and learn about paperclips that way. In doing so, it learns (perhaps) that a paperclip is not merely a piece of metal bent into a particular shape, but is something that humans use to attach pieces of paper to one another. And so, in order to maximize the number of paperclips, it needs to make sure that both humans and paper continue to exist. And if, for instance, people started wanting to clip together more sheets of paper at a time, the AI might be able to notice this and start making bigger paperclips, because to it, the bigger ones would now be “more paperclip-y”, since they are better able to achieve the desired function of a paperclip.
I’m not saying this AI is a good idea. In fact, it seems like it would be a terrible idea, because it’s so easily gameable; all people need to do is start referring to something else as “paperclips” and the AI will start maximizing that thing instead.
My point is more just to wonder: does this hypothetical concept-learning paperclip maximizer count as a “wrapper-mind”?