I like the concept of a reflective equilibrium, and it seems to me like that is just what any self-modifying AI would tend toward. But the notion of a random utility function, or the “structured utility function” Eliezer proposes as a replacement, assumes that an AI is composed of two components: the intelligent bit and the bit that has the goals. Humans certainly can’t be factorized that way. Just think about akrasia to see how fragile the notion of a goal is.
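To make the factorization assumption concrete, here is a toy sketch (my own illustration, not anything from the original discussion): a generic optimizer that takes an arbitrary utility function as a parameter, so the “intelligent bit” and the “bit that has the goals” are literally separate pieces of code. The function names and action set are made up for the example.

```python
# Toy sketch of a "factored" agent: the optimizer is goal-agnostic,
# and the goal lives in a separate, swappable utility function.
from typing import Callable, Iterable

def optimizer(actions: Iterable[str], utility: Callable[[str], float]) -> str:
    """The 'intelligent bit': a generic search over actions.

    It knows nothing about paperclips; it simply maximizes whatever
    utility function it is handed.
    """
    return max(actions, key=utility)

# The 'bit that has the goals': replace this function wholesale and the
# optimizer above is untouched.
def paperclip_utility(action: str) -> float:
    return {"make_paperclips": 10.0, "write_poetry": 1.0, "do_nothing": 0.0}[action]

if __name__ == "__main__":
    actions = ["make_paperclips", "write_poetry", "do_nothing"]
    print(optimizer(actions, paperclip_utility))  # -> "make_paperclips"
```

The objection in the comment above is precisely that humans are not built like this: there is no clean seam where you could swap out the utility function and leave the rest intact.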
Even notions of being “cosmopolitan”—of not selfishly or provincially constraining future AIs—are written down nowhere in the universe except a handful of human brains. An expected paperclip maximizer would not bother to ask such questions.
A smart expected paperclip maximizer would realize that it may not be the smartest possible expected paperclip maximizer—that other ways of maximizing expected paperclips might lead to even more paperclips. But the only way it could find out about those is to spawn modified expected paperclip maximizers and see what they come up with on their own. Yet those modified paperclip maximizers might no longer be maximizing paperclips! They might have self-modified away from that goal and merely be signaling an interest in paperclips to gain the approval of the original expected paperclip maximizer. Therefore the original expected paperclip maximizer had best not take that risk after all (which leaves it open to defeat by a faster-evolving cluster of AIs). This, by reductio ad absurdum, is why I don’t believe in smart expected paperclip maximizers.
Humans aren’t factorized this way, but whether they can’t be is a separate question. It’s not surprising that evolution’s designs aren’t that neat, so the fact that humans lack this property is only weak evidence about the possibility of designing systems that do have it.