The other side of this is that I would expect my brain to NOTICE its actual goals. If my goal is to make paperclips, I will think "I should do this because it makes paperclips", instead of "I should do this because it makes people happy".
Secondary goals often feel like primary ones. Breathing and quenching thirst are means of achieving the primary goal of survival (and procreation), yet they themselves feel primary. Similarly, a paperclip maximizer may feel compelled to harvest iron without any awareness that it wants to do so in order to produce paperclips.
Bull! I’m quite aware of why I eat, breathe, and drink. Why in the world would a paperclip maximizer not be aware of this?
Unless you assume Paperclippers are just rock-bottom stupid, I'd also expect them to eventually notice the correlation between mining iron, smelting it, and shaping it into a weird semi-spiral design… and the sudden rise in the number of paperclips in the world.
I’m not sure that awareness is needed for paperclip maximizing. For example, one might call fire a very good CO2 maximizer. Actually, I’m not even sure you can apply the word awareness to non-human-like optimizers.
“If we reprogrammed you to count paperclips instead”
This is a conversation about changing my core utility function / goals, and what you are discussing would be far more of an architectural change. I meant that, within my architecture (and, I assume, generalizing to most human architectures and most goals), we are, on some level, aware of the actual goal. There are occasional failure states (Alicorn mentioned that iron deficiency registers as a craving for ice o.o), but these tend to tie in to low-level failures, not high-order goals like "make a paperclip", and STILL we tend to manage to identify them and learn how to achieve our actual goals.
Survival and procreation aren’t primary goals in any direct sense. We have urges that have been selected for because they contribute to inclusive genetic fitness, but at the implementation level they don’t seem to be evaluated by their contributions to some sort of unitary probability-of-survival metric; similarly, some actions that do contribute greatly to inclusive genetic fitness (like donating eggs or sperm) are quite rare in practice and go almost wholly unrewarded by our biology. Because of this architecture, we end up with situations where we sate our psychological needs at the expense of the factors that originally selected for them: witness birth control or artificial sweeteners. This is basically the same point Eliezer was making here.
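To make that decoupling concrete, here's a toy sketch (everything in it is hypothetical, not drawn from the thread): an urge implemented as "maximize sweetness" gets selected for because, in the ancestral environment, sweetness tracked calories. Once the environment offers a zero-calorie sweetener, the proxy keeps getting optimized while the thing selection originally rewarded doesn't.

```python
# Toy illustration (hypothetical): a proxy urge selected for because it
# correlated with fitness keeps being optimized even after the correlation breaks.

# Ancestral environment: sweetness and calories come bundled.
ancestral_foods = {"fruit": {"sweetness": 5, "calories": 5},
                   "tuber": {"sweetness": 1, "calories": 4}}

# Modern environment adds an option that decouples the proxy from the target.
modern_foods = dict(ancestral_foods,
                    sweetener={"sweetness": 9, "calories": 0})

def proxy_policy(foods):
    """The evolved urge: pick whatever scores highest on sweetness.
    It never consults calories, because it was never implemented that way."""
    return max(foods, key=lambda f: foods[f]["sweetness"])

def fitness_contribution(foods, choice):
    """What selection actually 'cared about' when the urge was shaped."""
    return foods[choice]["calories"]

for label, foods in [("ancestral", ancestral_foods), ("modern", modern_foods)]:
    choice = proxy_policy(foods)
    print(f"{label}: picks {choice!r}, calories gained = {fitness_contribution(foods, choice)}")

# ancestral: picks 'fruit', calories gained = 5   (the proxy tracks the target)
# modern: picks 'sweetener', calories gained = 0  (the proxy is sated, the target isn't)
```

The policy never contains a "survive and reproduce" term anywhere; that criterion only exists in the selection process that shaped it.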
It might be meaningful to treat supergoals as intentional if we were discussing an AI, since in that case there would be a unifying intent behind each fitness metric that actually gets implemented, but even in that case I’d say it’s more accurate to talk about the supergoal as a property not of the AI’s mind but of its implementors. Humans, of course, don’t have that excuse.
All good points. I was mostly thinking about an evolved paperclip maximizer, which may or may not be a result of a fooming paperclip-maximizing AI.
Evolved creatures as we know them (at least the ones with complex brains) are maximizers of their reward centers' reward signal, which implicitly correlates with being offspring maximizers. (Non-brainy organisms are probably closer to actual offspring maximizers.)
An evolved agent wouldn't evolve to maximize paperclips.
It could if the environment rewarded paperclips. Admittedly this would require an artificial environment, but that’s hardly impossible.
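As a minimal sketch of what such an artificial environment could look like (all parameters and names here are hypothetical): a toy genetic algorithm in which reproduction is weighted by nothing but paperclip output, so selection pushes the population toward paperclip-making.

```python
import random

# Toy sketch (hypothetical setup): an artificial environment in which
# reproductive success is proportional to paperclips produced. Each genome is
# just a propensity in [0, 1] to spend effort on making paperclips.

random.seed(0)
POPULATION, GENERATIONS, EFFORT = 100, 40, 10

def paperclips_made(propensity):
    # Each unit of effort yields a paperclip with probability `propensity`.
    return sum(random.random() < propensity for _ in range(EFFORT))

def evolve():
    population = [random.random() for _ in range(POPULATION)]
    for _ in range(GENERATIONS):
        # Fitness = paperclip output; it is the only thing selection sees.
        fitness = [paperclips_made(p) for p in population]
        # Fitness-proportional reproduction with small mutations.
        parents = random.choices(population, weights=[f + 1e-6 for f in fitness], k=POPULATION)
        population = [min(1.0, max(0.0, p + random.gauss(0, 0.05))) for p in parents]
    return sum(population) / POPULATION

print(f"mean paperclip propensity after selection: {evolve():.2f}")
# Drifts toward 1.0: the population evolves to favor paperclip production,
# because that is what this (artificial) environment rewards.
```

Whether the resulting agents would be "aware" of the paperclip goal, rather than just acting on evolved urges that happen to produce paperclips, is exactly the question raised above.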