But the real problem is not shape, it is mind. “Humans in funny suits” is a well-known term in literary science-fiction fandom, and it does not refer to something with four limbs that walks upright. An angular creature of pure crystal is a “human in a funny suit” if she thinks remarkably like a human—especially a human of an English-speaking culture of the late-20th/early-21st century.
I don’t watch a lot of ancient movies. When I was watching the movie Psycho (1960) a few years back, I was taken aback by the cultural gap between the Americans on the screen and my America. The buttoned-shirted characters of Psycho are considerably more alien than the vast majority of so-called “aliens” I encounter on TV or the silver screen.
Speaking from personal experience, I can say that he’s right.
Explaining how I know this, much less sharing the experience, is more difficult.
The simplest idea I can present is that you probably have multiple utility functions. If you’re buying apples, you’ll evaluate whether you like that type of apple, what the quality of the apple is, and how good the price is. For me, at least, these all FEEL different: a bruised apple doesn’t “feel” overpriced the way a $5 apple at the airport does. Even disliking soft apples feels very different from recognizing a bruised apple, even though they both also go into a larger basket of “no good”.
What’s more, I can pick apples based on someone ELSE’S utility function, and actually often shop with my roommate’s function in mind (she likes apples a lot more than me, but is also much pickier, as it happens). This feels different from using my own utility function.
The other side of this is that I would expect my brain to NOTICE its actual goals. If my goal is to make paperclips, I will think “I should do this because it makes paperclips”, instead of “I should do this because it makes people happy”. My brain doesn’t have a generic “I should do this” emotion, as near as I can tell; it just has ways of signalling that an activity will accomplish my goals.
Thus, it seems reasonable to conclude that my feelings are more a combination of activity + outcome, not some raw platonic ideal. While sex, hiking, and a nice meal all make me “happy”, they still feel completely different; I just lump them into a larger category of “happiness” for some reason.
I’d strongly suspect you can add make-more-paperclips to that emotional category, but I see absolutely no reason you could make me treat it the same as a nice dinner, because that wouldn’t even make sense.
So, you introspect the way that he introspects. Do all humans? Would all humans need to introspect that way for it to do the work that he wants it to do?
Ooh, good call, thank you. I suppose it might be akin to visualization, where it actually varies from person to person. Does anyone here on LessWrong have conflicting anecdotes, though? Does anyone disagree with what I said? If not, it seems like a safe generalization for now, but it’s still useful to remember I’m generalizing from one example :)
Remembering that other people have genuinely alien minds is surprisingly tricky.
Iron deficiency feels like wanting ice. For clever, verbal reasons. Not being iron deficient doesn’t feel like anything. My brain did not notice that it was trying to get iron—it didn’t even notice it was trying to get ice, it made up reasons according to which ice was an instrumental value for some terminal goal or other.
Other people? I find my own mind quite alien below the thin layer accessible to my introspection. Heck, most of the time I cannot even tell if my introspection lies to me.
I think I have a different introspection here.
When I have a feeling such as ‘doing-whats-right’ there is a positive emotional response associated with it. Immediately I attach semantic content to that emotion: I identify it as being produced by the ‘doing-whats-right’ emotion. How do I do this? I suspect that my brain has done the work to figure out that emotional response X is associated with behavior Y, and just does the work quickly.
But this is malleable. Over time, the emotional response associated with an act can change and this does not necessarily indicate a change in semantic content. I can, for example, give to a charity that I am not convinced is good and I still will often get the ‘doing-whats-right’ emotion even though the semantic content isn’t really there. I can also find new things I value, and occasionally I will acknowledge that I value something before I get positive emotional reinforcement. So in my experience, they aren’t identical.
I strongly suspect that if you reprogrammed my brain to value counting paperclips, it would feel the same as doing what is right. At the very least, this would not be inconsistent. I might learn to attach “paperclippy” instead of “good” to that emotional state, but it would feel the same.
… they do? For what values of “alien”?
Because I’m not sure how else to capture a “scale of alien-ness”:
I once wrote a sci-fi race that was a blind, deaf ooze, but extremely intelligent and very sensitive to tactile input. Over the years, and with the help of a few other people, I’ve gotten a fairly good feel for their mindset and how they approach the world.
There’s a distinct subset of humans which I find vastly more puzzling than these guys.
From Humans in Funny Suits:
The race was explicitly designed to try and avoid “humans in funny suits”, and have a culture that’s probably more foreign than the 1960s. But I’m only 29, and haven’t traveled outside of English-speaking countries, so take that with a dash of salt!
On a 0-10 scale, with myself at 0, humans in funny suits at 1, and the 1960s at 2, I’d rate my creation as a 4, and a subset of humanity exists in the 4-5 range. Around 5, I have trouble with the idea that there’s coherent intelligent reasoning happening, because the process is just completely lost on me, and I don’t think I’d be able to easily assign anything more than a 5, much less even speculate on what a 10 would look like.
Trying to give a specific answer to “how alien is it” is a lot harder than it seems! :)
If I may make a recommendation, if you are concerned about “alien aliens”, read a few things by Stanislaw Lem. The main theme of Lem’s scifi, I would say, is alien minds, and failure of first contact. “Solaris” is his most famous work (but the adaptation with Clooney is predictably terrible).
Not sure if I’ve read Lem, but I’ll be sure to check it out. I have a love for “truly alien” science fiction, which is why I had to try my hand at making one of my own :)
Well, reading fiction (and non-fiction) for which English speakers of your generation weren’t the target audience is a good way to start compensating.
I’ve got a lot of exposure to “golden age” science fiction and fantasy, so going back a few decades isn’t hard for me. I just don’t get exposed to many other good sources. The “classics” seem to generally fail to capture that foreignness.
If you have recommendations, especially a broader method than just naming a couple authors, I’d love to hear it. Most of my favourite authors have a strong focus on foreign cultures, either exploring them or just having characters from diverse backgrounds.
Anime and manga, particularly the older stuff, are a decent source.
… it is really sad that I completely forgot that anime and manga aren’t English. I grew up around them, so they’re just a natural part of my culture. Suffice to say, I’ve had a lot of exposure, but not to anything older than I am.
Any recommendations for OLD anime or manga, given I don’t speak/read Japanese? :)
You’re probably best off asking on a manga forum, but Barefoot Gen is a good, and depressing, start.
Which time period do you mean by this? “Golden age of science fiction” typically refers to the 1940s and 1950s, “golden age of fantasy” to the late 1970s and early 1980s. If you mean the latter time period, read stuff from the former as a start. Also try going back at least a century to the foundational fantasy authors, e.g., Edgar Rice Burroughs, William Morris’s The Well at the World’s End. Go even further back to things like Treasure Island or The Three Musketeers. Or even further back to the days when people believed the stuff in their “fantasy” could actually happen. Read Dante’s Divine Comedy, Thomas More’s Utopia, or an actual chivalric romance (I haven’t read any so I can’t give recommendations).
A good rule of thumb is that you should experience values dissonance while reading them. A culture whose values don’t make you feel uncomfortable isn’t truly alien. Also for this reason, avoid modern adaptations, as these tend to do their best to clean up the politically incorrect parts and otherwise modernize the worldview.
I’m intrigued. Do you have a link?
Sadly not. I really should do a proper write-up, but right now they’re mostly stored in the head of me and their co-creator.
Secondary goals often feel like primary. Breathing and quenching thirst are means of achieving the primary goal of survival (and procreation), yet they themselves feel like primary. Similarly, a paperclip maximizer may feel compelled to harvest iron without any awareness that it wants to do it in order to produce paperclips.
Bull! I’m quite aware of why I eat, breathe, and drink. Why in the world would a paperclip maximizer not be aware of this?
Unless you assume Paperclippers are just rock-bottom stupid, I’d also expect them to eventually notice the correlation between mining iron, smelting it, and shaping it into a weird semi-spiral design… and the sudden rise in the number of paperclips in the world.
I’m not sure that awareness is needed for paperclip maximizing. For example, one might call fire a very good CO2 maximizer. Actually, I’m not even sure you can apply the word awareness to non-human-like optimizers.
“If we reprogrammed you to count paperclips instead”
This is a conversation about changing my core utility function / goals, and what you are discussing would be far more of an architectural change. I meant, within my architecture (and, I assume, generalizing to most human architectures and most goals), we are, on some level, aware of the actual goal. There are occasional failure states (Alicorn mentioned iron deficiencies register as a craving for ice o.o), but these tend to tie in to low-level failures, not high-order goals like “make a paperclip”, and STILL we tend to manage to identify these and learn how to achieve our actual goals.
Survival and procreation aren’t primary goals in any direct sense. We have urges that have been selected for because they contribute to inclusive genetic fitness, but at the implementation level they don’t seem to be evaluated by their contributions to some sort of unitary probability-of-survival metric; similarly, some actions that do contribute greatly to inclusive genetic fitness (like donating eggs or sperm) are quite rare in practice and go almost wholly unrewarded by our biology. Because of this architecture, we end up with situations where we sate our psychological needs at the expense of the factors that originally selected for them: witness birth control or artificial sweeteners. This is basically the same point Eliezer was making here.
It might be meaningful to treat supergoals as intentional if we were discussing an AI, since in that case there would be a unifying intent behind each fitness metric that actually gets implemented, but even in that case I’d say it’s more accurate to talk about the supergoal as a property not of the AI’s mind but of its implementors. Humans, of course, don’t have that excuse.
All good points. I was mostly thinking about an evolved paperclip maximizer, which may or may not be a result of a fooming paperclip-maximizing AI.
Evolved creatures as we know them (at least the ones with complex brains) are reward-center-reward maximizers, which implicitly correlates with being offspring maximizers. (Actual, non-brainy organisms are probably closer to offspring maximizers.)
An evolved agent wouldn’t evolve to maximize paperclips.
It could if the environment rewarded paperclips. Admittedly this would require an artificial environment, but that’s hardly impossible.