If we reprogrammed you to count paperclips instead, it wouldn’t feel like different things having the same kind of motivation behind it. It wouldn’t feel like doing-what’s-right for a different guess about what’s right. It would feel like doing-what-leads-to-paperclips.
Exactly. I mean, you could probably make it have its own quale, but you could also make it not, and I don’t see why that would be in question as long as we’re postulating brain-reprogramming powers.
Assume the subject of reprogramming is an existing human being, otherwise minimally altered by this reprogramming, i.e., we don’t do anything that isn’t necessary to switch their motivation to paperclips. So unless you do something gratuitously non-minimal like moving the whole decision-action system out of the range of introspective modeling, or cutting way down on the detail level of introspective modeling, or changing the empathic architecture for modeling hypothetical selves, the new person will experience themselves as having ineffable ‘qualia’ associated with the motivation to produce paperclips.
The only way to make it seem to them like their motivational quales hadn’t changed over time would be to mess with the encoding of their previous memories of motivation, presumably in a structure-destroying way since the stored data and their introspectively exposed surfaces will not be naturally isomorphic. If you carry out the change to paperclip-motivation in the obvious way, cognitive comparisons of the retrieved memories to current thoughts will return ‘unequal ineffable quales’, and if the memories are visualized in different modalities from current thoughts, ‘incomparable ineffable quales’.
Doing-what-leads-to-paperclips will also be a much simpler ‘quale’, both from the outside perspective looking at the complexity of cognitive data, and in terms of the internal experience of complexity—unless you pack an awful lot of detail into the question of what constitutes a more preferred paperclip. Otherwise, compared to the old days when you thought about justice and fairness, introspection will show that less questioning and uncertainty is involved, and that there are fewer points of variation among the motivational thought-quales being considered.
I suppose you could put in some extra work to make the previous motivations map in cognitively comparable ways along as many joints as possible, and try to edit previous memories without destroying their structure so that they can be visualized in a least common modality with current experiences. But even if you did, memories of the previous quales for rightness-motivation would appear as different in retrospect when compared to current quales for paperclip-motivation as a memory of a 3D greyscale forest landscape vs. a current experience of a 2D red-and-green fractal, even if they’re both articulated in the visual sensory modality and your modal workspace allows you to search for, focus on, and compare commonly ‘experienced’ shapes between them.
I think you and Alicorn may be talking past each other somewhat.
Throughout my life, it seems that what I morally value has varied more than what rightness feels like—just as it seems that what I consider status-raising has changed more than what rising in status feels like, and what I find physically pleasurable has changed more than what physical pleasures feel like. It’s possible that the things my whole person is optimizing for have not changed at all, that my subjective feelings are a direct reflection of this, and that my evaluation of a change of content is merely a change in my causal model of the production of the desiderata (I thought voting for Smith would lower unemployment, but now I think voting for Jones would, etc.) But it seems more plausible to me that
1) the whole me is optimizing for various things, and these things change over time; and 2) the conscious me is getting information inputs which it can group together by family resemblance, and which can reinforce or disincentivize its behavior.
Imagine a ship which is governed by an anarchic assembly below decks and captained by an employee of theirs whom they motivate through in-kind bonuses. So the assembly at one moment might be looking for buried treasure, which they think is in such-and-such a place, and so they send her baskets of fresh apples when she’s steering in that direction and baskets of stinky rotten apples when she’s steering in the wrong one. For other goals (refueling, not crashing into reefs) they send her excellent or tedious movies and gorgeous or ugly cabana boys. The captain doesn’t even have direct access to what the apples or whatever are motivating her to do, although she can piece it together. She might even start thinking of apples as irreducibly connected to treasure. But if the assembly decided that they wanted to look for ports of call instead of treasure, I don’t see why in principle they couldn’t start sending her apples in order to do so. And if they did, I think her first response, if she were verbally asked, would be that the treasure—or whatever the doubloons constituting the treasure ultimately represent in terms of the desiderata of the assembly—had moved to the ports of call. This might be a correct inference—perhaps the assembly wants the treasure for money and now they think that comes better from heading to ports of call—but it hardly seems to be a necessarily correct one.
If I met two vampires, and one said his desire to drink blood was mediated through hunger (and that he no longer felt hunger for food, or lust) and another said her desire to drink blood was mediated through lust (and that she no longer felt lust for sex, or hunger) then I do think—presuming they were both once human, experiencing lust and hunger like me—they’ve told me something that allows me to distinguish their experiences from one another, even though they both desire blood and not food or sex.
They may or may not be able to explain what it is like to be a bat.
Unless I’m inserting a further layer of misunderstanding, your position seems to be curiously disjunctivist. I or you or Alicorn or all of us may be making bad inferences in taking “feels like” to mean “reminds one of the sort of experience that brings to mind...” (“I feel like I got mauled by a bear,” says someone who was not just mauled by a bear, and maybe never was) or “constituting an experience of” (“what an algorithm feels like from the inside”) when the other is intended. This seems to be a pretty easy elision to make—consider all the philosophers who say things like “well, it feels like we have libertarian free will...”
This comment expands how you’d go about reprogramming someone in this way with another layer of granularity, which is certainly interesting on its own merits, but it doesn’t strongly support your assertion about what it would feel like to be that someone. What makes you think this is how qualia work? Have you been performing sinister experiments in your basement? Do you have magic counterfactual-luminosity-powers?
I think Eliezer is simply suggesting that qualia don’t in fact exist in a vacuum. Green feels the way it does partly because it’s the color of chlorophyll. In a universe where plants had picked a different color for chlorophyll (melanophyll, say), with everything else (per impossibile) held constant, we would associate an at least slightly different quale with green and with black, because part of how colors feel is that they subtly remind us of the things that are most often colored that way. Similarly, part of how ‘goodness’ feels is that it imperceptibly reminds us of the extension of good; if that extension were dramatically different, then the feeling would (barring any radical redesigns of how associative thought works) be different too. In a universe where the smallest birds were ten feet tall, thinking about ‘birdiness’ would involve a different quale for the same reason.
It sounds to me like you don’t think the answer had anything to do with the question. But to think that, you’d pretty much have to discard both the functionalist and physicalist theories of mind, and go full dualist/neutral monist; wouldn’t you?
I think I’ll go with this as my reply—“Well, imagine that you lived in a monist universe—things would pretty much have to work that way, wouldn’t they?”
Possibly (this is total speculation) Eliezer is talking about the feeling of one’s entire motivational system (or some large part of it), while you’re talking about the feeling of some much narrower system that you identify as computing morality; so his conception of a Clippified human wouldn’t share your terminal-ish drives to eat tasty food, be near friends, etc., and the qualia that correspond to wanting those things.
The Clippified human categorizes foods into a similar metric of similarity—still believes that fish tastes more like steak than like chocolate—but of course is not motivated to eat except insofar as staying alive helps to make more paperclips. They have taste, but not tastiness. Actually that might make a surprisingly good metaphor for a lot of the difficulty that some people have with comprehending how Clippy can understand your pain and not care—maybe I’ll try it on the other end of that Facebook conversation.
The metaphor seems like it could lose most of its effectiveness on people who have never applied the outside view to how taste and tastiness feel from inside—they’ve never realized that chocolate tastes good because their brain fires “good taste” when it perceives the experience “chocolate taste”. The predictions that follow from the obvious resulting cognitive dissonance (from “tastes bad for others”) match my observations, so I suspect this would be common among non-rationalists. If the Facebook conversation you mention is with people who haven’t crossed that inferential gap yet, it might not prove that useful.
Consider Bob. Bob, like most unreflective people, settles many moral questions by “am I disgusted by it?” Bob is disgusted by, among other things, feces, rotten fruit, corpses, maggots, and men kissing men. Internally, it feels to Bob like the disgust he feels at one of those stimuli is the same as the disgust he feels at the other stimuli, and brain scans show that they all activate the insula in basically the same way.
Bob goes through aversion therapy (or some other method) and eventually his insula no longer activates when he sees men kissing men.
When Bob remembers his previous reaction to that stimulus, I imagine he would remember being disgusted, but not be disgusted when he remembers the stimulus. His positions on, say, same-sex marriage or the acceptability of gay relationships have changed, and he is aware that they have changed.
Do you think this example agrees with your account? If/where it disagrees, why do you prefer your account?
I think this is really a sorites problem. If you change what’s delicious only slightly, then deliciousness itself seems to be unaltered. But if you change it radically — say, if circuits similar to your old gustatory ones now trigger when and only when you see a bright light — then it seems plausible that the experience itself will be at least somewhat changed, because ‘how things feel’ is affected by our whole web of perceptual and conceptual associations. There isn’t necessarily any sharp line where a change in deliciousness itself suddenly becomes perceptible; but it’s nevertheless the case that the overall extension of ‘delicious’ (like ‘disgusting’ and ‘moral’) has some effect on how we experience deliciousness. E.g., deliciousness feels more foodish than lightish.
it seems plausible that the experience itself will be at least somewhat changed, because ‘how things feel’ is affected by our whole web of perceptual and conceptual associations.
When I look at the problem introspectively, I can see that as a sensible guess. It doesn’t seem like a sensible guess when I look at it from a neurological perspective. If the activation of the insula is disgust, then the claim that outputs of the insula will have a different introspective flavor when you rewire the inputs of the insula seems doubtful. Sure, it could be the case, but why?
When we hypnotize people to make them disgusted by benign things, I haven’t seen any mention that the disgust has a different introspective flavor, and people seem to reason about that disgust in the exact same way that they reason about the disgust they had before.
This seems like the claim that rewiring yourself leads to something like synesthesia, and that just seems like an odd and unsupported claim to me.
Certain patterns of behavior at the insula correlate with disgust. But we don’t know whether they’re sufficient for disgust, nor do we know which modifications within or outside of the insula change the conscious character of disgust. There are lots of problems with identity claims at this stage, so I’ll just raise one: For all we know, activation patterns in a given brain region correlate with disgust because disgust is experienced when that brain region inhibits another part of the brain; an experience could consist, in context, in the absence of a certain kind of brain activity.
When we hypnotize people to make them disgusted by benign things, I haven’t seen any mention that the disgust has a different introspective flavor
Hypnosis data is especially difficult to evaluate, because it isn’t clear (a) how reliable people’s self-reports about introspection are while under hypnosis; nor (b) how reliable people’s memories-of-hypnosis are afterward. Some ‘dissociative’ people even give contradictory phenomenological reports while under hypnosis.
That said, if you know of any studies suggesting that the disgust doesn’t have a different character at all, I’d be very interested to see them!
If you think my claim isn’t modest and fairly obvious, then it might be that you aren’t understanding my claim. Redness feels at least a little bit bloodish. Greenness feels at least a little bit foresty. If we made a clone who sees evergreen forests as everred and blood as green, then their experience of greenness and redness would be partly the same, but it wouldn’t be completely the same, because that overtone of bloodiness would remain in the background of a variety of green experiences, and that woodsy overtone would remain in the background of a variety of red experiences.
If you think my claim isn’t modest and fairly obvious, then it might be that you aren’t understanding my claim.
I’m differentiating between “red evokes blood” and “red feels bloody,” because those seem like different things to me. The former deals with memory and association, and the second deals with introspection, and so I agree that the same introspective sensation could evoke very different memories.
The dynamics of introspective sensations could plausibly vary between people, and so I’m reluctant to discuss it extensively except in the context of object-level comparisons.
I’m not sure exactly what you mean by “red evokes blood.” I agree that “red feels bloody” is intuitively distinct from “I tend to think explicitly about blood when I start thinking about redness,” though the two are causally related. Certain shades of green to me feel fresh, clean, ‘naturey;’ certain shades of red to me feel violent, hot, glaring; certain shades of blue feel cool; etc. My suggestion is that these qualia, which are part of the feeling of the colors themselves for most humans, would be experientially different even when decontextualized if we’d gone through life perceiving forests as blue, oceans as red, campfires as green, etc. By analogy, the feeling of ‘virtue’ may be partly independent of which things we think of under the concept ‘virtuous;’ but it isn’t completely independent of those things.
Certain shades of green to me feel fresh, clean, ‘naturey;’ certain shades of red to me feel violent, hot, glaring; certain shades of blue feel cool; etc.
I am aware that many humans have this sort of classification of colors, and have learned it because of its value in communication, but as far as I can tell this isn’t a significant part of my mental experience. A dark green might make it easier for me to think of leaves or forests, but I don’t have any experiences that I would describe as feeling ‘naturey’. If oceans and forests swapped colors, I imagine that seeing the same dark green would make it easier for me to think of waves and water, but I think my introspective experience would be the same.
If I can simplify your claim a bit, it sounds like if both oceans and forests were dark green, then seeing dark green would make you think of leaves and waves / feel associated feelings, and that this ensemble would be different from your current sensation of ocean blue or forest green. It seems sensible to me that the ensembles are different because they have different elements.
I’m happier with modeling that as perceptual bleedover (because forests and green are heavily linked, even forests that aren’t green are linked to green, and greens that aren’t on leaves are linked with forests) than I am modeling that as an atom of consciousness (the sensation of foresty greens), but if your purposes are different, a different model may be more suitable.
Part of the problem may be that I’m not so sure I have a distinct, empirically robust idea of an ‘atom of consciousness.’ I took for granted your distinction between ‘evoking blood’ and ‘feeling bloody,’ but in practice these two ideas blend together a great deal. Some ideas—phonological and musical ones, for example—are instantiated in memory by certain temporal sequences and patterns of association. From my armchair, I’m not sure how much my idea of green (or goodness, or clippiness) is what it is in virtue of its temporal and associative dispositions, too. And I don’t know if Eliezer is any less confused than I.
It wouldn’t surprise me if the sensation of disgust has some variation from one person to another, and even for the same person, from one object to another.
I think this is easier because disgust is relatively arbitrary to begin with, in that it seems to implement a function over the world-you relation (roughly, things that are bad for you to eat/be near). We wouldn’t expect that relation to have much coherence to begin with, so there’d be not much loss of coherence from modifying it—though, arguably, the same thing could be said for most qualia—elegance is kind of the odd one out.
I wouldn’t be all that surprised if the easiest way to get a human maximizing paperclips was to make it believe paperclips had epiphenomenal consciousnesses experiencing astronomical amounts of pleasure.
edit: or you could just give them a false memory of god telling them to do it.
I wouldn’t be all that surprised if the easiest way to get a human maximizing paperclips was to make it believe paperclips had epiphenomenal consciousnesses
The Enrichment Center would like to remind you that the Paperclip cannot speak. In the event that the Paperclip does speak, the Enrichment Center urges you to disregard its advice.
Wouldn’t it be easier to have the programee remember themself as misunderstanding morality—like a reformed racist who previously preferred options that harmed minorities? I know when I gain more insight into my ethics I remember making decisions that, in retrospect, are incomprehensible (unless I deliberately keep in mind how I thought I should act).
Cached thoughts regularly supersede actual moral thinking, like all forms of thinking, and I am capable of remembering this experience. Am I misunderstanding your comment?
I have no problem with this passage. But it does not seem obviously impossible to create a device that stimulates that-which-feels-rightness proportionally to (its estimate of) the clippiness of the universe—it’s just a very peculiar kind of wireheading.
As you point out, it’d be obvious, on reflection, that one’s sense of rightness has changed; but that doesn’t necessarily make it a different quale, any more than having your eyes opened to the suffering of (group) changes your experience of (in)justice qua (in)justice.
Although I think your point here is plausible, I don’t think it fits in a post where you are talking about the logicalness of morality. This qualia problem is physical; whether your feeling changes when the structure of some part of your decision system changes depends on your implementation.
Maybe your background understanding of neurology is enough for you to be somewhat confident stating this feeling/logical-function relation for humans. But mine is not and, although I could separate your metaethical explanations from your physical claims when reading the post, I think it would be better off without the latter.
Speaking from personal experience, I can say that he’s right.
Explaining how I know this, much less sharing the experience, is more difficult.
The simplest idea I can present is that you probably have multiple utility functions. If you’re buying apples, you’ll evaluate whether you like that type of apple, what the quality of the apple is, and how good the price is. For me, at least, these all FEEL different—a bruised apple doesn’t “feel” overpriced the way a $5 apple at the airport does. Even disliking soft apples feels very different from recognizing a bruised apple, even though they both also go into a larger basket of “no good”.
What’s more, I can pick apples based on someone ELSE’S utility function, and actually often shop with my roommate’s function in mind (she likes apples a lot more than me, but is also much pickier, as it happens). This feels different from using my own utility function.
The other side of this is that I would expect my brain to NOTICE its actual goals. If my goal is to make paperclips, I will think “I should do this because it makes paperclips”, instead of “I should do this because it makes people happy”. My brain doesn’t have a generic “I should do this” emotion, as near as I can tell—it just has ways of signalling that an activity will accomplish my goals.
Thus, it seems reasonable to conclude that my feelings are more a combination of activity + outcome, not some raw platonic ideal. While sex, hiking, and a nice meal all make me “happy”, they still feel completely different—I just lump them into a larger category of “happiness” for some reason.
I’d strongly suspect you can add make-more-paperclips to that emotional category, but I see absolutely no reason you could make me treat it the same as a nice dinner, because that wouldn’t even make sense.
Speaking from personal experience, I can say that he’s right.
So, you introspect the way that he introspects. Do all humans? Would all humans need to introspect that way for it to do the work that he wants it to do?
Ooh, good call, thank you. I suppose it might be akin to visualization, where it actually varies from person to person. Does anyone here on LessWrong have conflicting anecdotes, though? Does anyone disagree with what I said? If not, it seems like a safe generalization for now, but it’s still useful to remember I’m generalizing from one example :)
Remembering that other people have genuinely alien minds is surprisingly tricky.
The other side of this is that I would expect my brain to NOTICE its actual goals. If my goal is to make paperclips, I will think “I should do this because it makes paperclips”, instead of “I should do this because it makes people happy”. My brain doesn’t have a generic “I should do this” emotion, as near as I can tell—it just has ways of signalling that an activity will accomplish my goals.
Remembering that other people have genuinely alien minds is surprisingly tricky.
Other people? I find my own mind quite alien below the thin layer accessible to my introspection. Heck, most of the time I cannot even tell if my introspection lies to me.
When I have a feeling such as ‘doing-what’s-right’ there is a positive emotional response associated with it. Immediately I attach semantic content to that emotion: I identify it as being produced by the ‘doing-what’s-right’ emotion. How do I do this? I suspect that my brain has done the work to figure out that emotional response X is associated with behavior Y, and just does the work quickly.
But this is malleable. Over time, the emotional response associated with an act can change and this does not necessarily indicate a change in semantic content. I can, for example, give to a charity that I am not convinced is good and I still will often get the ‘doing-what’s-right’ emotion even though the semantic content isn’t really there. I can also find new things I value, and occasionally I will acknowledge that I value something before I get positive emotional reinforcement. So in my experience, they aren’t identical.
I strongly suspect that if you reprogrammed my brain to value counting paperclips, it would feel the same as doing what is right. At the very least, this would not be inconsistent. I might learn to attach paperclippy instead of good to that emotional state, but it would feel the same.
Because I’m not sure how else to capture a “scale of alien-ness”:
I once wrote a sci-fi race that was a blind, deaf ooze, but extremely intelligent and very sensitive to tactile input. Over the years, and with the help of a few other people, I’ve gotten a fairly good feel for their mindset and how they approach the world.
There’s a distinct subset of humans which I find vastly more puzzling than these guys.
But the real problem is not shape, it is mind. “Humans in funny suits” is a well-known term in literary science-fiction fandom, and it does not refer to something with four limbs that walks upright. An angular creature of pure crystal is a “human in a funny suit” if she thinks remarkably like a human—especially a human of an English-speaking culture of the late-20th/early-21st century.
I don’t watch a lot of ancient movies. When I was watching the movie Psycho (1960) a few years back, I was taken aback by the cultural gap between the Americans on the screen and my America. The buttoned-shirted characters of Psycho are considerably more alien than the vast majority of so-called “aliens” I encounter on TV or the silver screen.
The race was explicitly designed to try and avoid “humans in funny suits”, and have a culture that’s probably more foreign than the 1960s. But I’m only 29, and haven’t traveled outside of English-speaking countries, so take that with a dash of salt!
On a 0-10 scale, with myself at 0, humans in funny suits at 1, and the 1960s at 2, I’d rate my creation as a 4, and a subset of humanity exists in the 4-5 range. Around 5, I have trouble with the idea that there’s coherent intelligent reasoning happening, because the process is just completely lost on me, and I don’t think I’d be able to easily assign anything more than a 5, much less even speculate on what a 10 would look like.
Trying to give a specific answer to “how alien is it” is a lot harder than it seems! :)
If I may make a recommendation, if you are concerned about “alien aliens”, read a few things by Stanislaw Lem. The main theme of Lem’s scifi, I would say, is alien minds, and failure of first contact. “Solaris” is his most famous work (but the adaptation with Clooney is predictably terrible).
Not sure if I’ve read Lem, but I’ll be sure to check it out. I have a love for “truly alien” science fiction, which is why I had to try my hand at making one of my own :)
The race was explicitly designed to try and avoid “humans in funny suits”, and have a culture that’s probably more foreign than the 1960s. But I’m only 29, and haven’t traveled outside of English-speaking countries, so take that with a dash of salt!
Well reading fiction (and non-fiction) for which English speakers of your generation weren’t the target audience is a good way to start compensating.
I’ve got a lot of exposure to “golden age” science fiction and fantasy, so going back a few decades isn’t hard for me. I just don’t get exposed to many other good sources. The “classics” seem to generally fail to capture that foreignness.
If you have recommendations, especially a broader method than just naming a couple authors, I’d love to hear them. Most of my favourite authors have a strong focus on foreign cultures, either exploring them or just having characters from diverse backgrounds.
… it is really sad that I completely forgot that anime and manga aren’t English. I grew up around them, so they’re just a natural part of my culture. Suffice to say, I’ve had a lot of exposure—but not to anything older than I am.
Any recommendations for OLD anime or manga, given I don’t speak/read Japanese? :)
I’ve got a lot of exposure to “golden age” science fiction and fantasy, so going back a few decades isn’t hard for me.
Which time period do you mean by this? “Golden age of science fiction” typically refers to the 1940s and 1950s, “golden age of fantasy” to the late 1970s and early 1980s. If you mean the latter time period, read stuff from the former as a start. Also try going back at least a century to the foundational fantasy authors, e.g., Edgar Rice Burroughs, William Morris’s The Well at the World’s End. Go even further back to things like Treasure Island, or The Three Musketeers. Or even further back to the days when people believed the stuff in their “fantasy” could actually happen. Read Dante’s Divine Comedy, Thomas More’s Utopia, an actual chivalric romance (I haven’t read any so I can’t give recommendations).
A good rule of thumb is that you should experience values dissonance while reading them. A culture whose values don’t make you feel uncomfortable isn’t truly alien. Also for this reason, avoid modern adaptations, as these tend to do their best to clean up the politically incorrect parts and otherwise modernize the worldview.
The other side of this is that I would expect my brain to NOTICE its actual goals. If my goal is to make paperclips, I will think “I should do this because it makes paperclips”, instead of “I should do this because it makes people happy”.
Secondary goals often feel like primary. Breathing and quenching thirst are means of achieving the primary goal of survival (and procreation), yet they themselves feel like primary. Similarly, a paperclip maximizer may feel compelled to harvest iron without any awareness that it wants to do it in order to produce paperclips.
Bull! I’m quite aware of why I eat, breathe, and drink. Why in the world would a paperclip maximizer not be aware of this?
Unless you assume Paperclippers are just rock-bottom stupid, I’d also expect them to eventually notice the correlation between mining iron, smelting it, and shaping it into a weird semi-spiral design… and the sudden rise in the number of paperclips in the world.
I’m not sure that awareness is needed for paperclip maximizing. For example, one might call fire a very good CO2 maximizer. Actually, I’m not even sure you can apply the word awareness to non-human-like optimizers.
“If we reprogrammed you to count paperclips instead”
This is a conversation about changing my core utility function / goals, and what you are discussing would be far more of an architectural change. I meant, within my architecture (and, I assume, generalizing to most human architectures and most goals), we are, on some level, aware of the actual goal. There are occasional failure states (Alicorn mentioned iron deficiencies register as a craving for ice o.o), but these tend to tie in to low-level failures, not high-order goals like “make a paperclip”, and STILL we tend to manage to identify these and learn how to achieve our actual goals.
Survival and procreation aren’t primary goals in any direct sense. We have urges that have been selected for because they contribute to inclusive genetic fitness, but at the implementation level they don’t seem to be evaluated by their contributions to some sort of unitary probability-of-survival metric; similarly, some actions that do contribute greatly to inclusive genetic fitness (like donating eggs or sperm) are quite rare in practice and go almost wholly unrewarded by our biology. Because of this architecture, we end up with situations where we sate our psychological needs at the expense of the factors that originally selected for them: witness birth control or artificial sweeteners. This is basically the same point Eliezer was making here.
It might be meaningful to treat supergoals as intentional if we were discussing an AI, since in that case there would be a unifying intent behind each fitness metric that actually gets implemented, but even in that case I’d say it’s more accurate to talk about the supergoal as a property not of the AI’s mind but of its implementors. Humans, of course, don’t have that excuse.
Evolved creatures as we know them (at least the ones with complex brains) are reward-center-reward maximizers, which implicitly correlates with being offspring maximizers. (Actual, non-brainy organisms are probably closer to offspring maximizers).
So far as I can tell, he chose to carve the world at this joint when making the definition of ‘right’. In short, by definition. This is hardly the first time. Not too long ago, and perhaps in this sequence, there was a post about rightness and multiple-place functions that justified the utility of this definition.
I think he’s talking about the obvious fact that you’d be able to think to yourself “it seems I’m trying to maximize paperclips”, as well as the other differences in your experience that would occur for similar reasons.
Um, how do you know?
It would depend on what exactly we reprogrammed within you, I expect.
Exactly. I mean, you could probably make it have its own quale, but you could also make it not, and I don’t see why that would be in question as long as we’re postulating brain-reprogramming powers.
Assume the subject of reprogramming is an existing human being, otherwise minimally altered by this reprogramming, i.e., we don’t do anything that isn’t necessary to switch their motivation to paperclips. So unless you do something gratuitously non-minimal like moving the whole decision-action system out of the range of introspective modeling, or cutting way down on the detail level of introspective modeling, or changing the empathic architecture for modeling hypothetical selves, the new person will experience themselves as having ineffable ‘qualia’ associated with the motivation to produce paperclips.
The only way to make it seem to them like their motivational quales hadn’t changed over time would be to mess with the encoding of their previous memories of motivation, presumably in a structure-destroying way since the stored data and their introspectively exposed surfaces will not be naturally isomorphic. If you carry out the change to paperclip-motivation in the obvious way, cognitive comparisons of the retrieved memories to current thoughts will return ‘unequal ineffable quales’, and if the memories are visualized in different modalities from current thoughts, ‘incomparable ineffable quales’.
Doing-what-leads-to-paperclips will also be a much simpler ‘quale’, both from the outside perspective looking at the complexity of cognitive data, and in terms of the internal experience of complexity—unless you pack an awful lot of detail into the question of what constitutes a more preferred paperclip. Otherwise, compared to the old days when you thought about justice and fairness, introspection will show that less questioning and uncertainty is involved, and that there are fewer points of variation among the motivational thought-quales being considered.
I suppose you could put in some extra work to make the previous motivations map in cognitively comparable ways along as many joints as possible, and try to edit previous memories without destroying their structure so that they can be visualized in a least common modality with current experiences. But even if you did, memories of the previous quales for rightness-motivation would appear as different in retrospect when compared to current quales for paperclip-motivation as a memory of a 3D greyscale forest landscape vs. a current experience of a 2D red-and-green fractal, even if they’re both articulated in the visual sensory modality and your modal workspace allows you to search for, focus on, and compare commonly ‘experienced’ shapes between them.
I think you and Alicorn may be talking past each other somewhat.
Throughout my life, it seems that what I morally value has varied more than what rightness feels like—just as it seems that what I consider status-raising has changed more than what rising in status feels like, and what I find physically pleasurable has changed more than what physical pleasures feel like. It’s possible that the things my whole person is optimizing for have not changed at all, that my subjective feelings are a direct reflection of this, and that my evaluation of a change of content is merely a change in my causal model of the production of the desiderata (I thought voting for Smith would lower unemployment, but now I think voting for Jones would, etc.) But it seems more plausible to me that
1) that the whole me is optimizing for various things, and that these things change over time,
2) and that the conscious me is getting information inputs which it can group together by family resemblance, and which can reinforce or disincentivize its behavior.
Imagine a ship that is governed by an anarchic assembly below deck and captained by an employee of theirs whom they motivate through in-kind bonuses. So at one moment the assembly might be looking for buried treasure, which they think is in such-and-such a place, and so they send her baskets of fresh apples when she’s steering in that direction and baskets of stinky rotten apples when she’s steering in the wrong one. For other goals (refueling, not crashing into reefs) they send her excellent or tedious movies and gorgeous or ugly cabana boys. The captain doesn’t even have direct access to what the apples (or whatever) are motivating her to do, although she can piece it together. She might even start thinking of apples as irreducibly connected to treasure. But if the assembly decided that they wanted to look for ports of call instead of treasure, I don’t see why in principle they couldn’t start sending her apples in order to do so. And if they did, I think her first response, if she were asked verbally, would be that the treasure (or whatever the doubloons constituting the treasure ultimately represent in terms of the desiderata of the assembly) had moved to the ports of call. This might be a correct inference (perhaps the assembly wants the treasure for money, and now they think money comes more readily from heading to ports of call), but it hardly seems to be a necessarily correct one.
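To make the ship analogy concrete, here is a toy Python sketch under invented assumptions: the ship moves along a single line, the assembly’s goal is just a hidden target position, and “fresh” vs “rotten” apples are the only signal the captain ever receives. The names `Assembly`, `steer`, and `captains_story` are hypothetical, chosen for illustration. Nothing in the apple channel distinguishes “the treasure moved” from “the assembly now wants something else”.

```python
from dataclasses import dataclass


@dataclass
class Assembly:
    """The crew below deck; `target` is hidden from the captain (toy assumption)."""
    target: float

    def apples(self, old_pos: float, new_pos: float) -> str:
        # Fresh apples if the ship moved closer to the current target, rotten otherwise.
        if abs(new_pos - self.target) < abs(old_pos - self.target):
            return "fresh"
        return "rotten"


def steer(assembly: Assembly, start: float, steps: int = 50) -> float:
    """The captain reacts only to apples: keep going while fresh, turn around when rotten."""
    pos, heading = start, 1.0
    for _ in range(steps):
        candidate = pos + heading
        if assembly.apples(pos, candidate) == "fresh":
            pos = candidate
        else:
            heading = -heading  # rotten apples: try the other direction next step
    return pos


def captains_story(pos: float) -> str:
    # Her verbal explanation reuses the old concept, whatever the assembly now wants.
    return f"the treasure is at {pos:g}"


if __name__ == "__main__":
    treasure_hunt = Assembly(target=10.0)
    here = steer(treasure_hunt, start=0.0)
    print(captains_story(here))  # the treasure is at 10

    ports_of_call = Assembly(target=-5.0)  # the goal changed; the apples did not
    there = steer(ports_of_call, start=here)
    print(captains_story(there))  # the treasure is at -5
```

The design choice is deliberate: `steer` never reads `target`, so the captain’s only evidence is the reward signal, which is exactly why her story about the treasure can be wrong without any change in how the apples taste.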
If I met two vampires, and one said his desire to drink blood was mediated through hunger (and that he no longer felt hunger for food, or lust) and another said her desire to drink blood was mediated through lust (and that she no longer felt lust for sex, or hunger) then I do think—presuming they were both once human, experiencing lust and hunger like me—they’ve told me something that allows me to distinguish their experiences from one another, even though they both desire blood and not food or sex.
They may or may not be able to explain to me what it is like to be a bat.
Unless I’m inserting a further layer of misunderstanding, your position seems to be curiously disjunctivist. I or you or Alicorn or all of us may be making bad inferences in taking “feels like” to mean “reminds one of the sort of experience that brings to mind...” (“I feel like I got mauled by a bear,” says someone who was not just mauled by a bear, and maybe never has been) or “constituting an experience of” (“what an algorithm feels like from the inside”) when the other is intended. This seems to be a pretty easy elision to make; consider all the philosophers who say things like “well, it feels like we have libertarian free will...”
This comment expands how you’d go about reprogramming someone in this way with another layer of granularity, which is certainly interesting on its own merits, but it doesn’t strongly support your assertion about what it would feel like to be that someone. What makes you think this is how qualia work? Have you been performing sinister experiments in your basement? Do you have magic counterfactual-luminosity-powers?
I think Eliezer is simply suggesting that qualia don’t in fact exist in a vacuum. Green feels the way it does partly because it’s the color of chlorophyll. In a universe where plants had picked a different color for chlorophyll (melanophyll, say), with everything else (per impossibile) held constant, we would associate an at least slightly different quale with green and with black, because part of how colors feel is that they subtly remind us of the things that are most often colored that way. Similarly, part of how ‘goodness’ feels is that it imperceptibly reminds us of the extension of good; if that extension were dramatically different, then the feeling would (barring any radical redesigns of how associative thought works) be different too. In a universe where the smallest birds were ten feet tall, thinking about ‘birdiness’ would involve a different quale for the same reason.
It sounds to me like you don’t think the answer had anything to do with the question. But to think that, you’d pretty much have to discard both the functionalist and physicalist theories of mind, and go full dualist/neutral monist; wouldn’t you?
I think I’ll go with this as my reply—“Well, imagine that you lived in a monist universe—things would pretty much have to work that way, wouldn’t they?”
Possibly (this is total speculation) Eliezer is talking about the feeling of one’s entire motivational system (or some large part of it), while you’re talking about the feeling of some much narrower system that you identify as computing morality; so his conception of a Clippified human wouldn’t share your terminal-ish drives to eat tasty food, be near friends, etc., and the qualia that correspond to wanting those things.
The Clippified human categorizes foods into a similar metric of similarity—still believes that fish tastes more like steak than like chocolate—but of course is not motivated to eat except insofar as staying alive helps to make more paperclips. They have taste, but not tastiness. Actually that might make a surprisingly good metaphor for a lot of the difficulty that some people have with comprehending how Clippy can understand your pain and not care—maybe I’ll try it on the other end of that Facebook conversation.
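A toy Python sketch of the taste/tastiness split, under entirely invented assumptions: the flavor axes, the flavor profiles, the calorie counts, and the paperclips-per-calorie rate are made up for illustration. Both agents share the same perceptual similarity function, so both agree that fish tastes more like steak than like chocolate; only the valuation layered on top of perception differs.

```python
import math
from typing import Callable, Dict, Tuple

Flavor = Tuple[float, float, float]  # (savory, sweet, fatty): invented axes

# Hypothetical flavor profiles, made up purely for illustration.
FLAVORS: Dict[str, Flavor] = {
    "fish": (0.8, 0.1, 0.4),
    "steak": (0.9, 0.0, 0.6),
    "chocolate": (0.1, 0.9, 0.5),
}

# Hypothetical calories per serving (illustrative numbers only).
CALORIES: Dict[str, float] = {"fish": 200.0, "steak": 300.0, "chocolate": 250.0}


def taste_distance(a: str, b: str) -> float:
    """Shared perception ("taste"): Euclidean distance between flavor profiles."""
    return math.dist(FLAVORS[a], FLAVORS[b])


def most_similar(food: str) -> str:
    """Which other food tastes most like `food`? Same answer for both agents."""
    others = [f for f in FLAVORS if f != food]
    return min(others, key=lambda f: taste_distance(food, f))


def human_value(food: str) -> float:
    # "Tastiness": a valuation on top of perception (made-up weights).
    savory, sweet, fatty = FLAVORS[food]
    return 0.5 * sweet + 0.3 * fatty + 0.2 * savory


def clippy_value(food: str) -> float:
    # Taste without tastiness: food matters only as fuel for making paperclips.
    paperclips_per_calorie = 0.01  # hypothetical conversion rate
    return CALORIES[food] * paperclips_per_calorie


def favorite(value: Callable[[str], float]) -> str:
    return max(FLAVORS, key=value)


if __name__ == "__main__":
    print("fish tastes most like:", most_similar("fish"))  # steak, for both agents
    print("human favorite:", favorite(human_value))  # chocolate
    print("clippy favorite:", favorite(clippy_value))  # steak (most calories)
```

The point of keeping `taste_distance` separate from the two value functions is that swapping the valuation leaves every similarity judgment intact, which is the sense in which the Clippified human can understand your experience without caring about it.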
The metaphor seems like it could lose most of its effectiveness on people who have never applied the outside view to how taste and tastiness feel from inside: they’ve never realized that chocolate tastes good because their brain fires “good taste” when it perceives the experience “chocolate taste”. The predictions that follow from the obvious resulting cognitive dissonance (from “tastes bad for others”) match my observations, so I suspect this would be common among non-rationalists. If the Facebook conversation you mention is with people who haven’t crossed that inferential gap yet, it might not prove that useful.
Consider Bob. Bob, like most unreflective people, settles many moral questions by “am I disgusted by it?” Bob is disgusted by, among other things, feces, rotten fruit, corpses, maggots, and men kissing men. Internally, it feels to Bob like the disgust he feels at one of those stimuli is the same as the disgust he feels at the other stimuli, and brain scans show that they all activate the insula in basically the same way.
Bob goes through aversion therapy (or some other method) and eventually his insula no longer activates when he sees men kissing men.
When Bob remembers his previous reaction to that stimulus, I imagine he would remember being disgusted, but not be disgusted when he remembers the stimulus. His positions on, say, same-sex marriage or the acceptability of gay relationships have changed, and he is aware that they have changed.
Do you think this example agrees with your account? If/where it disagrees, why do you prefer your account?
I think this is really a sorites problem. If you change what’s delicious only slightly, then deliciousness itself seems to be unaltered. But if you change it radically — say, if circuits similar to your old gustatory ones now trigger when and only when you see a bright light — then it seems plausible that the experience itself will be at least somewhat changed, because ‘how things feel’ is affected by our whole web of perceptual and conceptual associations. There isn’t necessarily any sharp line where a change in deliciousness itself suddenly becomes perceptible; but it’s nevertheless the case that the overall extension of ‘delicious’ (like ‘disgusting’ and ‘moral’) has some effect on how we experience deliciousness. E.g., deliciousness feels more foodish than lightish.
When I look at the problem introspectively, I can see that as a sensible guess. It doesn’t seem like a sensible guess when I look at it from a neurological perspective. If the activation of the insula is disgust, then the claim that outputs of the insula will have a different introspective flavor when you rewire the inputs of the insula seems doubtful. Sure, it could be the case, but why?
When we hypnotize people to make them disgusted by benign things, I haven’t seen any mention that the disgust has a different introspective flavor, and people seem to reason about that disgust in the exact same way that they reason about the disgust they had before.
This seems like the claim that rewiring yourself leads to something like synesthesia, and that just seems like an odd and unsupported claim to me.
Certain patterns of behavior at the insula correlate with disgust. But we don’t know whether they’re sufficient for disgust, nor do we know which modifications within or outside of the insula change the conscious character of disgust. There are lots of problems with identity claims at this stage, so I’ll just raise one: For all we know, activation patterns in a given brain region correlate with disgust because disgust is experienced when that brain region inhibits another part of the brain; an experience could consist, in context, in the absence of a certain kind of brain activity.
Hypnosis data is especially difficult to evaluate, because it isn’t clear (a) how reliable people’s self-reports about introspection are while under hypnosis; nor (b) how reliable people’s memories-of-hypnosis are afterward. Some ‘dissociative’ people even give contradictory phenomenological reports while under hypnosis.
That said, if you know of any studies suggesting that the disgust doesn’t have a different character at all, I’d be very interested to see them!
If you think my claim isn’t modest and fairly obvious, then it might be that you aren’t understanding my claim. Redness feels at least a little bit bloodish. Greenness feels at least a little bit foresty. If we made a clone who sees evergreen forests as everred and blood as green, then their experience of greenness and redness would be partly the same, but it wouldn’t be completely the same, because that overtone of bloodiness would remain in the background of a variety of green experiences, and that woodsy overtone would remain in the background of a variety of red experiences.
I’m differentiating between “red evokes blood” and “red feels bloody,” because those seem like different things to me. The former deals with memory and association, and the latter deals with introspection, and so I agree that the same introspective sensation could evoke very different memories.
The dynamics of introspective sensations could plausibly vary between people, and so I’m reluctant to discuss it extensively except in the context of object-level comparisons.
I’m not sure exactly what you mean by “red evokes blood.” I agree that “red feels bloody” is intuitively distinct from “I tend to think explicitly about blood when I start thinking about redness,” though the two are causally related. Certain shades of green to me feel fresh, clean, ‘naturey;’ certain shades of red to me feel violent, hot, glaring; certain shades of blue feel cool; etc. My suggestion is that these qualia, which are part of the feeling of the colors themselves for most humans, would be experientially different even when decontextualized if we’d gone through life perceiving forests as blue, oceans as red, campfires as green, etc. By analogy, the feeling of ‘virtue’ may be partly independent of which things we think of under the concept ‘virtuous;’ but it isn’t completely independent of those things.
I am aware that many humans have this sort of classification of colors, and have learned it because of its value in communication, but as far as I can tell this isn’t a significant part of my mental experience. A dark green might make it easier for me to think of leaves or forests, but I don’t have any experiences that I would describe as feeling ‘naturey’. If oceans and forests swapped colors, I imagine that seeing the same dark green would make it easier for me to think of waves and water, but I think my introspective experience would be the same.
If I can simplify your claim a bit, it sounds like if both oceans and forests were dark green, then seeing dark green would make you think of leaves and waves / feel associated feelings, and that this ensemble would be different from your current sensation of ocean blue or forest green. It seems sensible to me that the ensembles are different because they have different elements.
I’m happier modeling that as perceptual bleedover (because forests and green are heavily linked, even forests that aren’t green are linked to green, and greens that aren’t on leaves are linked with forests) than modeling it as an atom of consciousness (the sensation of foresty greens), but if your purposes are different, a different model may be more suitable.
Part of the problem may be that I’m not so sure I have a distinct, empirically robust idea of an ‘atom of consciousness.’ I took for granted your distinction between ‘evoking blood’ and ‘feeling bloody,’ but in practice these two ideas blend together a great deal. Some ideas—phonological and musical ones, for example—are instantiated in memory by certain temporal sequences and patterns of association. From my armchair, I’m not sure how much my idea of green (or goodness, or clippiness) is what it is in virtue of its temporal and associative dispositions, too. And I don’t know if Eliezer is any less confused than I.
It wouldn’t surprise me if the sensation of disgust has some variation from one person to another, and even for the same person, from one object to another.
I just wanted to tell everyone that it is great fun to read this in the voice of that voice actor for the Enzyte commercial :)
I think this is easier because disgust is relatively arbitrary to begin with, in that it seems to implement a function over the world-you relation (roughly, things that are bad for you to eat/be near). We wouldn’t expect that relation to have much coherence to begin with, so there’d be not much loss of coherence from modifying it—though, arguably, the same thing could be said for most qualia—elegance is kind of the odd one out.
I wouldn’t be all that surprised if the easiest way to get a human maximizing paperclips was to make it believe paperclips had epiphenomenal consciousnesses experiencing astronomical amounts of pleasure.
edit: or you could just give them a false memory of god telling them to do it.
The Enrichment Center would like to remind you that the Paperclip cannot speak. In the event that the Paperclip does speak, the Enrichment Center urges you to disregard its advice.
Wouldn’t it be easier to have the programmee remember themself as having misunderstood morality, like a reformed racist who previously preferred options that harmed minorities? I know that when I gain more insight into my ethics, I remember making decisions that, in retrospect, are incomprehensible (unless I deliberately keep in mind how I thought I should act).
That depends on the details of how the human brain stores goals and memories.
Cached thoughts regularly supersede actual moral thinking, like all forms of thinking, and I am capable of remembering this experience. Am I misunderstanding your comment?
My point is that in order to “fully reprogram” someone it is also necessary to clear their “moral cache” at the very least.
Well … is it? Would you notice if your morals changed when you weren’t looking?
I probably would, but then again I’m in the habit of comparing the output of my moral intuitions with stored earlier versions of that output.
I guess it depends on how much you rely on cached thoughts in your moral reasoning.
Of course, it can be hard to tell how much you’re using ’em. Hmm...
I have no problem with this passage. But it does not seem obviously impossible to create a device that stimulates that-which-feels-rightness proportionally to (its estimate of) the clippiness of the universe—it’s just a very peculiar kind of wireheading.
As you point out, it’d be obvious, on reflection, that one’s sense of rightness has changed; but that doesn’t necessarily make it a different quale, any more than having your eyes opened to the suffering of (group) changes your experience of (in)justice qua (in)justice.
Although I think your point here is plausible, I don’t think it fits in a post where you are talking about the logicalness of morality. This qualia problem is physical; whether your feeling changes when the structure of some part of your decision system changes depends on your implementation.
Maybe your background understanding of neurology is enough for you to be somewhat confident stating this feeling/logical-function relation for humans. But mine is not and, although I could separate your metaethical explanations from your physical claims when reading the post, I think it would be better off without the latter.
Speaking from personal experience, I can say that he’s right.
Explaining how I know this, much less sharing the experience, is more difficult.
The simplest idea I can present is that you probably have multiple utility functions. If you’re buying apples, you’ll evaluate whether you like that type of apple, what the quality of the apple is, and how good the price is. For me, at least, these all FEEL different: a bruised apple doesn’t “feel” overpriced the way a $5 apple at the airport does. Even disliking soft apples feels very different from recognizing a bruised apple, even though they both also go into a larger basket of “no good”.
What’s more, I can pick apples based on someone ELSE’S utility function, and actually often shop with my roommate’s function in mind (she likes apples a lot more than me, but is also much pickier, as it happens). This feels different from using my own utility function.
The other side of this is that I would expect my brain to NOTICE its actual goals. If my goal is to make paperclips, I will think “I should do this because it makes paperclips”, instead of “I should do this because it makes people happy”. My brain doesn’t have a generic “I should do this” emotion, as near as I can tell; it just has ways of signalling that an activity will accomplish my goals.
Thus, it seems reasonable to conclude that my feelings are more a combination of activity + outcome, not some raw platonic ideal. While sex, hiking, and a nice meal all make me “happy”, they still feel completely different; I just lump them into a larger category of “happiness” for some reason.
I’d strongly suspect you can add make-more-paperclips to that emotional category, but I see absolutely no reason you could make me treat it the same as a nice dinner, because that wouldn’t even make sense.
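The apple-shopping description in this comment can be sketched as a few separate checks that each produce their own distinct objection and only get merged into a single “no good” verdict at the end. This is a toy Python illustration, not a model of how brains do it; the varieties, prices, and the roommate’s parameters are invented, and `make_checks`, `complaints`, and `no_good` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Apple:
    variety: str
    bruised: bool
    soft: bool
    price: float  # dollars


# Each check is a separate signal; they are only merged at the very end.
Check = Callable[[Apple], bool]


def make_checks(liked: Set[str], max_price: float, tolerates_soft: bool) -> Dict[str, Check]:
    """Build one shopper's preferences (all parameters are hypothetical)."""
    return {
        "wrong variety": lambda a: a.variety not in liked,
        "bruised": lambda a: a.bruised,
        "too soft": lambda a: a.soft and not tolerates_soft,
        "overpriced": lambda a: a.price > max_price,
    }


def complaints(apple: Apple, checks: Dict[str, Check]) -> List[str]:
    """Which distinct objections fire? Different lists can all mean 'no good'."""
    return [name for name, check in checks.items() if check(apple)]


def no_good(apple: Apple, checks: Dict[str, Check]) -> bool:
    # The larger "basket": any objection at all rejects the apple.
    return bool(complaints(apple, checks))


if __name__ == "__main__":
    me = make_checks(liked={"Fuji", "Gala"}, max_price=1.50, tolerates_soft=False)
    # Shopping with someone else's function: same machinery, different parameters.
    roommate = make_checks(liked={"Fuji", "Gala", "Honeycrisp"}, max_price=2.00, tolerates_soft=False)

    airport = Apple("Fuji", bruised=False, soft=False, price=5.00)
    bruised = Apple("Gala", bruised=True, soft=False, price=0.80)
    honeycrisp = Apple("Honeycrisp", bruised=False, soft=False, price=1.20)

    print(complaints(airport, me))  # ['overpriced']
    print(complaints(bruised, me))  # ['bruised']
    print(complaints(honeycrisp, me), complaints(honeycrisp, roommate))  # ['wrong variety'] []
```

Both the airport apple and the bruised apple end up as “no good”, but `complaints` reports different reasons, which mirrors the claim that the objections feel different even though they land in the same basket.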
So, you introspect the way that he introspects. Do all humans? Would all humans need to introspect that way for it to do the work that he wants it to do?
Ooh, good call, thank you. I suppose it might be akin to visualization, where it actually varies from person to person. Does anyone here on LessWrong have conflicting anecdotes, though? Does anyone disagree with what I said? If not, it seems like a safe generalization for now, but it’s still useful to remember I’m generalizing from one example :)
Remembering that other people have genuinely alien minds is surprisingly tricky.
Iron deficiency feels like wanting ice. For clever, verbal reasons. Not being iron deficient doesn’t feel like anything. My brain did not notice that it was trying to get iron—it didn’t even notice it was trying to get ice, it made up reasons according to which ice was an instrumental value for some terminal goal or other.
Other people? I find my own mind quite alien below the thin layer accessible to my introspection. Heck, most of the time I cannot even tell if my introspection lies to me.
I think I have a different introspection here.
When I have a feeling such as ‘doing-whats-right’ there is a positive emotional response associated with it. Immediately I attach semantic content to that emotion: I identify it as being produced by the ‘doing-whats-right’ emotion. How do I do this? I suspect that my brain has done the work to figure out that emotional response X is associated with behavior Y, and just does the work quickly.
But this is malleable. Over time, the emotional response associated with an act can change, and this does not necessarily indicate a change in semantic content. I can, for example, give to a charity that I am not convinced is good, and I will still often get the ‘doing-whats-right’ emotion even though the semantic content isn’t really there. I can also find new things I value, and occasionally I will acknowledge that I value something before I get positive emotional reinforcement. So in my experience, they aren’t identical.
I strongly suspect that if you reprogrammed my brain to value counting paperclips, it would feel the same as doing what is right. At very least, this would not be inconsistent. I might learn to attach paperclippy instead of good to that emotional state, but it would feel the same.
… they do? For what values of “alien”?
Because I’m not sure how else to capture a “scale of alien-ness”:
I once wrote a sci-fi race that was a blind, deaf ooze, but extremely intelligent and very sensitive to tactile input. Over the years, and with the help of a few other people, I’ve gotten a fairly good feel for their mindset and how they approach the world.
There’s a distinct subset of humans which I find vastly more puzzling than these guys.
From Humans in Funny Suits:
The race was explicitly designed to try and avoid “humans in funny suits”, and have a culture that’s probably more foreign than the 1960s. But I’m only 29, and haven’t traveled outside of English-speaking countries, so take that with a dash of salt!
On a 0-10 scale, with myself at 0, humans in funny suits at 1, and the 1960s at 2, I’d rate my creation as a 4, and a subset of humanity exists in the 4-5 range. Around 5, I have trouble with the idea that there’s coherent intelligent reasoning happening, because the process is just completely lost on me, and I don’t think I’d be able to easily assign anything more than a 5, much less even speculate on what a 10 would look like.
Trying to give a specific answer to “how alien is it” is a lot harder than it seems! :)
If I may make a recommendation, if you are concerned about “alien aliens”, read a few things by Stanislaw Lem. The main theme of Lem’s scifi, I would say, is alien minds, and failure of first contact. “Solaris” is his most famous work (but the adaptation with Clooney is predictably terrible).
Not sure if I’ve read Lem, but I’ll be sure to check it out. I have a love for “truly alien” science fiction, which is why I had to try my hand at making one of my own :)
Well reading fiction (and non-fiction) for which English speakers of your generation weren’t the target audience is a good way to start compensating.
I’ve got a lot of exposure to “golden age” science fiction and fantasy, so going back a few decades isn’t hard for me. I just don’t get exposed to many other good sources. The “classics” seem to generally fail to capture that foreignness.
If you have recommendations, especially a broader method than just naming a couple authors, I’d love to hear it. Most of my favourite authors have a strong focus on foreign cultures, either exploring them or just having characters from diverse backgrounds.
Anime and manga, particularly the older stuff, are a decent source.
… it is really sad that I completely forgot that anime and manga isn’t English. I grew up around it, so it’s just a natural part of my culture. Suffice to say, I’ve had a lot of exposure—but not to anything older than I am.
Any recommendations for OLD anime or manga, given I don’t speak/read Japanese? :)
You’re probably best off asking on an anime/manga forum, but Barefoot Gen is a good, and depressing, start.
Which time period do you mean by this? “Golden age of science fiction” typically refers to the 1940s and 1950s, “golden age of fantasy” to the late 1970s and early 1980s. If you mean the latter time period, read stuff from the former as a start. Also try going back at least a century to the foundational fantasy authors, e.g., Edgar Rice Burroughs, or William Morris’s The Well at the World’s End. Go even further back to things like Treasure Island, or The Three Musketeers. Or even further back to the days when people believed the stuff in their “fantasy” could actually happen. Read Dante’s Divine Comedy, Thomas More’s Utopia, or an actual chivalric romance (I haven’t read any, so I can’t give recommendations).
A good rule of thumb is that you should experience values dissonance while reading them. A culture whose values don’t make you feel uncomfortable isn’t truly alien. Also for this reason, avoid modern adaptations, as these tend to do their best to clean up the politically incorrect parts and otherwise modernize the worldview.
I’m intrigued. Do you have a link?
Sadly not. I really should do a proper write-up, but right now they’re mostly stored in my head and their co-creator’s.
Secondary goals often feel like primary ones. Breathing and quenching thirst are means of achieving the primary goal of survival (and procreation), yet they themselves feel like primary goals. Similarly, a paperclip maximizer may feel compelled to harvest iron without any awareness that it wants to do it in order to produce paperclips.
All good points. I was mostly thinking about an evolved paperclip maximizer, which may or may not be a result of a fooming paperclip-maximizing AI.
An evolved agent wouldn’t evolve to maximize paper clips.
It could if the environment rewarded paperclips. Admittedly this would require an artificial environment, but that’s hardly impossible.