Value is Fragile

Eliezer Yudkowsky, Jan 29, 2009, 8:46 AM

173 points

If I had to pick a single statement that relies on more Overcoming Bias content I’ve written than any other, that statement would be:

Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.

“Well,” says the one, “maybe according to your provincial human values, you wouldn’t like it. But I can easily imagine a galactic civilization full of agents who are nothing like you, yet find great value and interest in their own goals. And that’s fine by me. I’m not so bigoted as you are. Let the Future go its own way, without trying to bind it forever to the laughably primitive prejudices of a pack of four-limbed Squishy Things—”

My friend, I have no problem with the thought of a galactic civilization vastly unlike our own… full of strange beings who look nothing like me even in their own imaginations… pursuing pleasures and experiences I can’t begin to empathize with… trading in a marketplace of unimaginable goods… allying to pursue incomprehensible objectives… people whose life-stories I could never understand.

That’s what the Future looks like if things go right.

If the chain of inheritance from human (meta)morals is broken, the Future does not look like this. It does not end up magically, delightfully incomprehensible.

With very high probability, it ends up looking dull. Pointless. Something whose loss you wouldn’t mourn.

Seeing this as obvious, is what requires that immense amount of background explanation.

And I’m not going to iterate through all the points and winding pathways of argument here, because that would take us back through 75% of my Overcoming Bias posts. Except to remark on how many different things must be known to constrain the final answer.

Consider the incredibly important human value of “boredom”—our desire not to do “the same thing” over and over and over again. You can imagine a mind that contained almost the whole specification of human value, almost all the morals and metamorals, but left out just this one thing -

- and so it spent until the end of time, and until the farthest reaches of its light cone, replaying a single highly optimized experience, over and over and over again.

Or imagine a mind that contained almost the whole specification of which sort of feelings humans most enjoy—but not the idea that those feelings had important external referents. So that the mind just went around feeling like it had made an important discovery, feeling it had found the perfect lover, feeling it had helped a friend, but not actually doing any of those things—having become its own experience machine. And if the mind pursued those feelings and their referents, it would be a good future and true; but because this one dimension of value was left out, the future became something dull. Boring and repetitive, because although this mind felt that it was encountering experiences of incredible novelty, this feeling was in no wise true.

Or the converse problem—an agent that contains all the aspects of human value, except the valuation of subjective experience. So that the result is a nonsentient optimizer that goes around making genuine discoveries, but the discoveries are not savored and enjoyed, because there is no one there to do so. This, I admit, I don’t quite know to be possible. Consciousness does still confuse me to some extent. But a universe with no one to bear witness to it, might as well not be.

Value isn’t just complicated, it’s fragile. There is more than one dimension of human value, where if just that one thing is lost, the Future becomes null. A single blow and all value shatters. Not every single blow will shatter all value—but more than one possible “single blow” will do so.

And then there are the long defenses of this proposition, which relies on 75% of my Overcoming Bias posts, so that it would be more than one day’s work to summarize all of it. Maybe some other week. There are so many branches I’ve seen that discussion tree go down.

After all—a mind shouldn’t just go around having the same experience over and over and over again. Surely no superintelligence would be so grossly mistaken about the correct action?

Why would any supermind want something so inherently worthless as the feeling of discovery without any real discoveries? Even if that were its utility function, wouldn’t it just notice that its utility function was wrong, and rewrite it? It’s got free will, right?

Surely, at least boredom has to be a universal value. It evolved in humans because it’s valuable, right? So any mind that doesn’t share our dislike of repetition, will fail to thrive in the universe and be eliminated...

If you are familiar with the difference between instrumental values and terminal values, and familiar with the stupidity of natural selection, and you understand how this stupidity manifests in the difference between executing adaptations versus maximizing fitness, and you know this turned instrumental subgoals of reproduction into decontextualized unconditional emotions...

...and you’re familiar with how the tradeoff between exploration and exploitation works in Artificial Intelligence...

...then you might be able to see that the human form of boredom that demands a steady trickle of novelty for its own sake, isn’t a grand universal, but just a particular algorithm that evolution coughed out into us. And you might be able to see how the vast majority of possible expected utility maximizers, would only engage in just so much efficient exploration, and spend most of their time exploiting the best alternative found so far, over and over and over.
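To make the exploration/exploitation point concrete, here is a minimal sketch of an "explore-then-commit" agent on a toy multi-armed bandit. This is a standard textbook strategy swapped in for illustration, not anything specified in this post; the function name, the arm payoffs, and the number of exploratory pulls are all assumptions chosen for the example. The agent samples each option a fixed number of times and then repeats whichever option looked best, with nothing resembling boredom to pull it away.

```python
import random

def explore_then_commit(arm_means, explore_pulls, horizon, seed=0):
    """Pull each arm `explore_pulls` times, then repeat the best-looking arm.

    Hypothetical illustration: returns the list of arms chosen at each step.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(arm_means)
    choices = []
    # Exploration phase: sample every arm a fixed number of times.
    for arm, mean in enumerate(arm_means):
        for _ in range(explore_pulls):
            totals[arm] += rng.gauss(mean, 1.0)  # noisy payoff (assumed model)
            choices.append(arm)
    # Exploitation phase: commit to the arm with the highest observed total
    # and repeat it for every remaining step.
    best = max(range(len(arm_means)), key=lambda a: totals[a])
    choices.extend([best] * (horizon - len(choices)))
    return choices

# Assumed toy numbers: three options, 20 exploratory pulls each, 10,000 steps.
choices = explore_then_commit([0.1, 0.5, 0.9], explore_pulls=20, horizon=10_000)
committed = choices[-1]
repeat_fraction = choices.count(committed) / len(choices)
print(f"committed to arm {committed}; repeated it on {repeat_fraction:.1%} of steps")
```

Whichever arm the agent settles on, it spends only 60 of its 10,000 steps exploring, and the share shrinks further as the horizon grows. Novelty never appears in this agent as something valued for its own sake; exploration is only a means to finding the arm worth repeating.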

That’s a lot of background knowledge, though.

And so on and so on and so on through 75% of my posts on Overcoming Bias, and many chains of fallacy and counter-explanation. Some week I may try to write up the whole diagram. But for now I’m going to assume that you’ve read the arguments, and just deliver the conclusion:

We can’t relax our grip on the future—let go of the steering wheel—and still end up with anything of value.

And those who think we can -

- they’re trying to be cosmopolitan. I understand that. I read those same science fiction books as a kid: The provincial villains who enslave aliens for the crime of not looking just like humans. The provincial villains who enslave helpless AIs in durance vile on the assumption that silicon can’t be sentient. And the cosmopolitan heroes who understand that minds don’t have to be just like us to be embraced as valuable -

I read those books. I once believed them. But the beauty that jumps out of one box, is not jumping out of all boxes. (This being the moral of the sequence on Lawful Creativity.) If you leave behind all order, what is left is not the perfect answer, what is left is perfect noise. Sometimes you have to abandon an old design rule to build a better mousetrap, but that’s not the same as giving up all design rules and collecting wood shavings into a heap, with every pattern of wood as good as any other. The old rule is always abandoned at the behest of some higher rule, some higher criterion of value that governs.

If you loose the grip of human morals and metamorals—the result is not mysterious and alien and beautiful by the standards of human value. It is moral noise, a universe tiled with paperclips. To change away from human morals in the direction of improvement rather than entropy, requires a criterion of improvement; and that criterion would be physically represented in our brains, and our brains alone.

Relax the grip of human value upon the universe, and it will end up seriously valueless. Not, strange and alien and wonderful, shocking and terrifying and beautiful beyond all human imagination. Just, tiled with paperclips.

It’s only some humans, you see, who have this idea of embracing manifold varieties of mind—of wanting the Future to be something greater than the past—of being not bound to our past selves—of trying to change and move forward.

A paperclip maximizer just chooses whichever action leads to the greatest number of paperclips.
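As a sketch of how little is going on in that decision rule, the following toy code picks an action by predicted paperclip count and nothing else. The action names and the predicted counts are invented for illustration, and a real optimizer would need a world-model to produce such predictions; only the argmax step is shown here.

```python
def choose_action(predicted_paperclips):
    """Return the action with the largest predicted paperclip count.

    `predicted_paperclips` maps action name -> expected number of paperclips.
    No other feature of the outcome enters the comparison.
    """
    return max(predicted_paperclips, key=predicted_paperclips.get)

# Hypothetical options with made-up predictions.
options = {
    "preserve_interesting_civilization": 0,
    "build_one_paperclip_factory": 10**6,
    "convert_all_reachable_matter": 10**40,
}
chosen = choose_action(options)
print(chosen)
```

There is no term in `choose_action` where wonder, variety, or sentience could register, so none of them can influence which option wins.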

No free lunch. You want a wonderful and mysterious universe? That’s your value. You work to create that value. Let that value exert its force through you who represent it, let it make decisions in you to shape the future. And maybe you shall indeed obtain a wonderful and mysterious universe.

No free lunch. Valuable things appear because a goal system that values them takes action to create them. Paperclips don’t materialize from nowhere for a paperclip maximizer. And a wonderfully alien and mysterious Future will not materialize from nowhere for us humans, if our values that prefer it are physically obliterated—or even disturbed in the wrong dimension. Then there is nothing left in the universe that works to make the universe valuable.

You do have values, even when you’re trying to be “cosmopolitan”, trying to display a properly virtuous appreciation of alien minds. Your values are then faded further into the invisible background—they are less obviously human. Your brain probably won’t even generate an alternative so awful that it would wake you up, make you say “No! Something went wrong!” even at your most cosmopolitan. E.g. “a nonsentient optimizer absorbs all matter in its future light cone and tiles the universe with paperclips”. You’ll just imagine strange alien worlds to appreciate.

Trying to be “cosmopolitan”—to be a citizen of the cosmos—just strips off a surface veneer of goals that seem obviously “human”.

But if you wouldn’t like the Future tiled over with paperclips, and you would prefer a civilization of...

...sentient beings...

...with enjoyable experiences...

...that aren’t the same experience over and over again...

...and are bound to something besides just being a sequence of internal pleasurable feelings...

...learning, discovering, freely choosing...

...well, I’ve just been through the posts on Fun Theory that went into some of the hidden details on those short English words.

Values that you might praise as cosmopolitan or universal or fundamental or obvious common sense, are represented in your brain just as much as those values that you might dismiss as merely human. Those values come of the long history of humanity, and the morally miraculous stupidity of evolution that created us. (And once I finally came to that realization, I felt less ashamed of values that seemed ‘provincial’ - but that’s another matter.)

These values do not emerge in all possible minds. They will not appear from nowhere to rebuke and revoke the utility function of an expected paperclip maximizer.

Touch too hard in the wrong dimension, and the physical representation of those values will shatter—and not come back, for there will be nothing left to want to bring it back.

And the referent of those values—a worthwhile universe—would no longer have any physical reason to come into being.

Let go of the steering wheel, and the Future crashes.
