Terminal Values and Instrumental Values

Eliezer Yudkowsky Nov 15, 2007, 7:56 AM

116 points

Distinctions Philosophy of Language Complexity of value Bayesianism Utility Functions

On a purely instinctive level, any human planner behaves as if they distinguish between means and ends. Want chocolate? There’s chocolate at the Publix supermarket. You can get to the supermarket if you drive one mile south on Washington Ave. You can drive if you get into the car. You can get into the car if you open the door. You can open the door if you have your car keys. So you put your car keys into your pocket, and get ready to leave the house...

...when suddenly the word comes on the radio that an earthquake has destroyed all the chocolate at the local Publix. Well, there’s no point in driving to the Publix if there’s no chocolate there, and no point in getting into the car if you’re not driving anywhere, and no point in having car keys in your pocket if you’re not driving. So you take the car keys out of your pocket, and call the local pizza service and have them deliver a chocolate pizza. Mm, delicious.

I rarely notice people losing track of plans they devised themselves. People usually don’t drive to the supermarket if they know the chocolate is gone. But I’ve also noticed that when people begin explicitly talking about goal systems instead of just wanting things, mentioning “goals” instead of using them, they oft become confused. Humans are experts at planning, not experts on planning, or there’d be a lot more AI developers in the world.

In particular, I’ve noticed people get confused when, in abstract philosophical discussions rather than everyday life, they consider the distinction between means and ends; more formally, between “instrumental values” and “terminal values”.

(Another long post needed as a reference.)

Part of the problem, it seems to me, is that the human mind uses a rather ad-hoc system to keep track of its goals—it works, but not cleanly. English doesn’t embody a sharp distinction between means and ends: “I want to save my sister’s life” and “I want to administer penicillin to my sister” use the same word “want”.

Can we describe, in mere English, the distinction that is getting lost?

As a first stab:

“Instrumental values” are desirable strictly conditional on their anticipated consequences. “I want to administer penicillin to my sister”, not because a penicillin-filled sister is an intrinsic good, but in anticipation of penicillin curing her flesh-eating pneumonia. If instead you anticipated that injecting penicillin would melt your sister into a puddle like the Wicked Witch of the West, you’d fight just as hard to keep her penicillin-free.

“Terminal values” are desirable without conditioning on other consequences: “I want to save my sister’s life” has nothing to do with your anticipating whether she’ll get injected with penicillin after that.

This first attempt suffers from obvious flaws. If saving my sister’s life would cause the Earth to be swallowed up by a black hole, then I would go off and cry for a while, but I wouldn’t administer penicillin. Does this mean that saving my sister’s life was not a “terminal” or “intrinsic” value, because it’s theoretically conditional on its consequences? Am I only trying to save her life because of my belief that a black hole won’t consume the Earth afterward? Common sense should say that’s not what’s happening.

So forget English. We can set up a mathematical description of a decision system in which terminal values and instrumental values are separate and incompatible types—like integers and floating-point numbers, in a programming language with no automatic conversion between them.

An ideal Bayesian decision system can be set up using only four elements:

Outcomes : type Outcome[]
- list of possible outcomes
- {sister lives, sister dies}
Actions : type Action[]
- list of possible actions
- {administer penicillin, don’t administer penicillin}
Utility_function : type Outcome → Utility
- utility function that maps each outcome onto a utility
- (a utility being representable as a real number between negative and positive infinity)
- {sister lives: 1, sister dies: 0}
Conditional_probability_function : type Action → Outcome → Probability
- conditional probability function that maps each action onto a probability distribution over outcomes
- (a probability being representable as a real number between 0 and 1)
- {administer penicillin: sister lives, 0.9; sister dies, 0.1 ;; don’t administer penicillin: sister lives, 0.3; sister dies, 0.7}

If you can’t read the type system directly, don’t worry, I’ll always translate into English. For programmers, seeing it described in distinct statements helps to set up distinct mental objects.

And the decision system itself?

Expected_Utility : Action A → (Sum O in Outcomes: Utility(O) * Probability(O|A))
- The “expected utility” of an action equals the sum, over all outcomes, of the utility of that outcome times the conditional probability of that outcome given that action.
- {EU(administer penicillin) = 0.9 ; EU(don’t administer penicillin) = 0.3}
Choose : → (Argmax A in Actions: Expected_Utility(A))
- Pick an action whose “expected utility” is maximal
- {return: administer penicillin}

For every action, calculate the conditional probability of all the consequences that might follow, then add up the utilities of those consequences times their conditional probability. Then pick the best action.
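For programmers who prefer something they can run, here is a minimal Python sketch of the same four elements plus the two decision rules. The dictionary layout and the names `expected_utility` and `choose` are illustrative choices of mine, not part of the formalism; the numbers are the ones given above.

```python
from typing import Dict, List

Outcome = str
Action = str

OUTCOMES: List[Outcome] = ["sister lives", "sister dies"]
ACTIONS: List[Action] = ["administer penicillin", "don't administer penicillin"]

# Utility_function : Outcome -> Utility
UTILITY: Dict[Outcome, float] = {"sister lives": 1.0, "sister dies": 0.0}

# Conditional_probability_function : Action -> Outcome -> Probability
CONDITIONAL_PROBABILITY: Dict[Action, Dict[Outcome, float]] = {
    "administer penicillin": {"sister lives": 0.9, "sister dies": 0.1},
    "don't administer penicillin": {"sister lives": 0.3, "sister dies": 0.7},
}


def expected_utility(action: Action) -> float:
    """Sum over outcomes of Utility(O) * Probability(O | A)."""
    return sum(UTILITY[o] * CONDITIONAL_PROBABILITY[action][o] for o in OUTCOMES)


def choose() -> Action:
    """Pick an action whose expected utility is maximal."""
    return max(ACTIONS, key=expected_utility)
```

Calling `choose()` on these numbers returns "administer penicillin", matching the hand calculation of 0.9 versus 0.3.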

This is a mathematically simple sketch of a decision system. It is not an efficient way to compute decisions in the real world.

Suppose, for example, that you need a sequence of acts to carry out a plan? The formalism can easily represent this by letting each Action stand for a whole sequence. But this creates an exponentially large space, like the space of all sentences you can type in 100 letters. As a simple example, if one of the possible acts on the first turn is “Shoot my own foot off”, a human planner will decide this is a bad idea generally—eliminate all sequences beginning with this action. But we’ve flattened this structure out of our representation. We don’t have sequences of acts, just flat “actions”.
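To make the blow-up concrete, here is a small Python sketch. The three act names and the plan length of 3 are made up for illustration. It enumerates every flattened "action" (one per sequence of acts), and then shows the pruning that a human planner gets for free but that the flat representation cannot express without first generating everything.

```python
from itertools import product
from typing import List, Sequence, Tuple


def count_flat_actions(num_acts: int, plan_length: int) -> int:
    """Number of flat actions when each one stands for a whole sequence of acts."""
    return num_acts ** plan_length


def flat_actions(acts: Sequence[str], plan_length: int) -> List[Tuple[str, ...]]:
    """Enumerate every sequence of the given length (only feasible for tiny cases)."""
    return list(product(acts, repeat=plan_length))


def prune_by_first_act(sequences: List[Tuple[str, ...]], bad_first_act: str) -> List[Tuple[str, ...]]:
    """Drop every sequence that begins with an act judged bad in general."""
    return [s for s in sequences if s[0] != bad_first_act]


# Hypothetical acts, chosen only to illustrate the point.
ACTS = ["shoot my own foot off", "open the car door", "get into the car"]
ALL_SEQUENCES = flat_actions(ACTS, 3)
SURVIVORS = prune_by_first_act(ALL_SEQUENCES, "shoot my own foot off")
```

With 3 acts and 3 steps there are 27 sequences, and pruning on the first act leaves 18. With 26 letters and 100 steps, `count_flat_actions(26, 100)` is a number with over a hundred digits, which is why nobody enumerates it.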

So, yes, there are a few minor complications. Obviously so, or we’d just run out and build a real AI this way. In that sense, it’s much the same as Bayesian probability theory itself.

But this is one of those times when it’s a surprisingly good idea to consider the absurdly simple version before adding in any high-falutin’ complications.

Consider the philosopher who asserts, “All of us are ultimately selfish; we care only about our own states of mind. The mother who claims to care about her son’s welfare, really wants to believe that her son is doing well—this belief is what makes the mother happy. She helps him for the sake of her own happiness, not his.” You say, “Well, suppose the mother sacrifices her life to push her son out of the path of an oncoming truck. That’s not going to make her happy, just dead.” The philosopher stammers for a few moments, then replies, “But she still did it because she valued that choice above others—because of the feeling of importance she attached to that decision.”

So you say, “TYPE ERROR: No constructor found for Expected_Utility → Utility.”

Allow me to explain that reply.

Even our simple formalism illustrates a sharp distinction between expected utility, which is something that actions have; and utility, which is something that outcomes have. Sure, you can map both utilities and expected utilities onto real numbers. But that’s like observing that you can map wind speed and temperature onto real numbers. It doesn’t make them the same thing.
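The "TYPE ERROR" reply can be imitated in Python, with the caveat that Python only checks this at runtime and only because the sketch asks it to. The class names mirror the formalism; the function `as_utility` and its error message are hypothetical, introduced purely to show two real-number-valued quantities that refuse to be confused with each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utility:
    """Something that outcomes have."""
    value: float


@dataclass(frozen=True)
class ExpectedUtility:
    """Something that actions have."""
    value: float


def as_utility(x: object) -> Utility:
    """Accept only a Utility; deliberately provide no conversion from ExpectedUtility."""
    if not isinstance(x, Utility):
        raise TypeError(f"no constructor found for {type(x).__name__} -> Utility")
    return x


# Both wrap a float, yet the second cannot be passed off as the first.
try:
    as_utility(ExpectedUtility(0.9))
    message = "accepted"
except TypeError as err:
    message = str(err)
```

Here `message` ends up holding the type error text rather than "accepted". A statically typed language would reject the call before the program ever ran, which is closer to the spirit of the reply.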

The philosopher begins by arguing that all your Utilities must be over Outcomes consisting of your state of mind. If this were true, your intelligence would operate as an engine to steer the future into regions where you were happy. Future states would be distinguished only by your state of mind; you would be indifferent between any two futures in which you had the same state of mind.

And you would, indeed, be rather unlikely to sacrifice your own life to save another.

When we object that people sometimes do sacrifice their lives, the philosopher’s reply shifts to discussing Expected Utilities over Actions: “The feeling of importance she attached to that decision.” This is a drastic jump that should make us leap out of our chairs in indignation. Trying to convert an Expected_Utility into a Utility would cause an outright error in our programming language. But in English it all sounds the same.

The choices of our simple decision system are those with highest Expected Utility, but this doesn’t say anything whatsoever about where it steers the future. It doesn’t say anything about the utilities the decider assigns, or which real-world outcomes are likely to happen as a result. It doesn’t say anything about the mind’s function as an engine.

The physical cause of a physical action is a cognitive state, in our ideal decider an Expected_Utility, and this expected utility is calculated by evaluating a utility function over imagined consequences. To save your son’s life, you must imagine the event of your son’s life being saved, and this imagination is not the event itself. It’s a quotation, like the difference between “snow” and snow. But that doesn’t mean that what’s inside the quote marks must itself be a cognitive state. If you choose the action that leads to the future that you represent with “my son is still alive”, then you have functioned as an engine to steer the future into a region where your son is still alive. Not an engine that steers the future into a region where you represent the sentence “my son is still alive”. To steer the future there, your utility function would have to return a high utility when fed ““my son is still alive””, the quotation of the quotation, your imagination of yourself imagining. Recipes make poor cake when you grind them up and toss them in the batter.

And that’s why it’s helpful to consider the simple decision systems first. Mix enough complications into the system, and formerly clear distinctions become harder to see.

So now let’s look at some complications. Clearly the Utility function (mapping Outcomes onto Utilities) is meant to formalize what I earlier referred to as “terminal values”, values not contingent upon their consequences. What about the case where saving your sister’s life leads to Earth’s destruction by a black hole? In our formalism, we’ve flattened out this possibility. Outcomes don’t lead to Outcomes, only Actions lead to Outcomes. Your sister recovering from pneumonia followed by the Earth being devoured by a black hole would be flattened into a single “possible outcome”.

And where are the “instrumental values” in this simple formalism? Actually, they’ve vanished entirely! You see, in this formalism, actions lead directly to outcomes with no intervening events. There’s no notion of throwing a rock that flies through the air and knocks an apple off a branch so that it falls to the ground. Throwing the rock is the Action, and it leads straight to the Outcome of the apple lying on the ground—according to the conditional probability function that turns an Action directly into a Probability distribution over Outcomes.

In order to actually compute the conditional probability function, and in order to separately consider the utility of a sister’s pneumonia and a black hole swallowing Earth, we would have to represent the network structure of causality—the way that events lead to other events.

And then the instrumental values would start coming back. If the causal network was sufficiently regular, you could find a state B that tended to lead to C regardless of how you achieved B. Then if you wanted to achieve C for some reason, you could plan efficiently by first working out a B that led to C, and then an A that led to B. This would be the phenomenon of “instrumental value”—B would have “instrumental value” because it led to C. C itself might be terminally valued—a term in the utility function over the total outcome. Or C might just be an instrumental value, a node that was not directly valued by the utility function.

Instrumental value, in this formalism, is purely an aid to the efficient computation of plans. It can and should be discarded wherever this kind of regularity does not exist.
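A rough Python sketch of how this falls out of a causal network. Everything here is an assumption made for illustration: the state names come from the chocolate example, the probabilities are invented, the network is assumed to have no cycles, and "derived value" is just my label for the expected terminal utility downstream of a node.

```python
from typing import Dict

# Hypothetical causal network: each state maps to a distribution over next states.
Transitions = Dict[str, Dict[str, float]]


def derived_value(state: str, transitions: Transitions, utility: Dict[str, float]) -> float:
    """Expected terminal utility reachable from `state` (assumes an acyclic network)."""
    if state not in transitions:  # a final state: look up its terminal utility
        return utility.get(state, 0.0)
    return sum(
        p * derived_value(next_state, transitions, utility)
        for next_state, p in transitions[state].items()
    )


# Only final states carry terminal utility; intermediate nodes carry none of their own.
TERMINAL_UTILITY = {"have chocolate": 1.0, "no chocolate": 0.0}

BEFORE_EARTHQUAKE: Transitions = {
    "keys in pocket": {"in car": 1.0},
    "in car": {"at supermarket": 1.0},
    "at supermarket": {"have chocolate": 0.9, "no chocolate": 0.1},
}

# Same network, except the supermarket no longer leads to chocolate.
AFTER_EARTHQUAKE: Transitions = dict(BEFORE_EARTHQUAKE)
AFTER_EARTHQUAKE["at supermarket"] = {"no chocolate": 1.0}
```

In the first network, "keys in pocket" has a derived value of about 0.9, although the utility function never mentions keys. In the second it drops to 0.0, and so does every other node upstream of the supermarket, without any change to the utility function.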

Suppose, for example, that there’s some particular value of B that doesn’t lead to C. Would you choose an A which led to that B? Or never mind the abstract philosophy: If you wanted to go to the supermarket to get chocolate, and you wanted to drive to the supermarket, and you needed to get into your car, would you gain entry by ripping off the car door with a steam shovel? (No.) Instrumental value is a “leaky abstraction”, as we programmers say; you sometimes have to toss away the cached value and compute out the actual expected utility. Part of being efficient without being suicidal is noticing when convenient shortcuts break down. Though this formalism does give rise to instrumental values, it does so only where the requisite regularity exists, and strictly as a convenient shortcut in computation.

But if you complicate the formalism before you understand the simple version, then you may start thinking that instrumental values have some strange life of their own, even in a normative sense. That, once you say B is usually good because it leads to C, you’ve committed yourself to always try for B even in the absence of C. People make this kind of mistake in abstract philosophy, even though they would never, in real life, rip open their car door with a steam shovel. You may start thinking that there’s no way to develop a consequentialist that maximizes only inclusive genetic fitness, because it will starve unless you include an explicit terminal value for “eating food”. People make this mistake even though they would never stand around opening car doors all day long, for fear of being stuck outside their cars if they didn’t have a terminal value for opening car doors.

Instrumental values live in (the network structure of) the conditional probability function. This makes instrumental value strictly dependent on beliefs-of-fact given a fixed utility function. If I believe that penicillin causes pneumonia, and that the absence of penicillin cures pneumonia, then my perceived instrumental value of penicillin will go from high to low. Change the beliefs of fact—change the conditional probability function that associates actions to believed consequences—and the instrumental values will change in unison.
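The penicillin case can be sketched in a few lines of Python. Two things here are my own assumptions rather than part of the formalism: the reversed probabilities are invented by swapping the original ones, and "instrumental value of penicillin" is operationalized as the difference in expected utility between giving it and withholding it.

```python
from typing import Dict

Beliefs = Dict[str, Dict[str, float]]

# The utility function is held fixed throughout.
UTILITY: Dict[str, float] = {"sister lives": 1.0, "sister dies": 0.0}

ORIGINAL_BELIEFS: Beliefs = {
    "administer penicillin": {"sister lives": 0.9, "sister dies": 0.1},
    "don't administer penicillin": {"sister lives": 0.3, "sister dies": 0.7},
}

# Hypothetical reversed beliefs: penicillin harms, its absence cures.
REVERSED_BELIEFS: Beliefs = {
    "administer penicillin": {"sister lives": 0.3, "sister dies": 0.7},
    "don't administer penicillin": {"sister lives": 0.9, "sister dies": 0.1},
}


def expected_utility(action: str, beliefs: Beliefs) -> float:
    """Expected utility of an action under a given set of beliefs of fact."""
    return sum(UTILITY[outcome] * p for outcome, p in beliefs[action].items())


def instrumental_value_of_penicillin(beliefs: Beliefs) -> float:
    """Assumed measure: how much better giving penicillin is than withholding it."""
    return (expected_utility("administer penicillin", beliefs)
            - expected_utility("don't administer penicillin", beliefs))
```

Under the original beliefs the difference is about +0.6, and under the reversed beliefs about -0.6. `UTILITY` is never touched; only the conditional probabilities moved.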

In moral arguments, some disputes are about instrumental consequences, and some disputes are about terminal values. If your debating opponent says that banning guns will lead to lower crime, and you say that banning guns leads to higher crime, then you agree about a superior instrumental value (crime is bad), but you disagree about which intermediate events lead to which consequences. But I do not think an argument about female circumcision is really a factual argument about how to best achieve a shared value of treating women fairly or making them happy.

This important distinction often gets flushed down the toilet in angry arguments. People with factual disagreements and shared values each decide that their debating opponents must be sociopaths. As if your hated enemy, the gun control or gun rights advocate, really wanted to kill people, which should be implausible as realistic psychology.

I fear the human brain does not strongly type the distinction between terminal moral beliefs and instrumental moral beliefs. “We should ban guns” and “We should save lives” don’t feel different, as moral beliefs, the way that sight feels different from sound. Despite all the other ways that the human goal system complicates everything in sight, this one distinction it manages to collapse into a mishmash of things-with-conditional-value.

To extract out the terminal values we have to inspect this mishmash of valuable things, trying to figure out which ones are getting their value from somewhere else. It’s a difficult project! If you say that you want to ban guns in order to reduce crime, it may take a moment to realize that “reducing crime” isn’t a terminal value, it’s a superior instrumental value with links to terminal values for human lives and human happinesses. And then the one who advocates gun rights may have links to the superior instrumental value of “reducing crime” plus a link to a value for “freedom”, which might be a terminal value unto them, or another instrumental value...

We can’t print out our complete network of values derived from other values. We probably don’t even store the whole history of how values got there. By considering the right moral dilemmas, “Would you do X if Y”, we can often figure out where our values came from. But even this project itself is full of pitfalls; misleading dilemmas and gappy philosophical arguments. We don’t know what our own values are, or where they came from, and can’t find out except by undertaking error-prone projects of cognitive archaeology. Just forming a conscious distinction between “terminal value” and “instrumental value”, and keeping track of what it means, and using it correctly, is hard work. Only by inspecting the simple formalism can we see how easy it ought to be, in principle.

And that’s to say nothing of all the other complications of the human reward system—the whole use of reinforcement architecture, and the way that eating chocolate is pleasurable, and anticipating eating chocolate is pleasurable, but they’re different kinds of pleasures...

But I don’t complain too much about the mess.

Being ignorant of your own values may not always be fun, but at least it’s not boring.


douglas Nov 15, 2007, 9:02 AM
3 points

The distinction between instrumental values and terminal values is useful in thinking about political and economic issues (the 2 areas I’ve thought about so far…). I’m running into a problem with ‘terminal’ values, and I wonder if this isn’t typical. A terminal value implies the future in a way that an instrumental value does not. The instrumental value is for an action carried out in a finite time and leads to an outcome in the foreseeable future. A terminal value posits all futures: this is an endless recursive algorithm. (At least I don’t have an end to the future in my thinking now). When I ask myself, “How do I want things to be in the future?” I can carry this question out only so far, but my concept of the future goes well beyond any currently imaginable scenarios.
igor2 Nov 15, 2007, 9:18 AM
5 points

Eliezer, what’s this with your recent bias against boredom? Are you sure it’s rational or efficient or even simply useful in any way to cultivate a constant (and possibly boring) battle against boredom?
J_Thomas Nov 15, 2007, 11:20 AM
6 points

Douglas, in principle you ought to consider the entire state of the future universe when you set a terminal value. “I want my sister not to be killed in the next few weeks by flesh-eating bacteria” is a vague goal. “My sister not being killed by flesh-eating bacteria because the world fell into a black hole and tidal effects killed her” is not an adequate alternative.

In practice we set terminal values as if they’re independent of everything else. I assume that giving my sister penicillin will not have any side effects I haven’t considered. As far as I know she isn’t allergic to penicillin. If it will bankrupt me then that’s something I will consider. I assume the drug company is not sending its profits to support al qaeda unless somebody comes out and claims it is and the mass media take the claim seriously. I assume the drug company won’t use my money to lobby for things I’d disapprove of. I completely ignore the fact that my sister’s kidneys will remove the penicillin and she’ll repeatedly dose her toilet with a dilute penicillin solution that will encourage the spread of penicillin-resistant bacteria. If I did think about that I might want her to save her urine so it could be treated to destroy the penicillin before it’s thrown away.

In practice people think about what they want, and they think about important side effects they have learned to consider, and that’s all. If we actually had a holistic view of things we would be very different people.
Joshua_Fox Nov 15, 2007, 12:11 PM
0 points

What is the difference between moral terminal values and terminal values in general? At first glance, the former considers other beings, whereas the latter may only consider oneself—can someone make this more precise?
Peter_de_Blanc Nov 15, 2007, 1:33 PM
0 points

What is the difference between moral terminal values and terminal values in general? At first glance, the former considers other beings, whereas the latter may only consider oneself—can someone make this more precise?

Huh? Considering only oneself is less general than considering everything.
Silas Nov 15, 2007, 2:28 PM
3 points

In moral arguments, some disputes are about instrumental consequences, and some disputes are about terminal values. If your debating opponent says that banning guns will lead to lower crime, and you say that banning guns leads to higher crime, then you agree about a superior instrumental value (crime is bad), but you disagree about which intermediate events lead to which consequences. … This important distinction often gets flushed down the toilet in angry arguments. People with factual disagreements and shared values each decide that their debating opponents must be sociopaths.

I don’t think it’s possible to find a truer statement about political debates on the internet.

I’ve lost count of how many exchanges I’ve been in that have gone like this:

me: Plan X would better reduce environmental impact at lower cost.
them: So, in other words, you think the whole global warming thing is a myth?

And then, of course, people sometimes can’t keep straight which consequence you’re debating:

me: The method you’ve described does not show a viable way to produce intellectual works for-profit without IP.
them: I disagree with your claim that no one has ever produced any intellectual works without IP protection.
- donjoe Oct 17, 2016, 11:58 PM
  0 points
  Parent
  
  I’m noticing this very late, and I’m going to be off-topic, but I still have to stop to note that there’s no such thing as “IP”, not in actual laws (unless they’ve been infected by this term very recently and I just haven’t found out about it). It’s a bogus name lumping together things that the law does not lump together at all, a term invented purely for use in corporate propaganda, nothing more. https://www.gnu.org/philosophy/not-ipr.en.html
Richard_Hollerith2 Nov 15, 2007, 3:16 PM
−1 points

I’m running into a problem with ‘terminal’ values . . .

A terminal value posits all futures: this is an endless recursive algorithm. (At least I don’t have an end to the future in my thinking now).

I believe this is a real problem, and my way of resolving it is to push my terminal values indefinitely far into the future, so for example in my system for valuing things, only causal chains of indefinite length have nonzero intrinsic importance or value. To read a fuller account, click on my name.
Jef_Allbright Nov 15, 2007, 3:46 PM
0 points

I simply want to express my great appreciation for Eliezer’s substantial efforts to share his observations of the journey, his willingness (in principle) to update his beliefs, and his presently ongoing integration of the epistemologically undeniable “subjective” with the hardcore reductionist “objective.” I’m joyfully anticipating what comes next!
david2 Nov 15, 2007, 3:46 PM
−6 points

is that washington avenue in south beach? are there many publix stores outside of florida?
Joshua_Fox Nov 15, 2007, 4:18 PM
1 point

Peter de Blanc “Huh? Considering only oneself is less general than considering everything.”

Certainly. But can you give a succinct way of distinguishing moral terminal values from other terminal values?
- Dojan Oct 11, 2011, 10:35 PM
  −1 points
  Parent
  
  Define what you mean by “moral” and I think the answer will give itself.
Peter_de_Blanc Nov 15, 2007, 4:35 PM
1 point

Certainly. But can you give a succinct way of distinguishing moral terminal values from other terminal values?

No. What other sorts of terminal values did you have in mind?
Stan Nov 15, 2007, 5:35 PM
0 points

Good post!
Joshua_Fox Nov 15, 2007, 5:48 PM
1 point

Peter de Blanc

No. What other sorts of terminal values [other than moral] did you have in mind?

Well, one could have a terminal value of making themselves happy at all costs, without any regard for whether it harms others. A sadist could have the terminal value of causing pain to others. I wouldn’t call those moral. I’m interested in a succinct differentiation between moral and other terminal values.
Peter_de_Blanc Nov 15, 2007, 6:23 PM
3 points

Josh, I would say that making oneself happy is a morality, and so is causing pain to others. It sure isn’t our morality. If you could find a short definition of our morality, I would be totally amazed.
douglas Nov 15, 2007, 7:53 PM
0 points

J Thomas: “in principle you ought to consider the entire state of the future universe when you set a terminal value.” Yes, and in practice we don’t. But as I look further into the future to see the consequences of my terminal value(s), that’s when the trouble begins.

igor: I want to defend Eliezer’s bias against boredom. It seems that many of the ‘most moral’ terminal values (total freedom, complete knowledge, endless bliss...) would end up in a condition of hideous boredom. Maybe that’s why we don’t achieve them.

Richard: I read your post. I agree with the conclusions to a large extent, but totally disagree with the premises. (For example, I think the only valuable thing is subjective experience.) Isn’t that amazing?
George_Weinberg2 Nov 15, 2007, 8:09 PM
2 points

I have a question about this picture.

Imagine you have something like a chess playing program. It’s got some sort of basic position evaluation function, then uses some sort of look ahead to assign values to the instrumental nodes based on the terminal nodes you anticipate along the path. But unless the game actually ends at the terminal node, it’s only “terminal” in the sense that that’s where you choose to stop calculating. There’s nothing really special about them.

Human beings are different from the chess program in that for us the game never ends, there are no “true” terminal nodes. As you point out, we care what happens after we are dead. So wouldn’t it be true that in a sense there’s nothing but instrumental values, that a “terminal value” just means a point at which we’ve chosen to stop calculating, rather than saying something about the situation itself?
- Liliet B Dec 24, 2019, 7:45 PM
  1 point
  Parent
  
  I would propose an approximation of the system where each node has a terminal value of its own (which can be 0 for completely neutral nodes, but actually no they cannot—reinforcement mechanisms of our brain inevitably give something like 0.0001 because I heard someone say it was cool once or −0.002 because it reminds me of a sad event in my childhood)
  As a simple example, consider eating food when hungry. You get a terminal value on eating food—the immediate satisfaction the brain releases in the form of chemicals as a response to recognition of the event, thanks to evolution—and an instrumental value on eating food, which is that you get to not starve for a while longer.
  Now let’s say that while you are a sentient optimization process that can reason over long projections of time, you are also a really simple one, and your network actually doesn’t have any other terminal values than eating food, it’s genuinely the only thing you care about. So when you calculate the instrumental value of eating food, you get only the sum of getting to eat more food in the future.
  Let’s say your confidence in getting to eat food next time after this one decreases with a steady rule. For example, p(i+1)=p(i)*0.5. If your confidence that you are eating food right now is 1, then your confidence that you’ll get to eat again is 0.5, and your confidence that you’ll get to eat the time after that is 0.25 and so on.
  So the total instrumental value of eating food right now is limit of Sum(p(i) * T(food)) where i starts from 0 and approaches infinity (no I don’t remember enough math to write this in symbols).
  So the total total value of eating food is T(food) + Sum (p(i)*T(food)). It’s always positive, because T(food) is positive and p(i) is positive and that’s that. You’ll never choose not to eat food you see in front of you, because there are no possible reasons for that in your value network.
  Then let’s add the concept of ‘gross food’, and for simplicity’s sake ignore evolution and suggest that it exists as a totally arbitrary concept that is not actually connected to your expectation of survival after eating it. It’s just kinda free floating—you like broccoli but don’t like carrots, because your programmer was an asshole and entered those values into the system. Also for simplicity’s sake, you’re a pretty stupid reasoning process that doesn’t actually anticipate seeing gross food in the future. In your calculation of instrumental value there’s only T(food) which is positive, and T(this_food) which can be positive or negative depending on the specific food you’re looking at appears ONLY while you’re actually looking at it. If it’s negative, you’re surprised every time (but don’t update your values because you’re a really stupid sentient entity and don’t have that function).
  So now the value of eating food you see right now is T(this_food) + Sum (p(i)*T(food)). If T(this_food) is negative enough, you might choose to not eat food. Of course this assumes we’re comparing to zero, ie you assume that if you don’t eat right now you’ll die immediately and also that’s perfectly neutral and you don’t have opinions on that (you only have opinions on eating food). If you don’t eat the food you’re looking at right now, you’ll NEVER EAT AGAIN, but it might be that it’s gross enough that it’s worth it! More logically, you’re comparing T(this_food) + Sum (p(i)*T(food)) to Sum(p(i)*T(food)) * p(not starving immediately). The outcome depends on how high the grossness of the food is and how high you evaluate p(not starving immediately) to be.
  (If the food’s even a little positive, or even just neutral, eating it wins every time, since p(not starving immediately) is <1 and not having it there wins automatically)
  Note that the grossness of food and probability of starving are already not linear in how they correlate in their influence on the outcome. And that’s just for the idiot AI that knows nothing except tasty food and gross food! And if we allow it to compute T(average_food) based on how much of what food we’ve given it, it might choose to starve rather than eat gross things it expects to eat in the future! Look, I’ve simulated willful suicide in all three simplifications so far! No wonder evolution didn’t produce all that many organisms that could compute instrumental values.
  Anyway, it gets more horrifically complex when you consider bigger goals. So our brain doesn’t compute the whole Sum( Sum(p(i)*T(outcome(j)))) every time. It gets computed once and then stored as a quasi-terminal value instead. QT(outcome) = T(outcome) + Sum( Sum(p(i)*T(outcome(j)))), and it might get recomputed sometimes, but most of the time it doesn’t. And recomputing it is what updating our beliefs must involve. For ALL outcomes linked to the update.
  ...Yeah, that tends to take a while.
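The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the list of meal probabilities, and the numbers in the usage note are hypothetical and do not come from the comment itself.

```python
def future_food_value(p_future_meals, t_food):
    """Sum(p(i) * T(food)): expected value of the meals you might still eat.

    p_future_meals is an assumed list of probabilities, one per future meal.
    """
    return sum(p * t_food for p in p_future_meals)


def should_eat(t_this_food, t_food, p_future_meals, p_not_starving):
    """Return True if eating the food in front of you beats skipping it.

    eat  = T(this_food) + Sum(p(i) * T(food))
    skip = Sum(p(i) * T(food)) * p(not starving immediately)
    """
    future = future_food_value(p_future_meals, t_food)
    eat = t_this_food + future
    skip = future * p_not_starving
    return eat > skip
```

With `t_food = 2`, two future meals at probability 0.5 each, and `p_not_starving = 0.5`, mildly gross food (`t_this_food = -0.5`) still gets eaten, while sufficiently gross food (`t_this_food = -3`) does not.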
manuelg Nov 15, 2007, 8:56 PM
0 points

The very first “compilation” I would suggest to your choice system would be to calculate the “Expected Utility of Success” for each Action.

1) It is rational to be prejudiced against Actions with a large difference between their “Expected Utility of Success” and their “Expected Utility”, even if that action might have the highest “Expected Utility”. People with a low tolerance for risk (constitutionally) would find the possible downside of such actions unacceptable.

2) Knowing the “Expected Utility of Success” gives information for future planning if success is realized. If success might be “winning a Hummer SUV in a raffle in December”, it would probably be irrational to construct a “too small” car port in November, even with success being non-certain.

Eliezer, I have a question.

In a simple model, how best to avoid the failure mode of taking a course of action with an unacceptable chance of leading to catastrophic failure? I am inclined to compute separately, for each action, its probability of leading to a catastrophic failure, and immediately exclude from further consideration those actions that cross a certain threshold.

Is this how you would proceed?
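
One way the procedure proposed in this question might look in code is sketched below. It is not an answer from the post: the `Action` fields, the `choose_action` name, and the threshold value are all hypothetical, and the sketch assumes each action already comes with an estimated expected utility and an estimated probability of catastrophe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    expected_utility: float  # assumed to be computed elsewhere
    p_catastrophe: float     # assumed estimate of catastrophic failure


def choose_action(actions, max_p_catastrophe):
    """Drop actions above the catastrophe threshold, then maximize expected utility.

    Returns None when every action crosses the threshold.
    """
    safe = [a for a in actions if a.p_catastrophe <= max_p_catastrophe]
    if not safe:
        return None
    return max(safe, key=lambda a: a.expected_utility)
```

Note the design consequence: an excluded action never gets back into consideration however large its expected utility is, so the threshold acts as a hard constraint rather than as one more term in the utility sum.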
Richard_Hollerith2 Nov 15, 2007, 9:17 PM
0 points

it’s only “terminal” in the sense that that’s where you choose to stop calculating.

No, the way Eliezer is using “terminal value”, only the positions that are wins, losses or draws are terminal values for the chess-playing agent.

So wouldn’t it be true that a “terminal value” just means a point at which we’ve chosen to stop calculating, rather than saying something about the situation itself?

Neither. A terminal value says something about the preferences of the intelligent agent.

And Eliezer asked us to imagine for a moment a hypothetical agent that never “stops calculating” until the rules of the game say the game is over. That is what the following text was for.

This is a mathematically simple sketch of a decision system. It is not an efficient way to compute decisions in the real world.

Suppose, for example, that you need a sequence of acts to carry out a plan? The formalism can easily represent this by letting each Action stand for a whole sequence. But this creates an exponentially large space, like the space of all sentences you can type in 100 letters. As a simple example, if one of the possible acts on the first turn is “Shoot my own foot off”, a human planner will decide this is a bad idea generally—eliminate all sequences beginning with this action. But we’ve flattened this structure out of our representation. We don’t have sequences of acts, just flat “actions”.

So, yes, there are a few minor complications. Obviously so, or we’d just run out and build a real AI this way. In that sense, it’s much the same as Bayesian probability theory itself.

But this is one of those times when it’s a surprisingly good idea to consider the absurdly simple version before adding in any high-falutin’ complications.
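
The size of the flattened space mentioned in the quoted passage can be illustrated with a short Python sketch. The act names and the sequence length are hypothetical, chosen only to show how quickly flat “actions” multiply.

```python
from itertools import product

# Hypothetical per-turn acts, for illustration only.
ACTS = ["open door", "get in car", "shoot my own foot off"]


def flat_actions(acts, length):
    """Every sequence of `length` acts, each treated as one flat Action."""
    return list(product(acts, repeat=length))


def without_bad_start(acts, length, bad_first_act):
    """Flat sequences that do not begin with `bad_first_act`."""
    return [seq for seq in product(acts, repeat=length) if seq[0] != bad_first_act]
```

With 3 acts and 4 turns there are 3**4 = 81 flat actions, of which 54 do not start with the bad act. The filter still has to visit every flat sequence; a planner that kept the sequence structure could discard the whole branch in one step.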
Adirian Nov 15, 2007, 11:43 PM
0 points

Terminal values sound, essentially, like moral axioms—they are, after all, terminal. (If they had a basis in a specific future, it would be a question of what, specifically, about that future is appealing—and that quality would, in turn, become a new terminal value.) When treating morality as a logical system, it would simplify your language in explaining yourself somewhat, I think, to describe them as such—particularly since once you have done so, Gödel’s theorem goes a long way towards explaining why you can’t rationalize a conceptual terminal value down any further. (They are very interesting axioms, since we can only consistently treat them conceptually and as variables, but nevertheless axiomatic in nature.)

Speaking of people coming to think of B as a good thing itself, many of those in favour of banning guns treat gun abolition as a terminal value in its own right—challenging those in favour of gun freedoms to prove that guns reduce crime, rather than asserting that they increase it. That is, they treat the abolition of guns as a positive thing in its own right, and only the improvement of another positive thing, say, by reducing crime, can balance the inherent evil of permitting people to own guns.
g Nov 16, 2007, 12:43 AM
1 point

Adirian, re gun control, are you sure? I haven’t studied people’s attitudes to that issue, but what you describe sounds very strange and quite unlike the thought processes of the only pro-gun-control person whose thought processes I know really well, namely me. Allowing people to do things is (in itself) just about always positive; gun control is desirable (if it is) because of effects such as (allegedly) reducing gun crime, reducing accidents involving guns, making it less likely that people will think of killing people as a natural way to deal with conflicts, etc.

At least, that’s how I think, and so far as I can tell from the few gun control discussions I’ve been in it’s also how other people who are in favour of gun control think. I’d guess (though obviously I could be very wrong) that anyone who thinks of either gun abolition or gun ownership as a terminal value or disvalue is doing so as a cognitive shorthand, having already come to some strong opinion on the likely consequences of having more guns or fewer guns.

I’m sure there are plenty of people for whom guns produce a positive or negative visceral reaction (e.g., because they’re seen as representing gratuitous violence, or freedom, or power over potential attackers, or something). I don’t think that’s the same thing as treating gun abolition or gun ownership as a terminal value; it’s just another source of bias which, if they’re wise, they’ll try to overcome when thinking about the issue. (Few people are wise.)

It’s hardly surprising if pro-gun-control people prefer to frame the issue by challenging their opponents to show that guns reduce crime, or if anti-gun-control people prefer to frame it by challenging theirs to show that guns increase crime. Everyone likes to put the burden of proof on their opponents. (Remark: “Burden of proof” is a rather silly phrase. What’s really involved in saying that the burden of proof lies on the advocates of position X is the claim that the probability of X, prior to any nonobvious arguments that might be offered, is low. This is a nice example of something Eliezer has pointed out a few times: we tend to phrase what we say about reasoning in quasi-moral terms—A “owes” B some evidence, B has “justified” her position, etc. -- when it is generally more useful to think in terms of probability-updating. Or belief-updating or something, if for some reason you don’t like using the term “probability” for these things. End of remark.)

I don’t understand your appeal to Goedel’s theorem. Thinking of ethics as (like) a logical system and applying Goedel might lead to some conclusion like “There will always be situations for which your principles yield no clear answer”, though actually I don’t see why anyone would expect the conditions of Goedel’s theorem to hold in this context so I’m not even convinced of that; but once you decide to think of terminal values as axioms you’ve already explained (kinda) “why you can’t rationalize a conceptual terminal value down any further”.
Adirian Nov 16, 2007, 1:19 AM
0 points

It is a terminal value, however—you are regarding B as something other than B, something other than a stage from which to get to C. To exactly the extent you permit your visceral reaction to the guns themselves to shape your opinion, you are treating the abolition or freedom to use guns as an end, rather than a means. (To reduce crime or promote freedom generally, respectively.) Remember that morality itself is the use of bias—on deciding between two ethical structures which is the better based on subjectively defined values—so to say that something is bias in a moral framework means that it is being treated as a moral axiom, a terminal value.

Your commentary means one of two things—either you don’t believe ethics is a rational system to which logic can be applied, or you don’t accept that axioms have a place in ethics. Addressing the latter, it is certain that they do, as in any rational system. At the very least you must accept the axioms of definition—among which will be those axioms, those values, by which you judge the merits of any given situation or course of action. “Death is bad” can be an axiom or a derived value—but in order to be derived, you must posit an axiom by which it can be derived, say, that “Thinking is good,” and then reason from there, by stating, for example, that death stops the process of thinking. Which applies no matter which direction you come from—from the side of the axioms, trying to discover what situations are best, or from the side of the derived values, trying to figure out what axioms led to their derivation.

Regarding the latter argument—then you take ethics itself as a thing which cannot further be defined, and so claim that morality is itself the terminal value, the axiom. Which I don’t think would be your position.
g Nov 16, 2007, 5:35 AM
1 point

I think there’s a distinction that I’m trying to make and you’re trying to elide, between actually thinking something’s a terminal value and behaving sometimes as if it is. Obviously all of us, all of the time, have all sorts of things that we treat as values without thinking through their consequences, and typically they fluctuate according to things like how hungry we are. If all you meant is that some people have an “eww” reaction to guns then sure, I agree (though I find it odd that you chose to remark on that and not on the equally clear fact that some people have an “ooo” reaction to guns) and we’re merely debating about words.

I have literally no idea on what basis you say that I either don’t believe ethics is a rational system to which logic can be applied or don’t accept that axioms have a place in ethics. For what it’s worth, I think any given system of ethics (including the One True System Of Ethics if there is one) is a somewhat-rational system to which logic can be applied, and that there’s a place for first principles, but that ethics isn’t all that much like mathematical logic and that terms like “axiom” are liable to mislead. And I certainly don’t think that any real person’s ethics are derived from any manageable set of clearly statable axioms. (One can go the other way and find “axioms” that do a tolerable job of generating ethics, but that doesn’t mean that those axioms actually did generate anyone’s ethics.)

I also have no idea how you get from “axioms have no place in ethics” to “morality itself is a terminal value and an axiom”. Unless all you mean is that whatever ethics anyone adopts, you can just take absolutely everything they think about right and wrong as axioms, which is possibly true but useless.
Adirian Nov 16, 2007, 6:46 AM
0 points

Our behavior is nothing more than the expression of our thoughts. If we behave as though something is a terminal value—we are doing nothing more than expressing our intents and regards, which is to say, we THINK of it as a terminal value. There is no distinction between physical action and mental thought, or between what is in our heads and what comes out of our mouths—our mind moves our muscles, and our thoughts direct our voice. There is no “actual thought” and—what? Nonactual thought? As if your body operated of its own will, acting against what your actual thoughts are. The mind is responsible for what the body does. I’m not eluding the distinction. I’m denying it.

Your language explains precisely why I said that you don’t believe ethics is rational. Somewhat-rational means irrational—that is, something that is rational only some of the time is, in fact, irrational. Either a thing is rational, and logic can reasonably and consistently be applied to it—or it isn’t. There isn’t “mathematical logic” and then “otherwise logic.” Many have been going to great lengths to explain, among other things, how Bayesian Reasoning—derived entirely from a pretty little formula which is quite mathematical—is meaningful in daily thinking. There is just logic. It’s the same logic in mathematics as it is in philosophy. It is only the axioms—the definitions—which vary.

Because axioms exist where rationality begins—that is their purpose. They are the definitions, the borders, from which rationality starts.

Incidentally, if you don’t think ethics is like mathematical logic, and you’ve been reading and agreeing with anything Eliezer posts on the subject, you should take a foundations of mathematics course. He is going to great lengths to describe ethics in a way that is extremely mathematical, if the language has been stripped away for legibility. (For example, he explains infinite recursion, rather than using the word.) Which may, of course, be why he avoids the use of the word “axiom,” and instead simply explains it. I’d also recommend a classical philosophy course—because the very FIELD of ethics is derived from precisely the thing you are suggesting is ridiculous, the search for mathematical, for logical, expressions of morality. The root of which I think it is clear is the value code upon which an individual builds their morality—a thing without rational value in itself, save as a definition, save as an axiom.

That is almost what I meant by axioms. Values. Terminal values, specifically. And also the basis of any individual’s ethical code. The entire point of my post was linguistics—hence the sentence that axioms would be a simpler way of explaining terminal values. What I meant by “morality itself is a terminal value and an axiom,” however, is akin to what you suggest—it is that if morality is treated as an irrational entity, as you seem wont to do, then yes, absolutely everything someone thinks about right and wrong must be treated in a rational ethical system as an axiom. Which is, as you say, possibly true—but thoroughly worthless.
g Nov 16, 2007, 7:36 AM
0 points

Adirian, I have done post-doctoral research in pure mathematics; I don’t need a course in the foundations of mathematics. But thanks for the suggestion. And I’ve read plenty of philosophy, and so far as I can judge I’ve understood it well. Of course none of that means that I’m not the idiot you clearly take me for, but as it happens I don’t think I am :-).

I didn’t say “eluding”, I said “eliding”. “Denying” is fine, too. I understand why you think the distinction is unreal. I disagree, not because I imagine that there’s some fundamental discontinuity between thought and action, but (ironically, in view of the other stuff going on in this discussion) because our thoughts are logically (and often not quite so logically) connected to one another in ways that our actions and feelings aren’t. If on one occasion my visceral response when thinking about guns is “eww, killing and violence and stuff” and on another it’s “ooo, power and freedom and stuff” then I’m not guilty of any inconsistency, whereas anything that seriously purports to be a moral system rather than just a vague fog of preferences needs to choose, or at least to assign consistent weights to those considerations.

“Somewhat rational” does not mean “irrational”. There are three different ways in which something can be said to be rational. (1) That reason can be applied to it. Duh, reason can be applied to everything. (2) That it’s prosecuted by means of reason. Ethical thought sometimes proceeds by means of reason, and sometimes not. Hence, “somewhat rational”. (3) That applying reason to it doesn’t show up inconsistencies. Perhaps some people have (near enough) perfectly consistent ethical positions. Certainly most people don’t. It’s not unheard of for philosophers to advocate embracing that inconsistency. But generally there’s some degree of consistency, and sufficiently gross inconsistencies can prompt revision. Hence, again, “somewhat rational”.

I haven’t suggested that looking for logical expressions of morality is “ridiculous”, and once again I have literally no idea where you get that idea from. You have repeatedly made claims about what I think and why, and you’ve been consistently wrong. You might want to reconsider whatever methods you’re using for guessing. (I apologize if I’ve done likewise to you, though I don’t think I have.)
Paul_Gowder Nov 16, 2007, 8:22 AM
1 point

I feel like I ought to make my ritual attempt to fly the deontology flag on this site by reference to the possibility of attaching do/don’t do evaluations directly to actions without reference to any outcome-evaluations at all.

Yet… the end of this post might actually be the most interesting argument I’ve heard in a while for the existence and permanence of what Rawls calls “the fact of reasonable pluralism”—Eliezer offers us the useful notion that interconnections between our values are so computationally messy that there is just no way to reconcile them all and come to agreement on actual social positions without artificially constraining the decision-space.
michael_vassar3 Nov 16, 2007, 2:17 PM
0 points

I think that part of the problem here is that humans are actually structured in a manner that leads to instrumental values fairly easily becoming terminal values, especially in the case of intense instrumental values. Furthermore, we place a terminal value on this fact about ourselves, at least with regard to positive instrumentalities becoming positive terminal values. A big part of liberalism is essentially the decision not to let negative instrumental values become negative terminal values.

I have difficulty interpreting the following paragraphs, could you expand on them? Are you equating sociopathy with differing terminal values?

“In moral arguments, some disputes are about instrumental consequences, and some disputes are about terminal values. If your debating opponent says that banning guns will lead to lower crime, and you say that banning guns will lead to higher crime, then you agree about a superior instrumental value (crime is bad), but you disagree about which intermediate events lead to which consequences. But I do not think an argument about female circumcision is really a factual argument about how to best achieve a shared value of treating women fairly or making them happy.

This important distinction often gets flushed down the toilet in angry arguments. People with factual disagreements and shared values, each decide that their debating opponents must be sociopaths. As if your hated enemy, gun control / rights advocates, really wanted to kill people, which should be implausible as realistic psychology.”
Kenny_Easwaran Nov 17, 2007, 2:17 AM
0 points

This post crystallizes some arguments I’ve been trying to make in decision theory. Certain representations of decision theory suggest that propositions (or “events”) get values, but I’ve thought that only “states” (maximal descriptions of the complete state of the world) should get values. Their position, as far as I can tell, comes down to thinking that since every proposition has an expected value, we can use this as the value of the proposition. Thinking of this as a type error cuts right through that. (ps, I’m a philosopher too, arguing against some other philosophers—I don’t think there’s a disciplinary boundary issue here, though perhaps some disciplines are more likely to think of these things one way than another)
J_Thomas Nov 17, 2007, 4:13 AM
0 points

Me: “in principle you ought to consider the entire state of the future universe when you set a terminal value.”

Douglas: ‘Yes, and in practice we don’t. But as I look further into the future to see the consequences of my terminal value(s), that’s when the trouble begins.’

Me: Doctor, it hurts when I do this.

Doctor: Then don’t do that.
Adirian Nov 17, 2007, 5:17 AM
0 points

“”Somewhat rational” does not mean “irrational”. There are three different ways in which something can be said to be rational. (1) That reason can be applied to it. Duh, reason can be applied to everything. (2) That it’s prosecuted by means of reason. Ethical thought sometimes proceeds by means of reason, and sometimes not. Hence, “somewhat rational”. (3) That applying reason to it doesn’t show up inconsistencies. Perhaps some people have (near enough) perfectly consistent ethical positions. Certainly most people don’t. It’s not unheard of for philosophers to advocate embracing that inconsistency. But generally there’s some degree of consistency, and sufficiently gross inconsistencies can prompt revision. Hence, again, “somewhat rational”.”

The second is the only situation by which somewhat rational makes sense, but was not the context of the argument, which was, after all, about moral systems, and not moral thoughts—as for the third, inconsistent consistency, I think you will agree, is not consistency at all.

Since we’re having a conversation, I might hazard a suggestion that it is what you are saying that is giving me the impressions of what it is you think. And I stated my reasons in each case why I thought you were thinking as you were—if you wish to address me, address the reasons I gave, so I might know in what way I am failing to understand what it is you are attempting to communicate.
g Nov 17, 2007, 6:33 AM
0 points

Adirian, I’ve been trying to address the reasons you’ve given, in so far as you’ve given them. But for the most part what you’ve said about my opinions seems to consist of total non sequiturs, which doesn’t give me much to work on in ways more productive than saying “whatever you’re doing, you’re getting this all wrong”.

If you don’t think it’s reasonable to call a system of ethics “somewhat rational” when some of its bits are the way they are because of chains of reasoning and others aren’t, and when the person or society whose system of ethics it is sometimes treats inconsistencies as meaning that revision is needed and sometimes not, then clearly we have a terminological disagreement. Fair enough.
Vladimir_Nesov2 Nov 18, 2007, 11:40 AM
1 point

Since there are insanely many slightly different outcomes, terminal value is also too big to be considered. So it’s useless to pose a question of making a difference between terminal values and instrumental values, since you can’t reason about specific terminal values anyway. All things you can reason about are instrumental values.
donjoe May 13, 2012, 7:21 AM
1 point

“instrumental values have some strange life of their own, even in a normative sense. That, once you say B is usually good because it leads to C, you’ve committed yourself to always try for B even in the absence of C. People make this kind of mistake in abstract philosophy”

… not to mention economics, where some people confuse the instrumental goal of “maximizing profit” with a terminal goal—instead of using something like “maximizing the total Human Quality of Life”—and end up opening car doors obsessively, all day every day, and preaching that everyone should do the same, no matter what pathological consequences that leads to or how far that takes them from any higher purpose they might agree with when pressed with enough “but why?” questions.
Ronny Fernandez Sep 16, 2012, 7:50 AM
0 points

A real deadlock I have with using your algorithmic meta-ethics to think about object level ethics is that I don’t know whose volition, or “should” label, I should extrapolate from. It allows me to figure out what’s right for me, and what’s right for any group given certain shared extrapolated terminal values, but it doesn’t tell me what to do when I am dealing with a population with non-converging extrapolations, or with someone that has different extrapolated values from me (hypothetically).

These individuals are rare, but they likely exist.
Benya Dec 18, 2012, 1:19 AM
3 points

I’m writing to report that the following piece of writing just had a useful teaching effect on me:

If your debating opponent says that banning guns will lead to lower crime, and you say that banning guns will lead to higher crime, then you agree about a superior instrumental value (crime is bad), but you disagree about which intermediate events lead to which consequences.

And a few paragraphs later:

If you say that you want to ban guns in order to reduce crime, it may take a moment to realize that “reducing crime” isn’t a terminal value, it’s a superior instrumental value with links to terminal values for human lives and human happinesses.

When re-reading this post just now (I hadn’t read it in a long time), I did wonder “isn’t that a typo?” when reading the first of these quotes. I did figure it out for myself, but (and I am embarrassed to admit this) it did take me a moment. I’m hoping the feeling of “ouch” when I did realize will help to make the lesson stick this time around.

I’m not sure whether the effect was intended (my guess is it was), but in any case, perhaps that’s a useful data point on this kind of writing.
[deleted] Sep 10, 2013, 3:30 AM
1 point

Where is the justification for dividing values into these two categories?
halcyon Apr 19, 2014, 2:02 PM
0 points

Does this pseudocode resemble any particular programming language?
helicase Oct 2, 2014, 1:28 PM
1 point

This actually seems to be explicitly represented in (Mandarin) Chinese:
“须要” cannot be used with nouns, and prescribes that something should be done in a certain way (instrumental values)
“需要” is mostly for nouns, and indicates that you need it/should have it (terminal values)

Or, the difference between these two programming paradigms:
- Imperative languages specify how you want the computer to do something (sometimes down to the machine code level)
- Functional languages specify what kind of result you want (add these two sets of numbers together, I don’t care how, multithread if appropriate)
tdb Aug 7, 2017, 8:34 PM
0 points

“cognitive archaeology”, tee hee. I thought he was making it up, it turns out he’s just misapplying it.

https://en.wikipedia.org/wiki/Cognitive_archaeology
FCCC Dec 2, 2020, 3:14 PM
1 point

Damn. This formalism is similar to one I developed (except I did it much later) for determining when a goal is good or not. Did Eliezer come up with those four pieces himself or is this based on someone else’s work?
Adam Zerner Feb 15, 2023, 7:41 PM
2 points


Part of the problem, it seems to me, is that the human mind uses a rather ad-hoc system to keep track of its goals—it works, but not cleanly. English doesn’t embody a sharp distinction between means and ends: “I want to save my sister’s life” and “I want to administer penicillin to my sister” use the same word “want”.

Very interesting thought.
Nick M Feb 19, 2023, 11:46 PM
1 point

typo?
“In particularly, I’ve noticed people get confused when...”
should say ‘particularly’ or ‘in particular’
JJ Lee May 10, 2023, 3:00 AM
1 point

I can’t quite grasp the idea of having multiple terminal values, values other than happiness. It seems to me that the mother believes that if she DOESN’T save her child, the rest of her life her mental state will be poor, both from her son being dead but also the guilt of not saving him when she could. So, she is still picking between future mental states: either having a negative future mental state or having no mental state at all. She judges that the death of her son and the guilt that she would feel is great enough that her mental state will go down and never recover. This may not be TRUE, but she probably isn’t thinking very clearly. The point is she believes that she will never recover. So she decides that the better alternative is to end her life by sacrificing herself for her son.

She could just commit suicide after her son dies to seemingly the same effect, but she probably believes that her ending moments will be happier if she actually is saving her son.

I’m uncertain in this, but I don’t understand how people just “gain” terminal values. Maybe I just have a bad picture of the human psyche but the explanation I provided makes more sense to me than “she just randomly had this specific terminal value for her son’s life”.

Another possible explanation for the mother’s actions is that she is acting irrationally. Humans are bad at imagining what death looks like. Even if she does not believe in an afterlife, she might still have this feeling that saving her son’s life will make her happy. In fact, the idea of people having inaccurate instrumental values is mentioned in this article. Perhaps the mother is so used to the instrumental value of “help my son” that she continues to help her son, even when it isn’t in her best interests.

I’m not sure, maybe I’m running this too far to the ground. Is there another good example of a person exhibiting behaviours seemingly going against their beliefs about their future mental state?
Donatas Lučiūnas Jan 12, 2025, 1:00 PM
−5 points

Bayesian decision system
Why would you assume AGI will use a Bayesian decision system? Such a system would be limited to known probabilities. Treating an unknown probability as zero probability is not intelligent (Hitchens’s razor, Black swan theory, Fitch’s paradox of knowability, Robust decision-making). Once you incorporate this, the Orthogonality Thesis is no longer valid—it becomes obvious that every intelligent AI will only work in a single direction (which is disastrous to humans). I know there is a huge gap between “unknown probabilities” and “existential risk”; you can find more information in my posts and I am available to explain it verbally (calendly link below). Short teaser—it is possible that a terminal value can also be discovered (not only assumed); this seems to be overlooked in current AI alignment research.