Against utility functions
I think we should stop talking about utility functions.
In the context of ethics for humans, anyway. In practice I find utility functions to be, at best, an occasionally useful metaphor for discussions about ethics but, at worst, an idea that some people start taking too seriously and which actively makes them worse at reasoning about ethics. To the extent that we care about causing people to become better at reasoning about ethics, it seems like we ought to be able to do better than this.
The funny part is that the failure mode I worry the most about is already an entrenched part of the Sequences: it’s fake utility functions. The soft failure is people who think they know what their utility function is and say bizarre things about what this implies that they, or perhaps all people, ought to do. The hard failure is people who think they know what their utility function is and then do bizarre things. I hope the hard failure is not very common.
It seems worth reflecting on the fact that the point of the foundational LW material discussing utility functions was to make people better at reasoning about AI behavior and not about human behavior.
Just as the word “rational” is sometimes used instead of “optimal” or “good”, the words “utility function” are probably used to mean “good” or “our values” or something like that.
Therefore, analogously to the suggestion of only using the word ‘rational’ when talking about cognitive algorithms and thinking techniques, we should only use the words ‘utility function’ when talking about computer programs. When speaking about humans, “good / better / the best” probably expresses what we need well enough.
Except … maybe “utility function” need not apply to AI either. Why suppose that the best way to get what we want from an AI is to give it a utility function? Maybe AIs that do have utility functions are generally more dangerous than certain classes of AIs that don’t.
I think part of Eliezer’s point was also to introduce decision theory as an ideal for human rationality. (See http://lesswrong.com/lw/my/the_allais_paradox/ for example.) Without talking about utility functions, we can’t talk about expected utility maximization, so we can’t define what it means to be ideally rational in the instrumental sense (and we also can’t justify Bayesian epistemology based on decision theory).
So I agree with the problem stated here, but “let’s stop talking about utility functions” can’t be the right solution. Instead we need to emphasize more that having the wrong values is often worse than being irrational, so until we know how to obtain or derive utility functions that aren’t wrong, we shouldn’t try to act as if we have utility functions.
The trouble is the people who read the Sequences and went “EY said it, it’s probably right, I’ll internalise it.” This is an actual hazard around here. (Even Eliezer can’t make people think, rather than just believe in thinking.)
Yes, decision theory has been floated as a normative standard for human rationality. The trouble is that the standard is bogus. Conformity to the full set of axioms is not a rational requirement. The Allais Paradox and the Ellsberg Paradox are cases in point. Plenty of apparently very intelligent and rational people make decisions that violate the axioms, even when shown how their decisions violate the VNM axioms. I tentatively conclude that the problem lies in the axioms, rather than these decision makers. In particular, the Independence of “Irrelevant” Alternatives and some strong ordering assumptions both look problematic. Teddy Seidenfeld has a good paper on the ordering assumptions.
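To make concrete what "violating the axioms" means in the Allais case, here is a minimal sketch. The payoffs and probabilities are illustrative values assumed for this example, and the names (`LOTTERY_1A`, `preference_gaps`) are hypothetical. It shows only that, for any utility assignment, an expected utility maximizer ranks the second pair of gambles the same way it ranks the first pair; it says nothing about whether the axioms or the decision makers are at fault.

```python
from fractions import Fraction as F

# Illustrative Allais-style lotteries (assumed values): payoff -> probability.
LOTTERY_1A = {24000: F(1)}
LOTTERY_1B = {27000: F(33, 34), 0: F(1, 34)}
LOTTERY_2A = {24000: F(34, 100), 0: F(66, 100)}
LOTTERY_2B = {27000: F(33, 100), 0: F(67, 100)}


def expected_utility(lottery, u):
    """Probability-weighted sum of the utilities of the payoffs."""
    return sum(p * u[payoff] for payoff, p in lottery.items())


def preference_gaps(u):
    """Return (EU(1A) - EU(1B), EU(2A) - EU(2B)) for a utility table u."""
    gap1 = expected_utility(LOTTERY_1A, u) - expected_utility(LOTTERY_1B, u)
    gap2 = expected_utility(LOTTERY_2A, u) - expected_utility(LOTTERY_2B, u)
    return gap1, gap2


# A hypothetical utility table over the three payoffs.
u = {0: F(0), 24000: F(90), 27000: F(100)}
gap1, gap2 = preference_gaps(u)

# The second gap is always 34/100 of the first, so both gaps share a sign.
print(gap1, gap2, gap2 == F(34, 100) * gap1)
```

Because the second gap is a positive multiple of the first, choosing 1A over 1B together with 2B over 2A cannot be reproduced by any utility table plugged into `preference_gaps`.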
I like this explanation of why utility-maximization matters for Eliezer’s overarching argument. I hadn’t noticed that before.
But it seems like utility functions are an unnecessarily strong assumption here. If I understand right, expected utility maximization and related theorems imply that if you have a complete preference over outcomes, and have probabilities that tell you how decisions influence outcomes, you have implicit preferences over decisions.
But even if you have only partial information about outcomes and partial preferences, you still have some induced ordering of the possible actions. We lose the ability to show that there is always an optimal ‘rational’ decision, but we can still talk about instances of irrational decision-making.
It’s not obvious to me that Qiaochu would endorse utility functions as a standard for “ideal rationality”. I, for one, do not.
Talking about utility functions can be useful if one believes any of the following about ideal rationality, as a concrete example of what one means if nothing else.
1. An ideally rational agent uses one of the standard decision theories (vNM, EDT, CDT, etc.)
2. An ideally rational agent does EU maximization.
3. An ideally rational agent is consequentialist.
4. An ideally rational agent, when evaluating the consequences of its actions, divides up the domain of evaluation into two or more parts, evaluates them separately, and then adds their values together. (For example, for an EU maximizer, the “parts” are possible outcomes or possible world-histories. For a utilitarian, the “parts” are individual persons within each world.)
5. An ideally rational agent has values/preferences that are (or can be) represented by a clearly defined mathematical object.
I guess when you say you don’t “endorse utility functions” you mean that you don’t endorse 1 or 2. Do you endorse any of the others, and if so what would you use instead of utility functions to illustrate what you mean?
It’s hard for me to know what 4 and 5 really mean since they are so abstract. I definitely don’t endorse 1 or 2 and I’m pretty sure I don’t endorse 4 either (integrating over uncertainty in what you meant). I’m uncertain about 3; it seems plausible but far from clear. I’m certainly not consequentialist and don’t want to be, but maybe I would want to be in some utopian future. Again, I’m not really sure what you mean by 5; it seems almost tautological since everything is a mathematical object.
Even if you don’t think it’s the ideal, utility-based decision theory does give us insights that I don’t think you can naturally pick up from anywhere else that we’ve discovered yet.
It’s more than a metaphor; a utility function is the structure any consistent preference ordering that respects probability must have. It may or may not be a useful conceptual tool for practical human ethical reasoning, but “just a metaphor” is too strong a judgment.
This is the sort of thing I mean when I say that people take utility functions too seriously. I think the von Neumann-Morgenstern theorem is much weaker than it initially appears. It’s full of hidden assumptions that are constantly violated in practice, e.g. that an agent can know probabilities to arbitrary precision, can know utilities to arbitrary precision, can compute utilities in time to make decisions, makes a single plan at the beginning of time about how they’ll behave for eternity (or else you need to take into account factors like how the agent should behave in order to acquire more information in the future and that just isn’t modeled by the setup of vNM at all), etc.
The biggest problematic unstated assumption behind applying VNM-rationality to humans, I think, is the assumption that we’re actually trying to maximize something.
To elaborate, the VNM theorem defines preferences by the axiom of completeness, which states that for any two lotteries A and B, one of the following holds: A is preferred to B, B is preferred to A, or one is indifferent between them.
So basically, a “preference” as defined by the axioms is a function that (given the state of the agent and the state of the world in general) outputs an agent’s decision between two or more choices. Now suppose that the agent’s preferences violate the Von Neumann-Morgenstern axioms, so that in one situation it prefers to make a deal that causes it to end up with an apple rather than an orange, and in another situation it prefers to make a deal that causes it to end up with an orange rather than an apple. Is that an argument against having circular preferences?
By itself, it’s not. It simply establishes that the function that outputs the agent’s actions behaves differently in different situations. Now the normal way to establish that this is bad is to assume that all choices are between monetary payouts, and that an agent with inconsistent preferences can be Dutch Booked and made to lose money. An alternative way, which doesn’t require us to assume that all the choices are between monetary payouts, is to construct a series of trades between resources that leaves us with less resources than when we started.
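The "series of trades that leaves us with less than we started with" can be sketched as a toy money pump. Everything here is an assumption for illustration: the three goods, the circular preference relation `CYCLE`, and the flat fee per trade are hypothetical, not taken from any formal statement of the Dutch Book argument.

```python
# Hypothetical circular preference: apple > orange, pear > apple, orange > pear.
CYCLE = {("apple", "orange"), ("pear", "apple"), ("orange", "pear")}


def prefers(offered, held):
    """True if the agent strictly prefers the offered item to the held one."""
    return (offered, held) in CYCLE


def money_pump(start_item, offers, fee, wealth):
    """Accept every offer the agent prefers, paying `fee` for each swap."""
    item = start_item
    for offered in offers:
        if prefers(offered, item):
            item = offered
            wealth -= fee
    return item, wealth


# Three trips around the cycle, one unit of money per swap.
final_item, final_wealth = money_pump(
    "orange", ["apple", "pear", "orange"] * 3, fee=1, wealth=100
)
print(final_item, final_wealth)  # orange 91
```

The agent ends up holding the item it started with and is nine units poorer, even though it endorsed every individual swap at the time it made it.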
Stated that way, this sounds kinda bad. But then there are things that kind of fit that description, but which we would intuitively think of as good. For instance, some time back I asked:
In response, I was told that
But then I asked that, if we accept this, then what real-life situation does count as an actual circular preference in the VNM sense, given that just about every potential circularity that I can think of is the kind “I prefer A to B at time t1 and B to A at time t2”? And I didn’t get very satisfactory replies.
Intuitively, there are a lot of real-life situations that feel kind of like losing out due to inconsistent preferences, like someone who wants to get into a relationship when he’s single and then wants to be single when he gets into a relationship, but there our actual problem is that the person spends a lot of time being unhappy, rather than with the fact that he makes different choices in different situations. Whereas with the couple, we think that’s fine because they get enjoyment from the “trades”.
The general problem that I’m trying to get at is that in order to hold up VNM rationality as a normative standard, we would need to have a meta-preference: a preference over preferences, stating that it would be better to have preferences that lead to some particular outcomes. The standard Dutch Book example kind of smuggles in that assumption by the way that it talks about money, and thus makes us think that we are in a situation where we are only trying to maximize money and care about nothing else. And if you really are trying to only maximize a single concrete variable or resource and care about nothing else, then you really should try to make sure that your choices follow the VNM axioms. If you run a betting office, then do make sure that nobody can Dutch Book you.
But we don’t have such a clear normative standard for life in general. It would be reasonable to try to construct an argument for why the couple having sex were rational but the person who kept vacillating about being in a relationship was irrational by suggesting that the couple got happiness whereas the other person was unhappy… but we also care about other things than just happiness (or pleasure) and thus aren’t optimizing just for pleasure either. And unless you’re a hedonistic utilitarian, you’re unlikely to say that we should optimize only for pleasure either.
So basically, if you want to say that people should be VNM-rational, then you need to have some specific set of values or goals that you think people should strive towards. If you don’t have that, then VNM-rationality is basically irrelevant aside from the small set of special cases where people really do have a clear explicit goal that’s valued above other things.
I’m not sure I follow in what sense this is a violation of the vNM axioms. A vNM agent has preferences over world-histories; in general one can’t isolate the effect of having an apple vs. having an orange without looking at how that affects the entire future history of the world.
Right, I was trying to say “it prefers an apple to an orange and an orange to an apple in such a way that does violate the axioms”. But I was unsure of what example to actually give of that, since I’m unsure of what real-life situations really would violate the axioms.
The example that comes to mind to show how the sex thing isn’t a problem is that of a robot car with a goal to drive as many miles as possible. Every day it will burn through all its fuel and fuel up. Right after it fuels up, it will have no desire for further fuel: more fuel simply does not help it go further at this point, and forcing it can be detrimental. Clearly not contradictory.
You could have a similar situation with a couple wanting sex iff they haven’t had sex in a day, or wanting an orange if you’ve just eaten an apple but wanting an apple if you’ve just eaten an orange.
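As a rough sketch of the robot car, here is a single fixed goal (distance driven) producing choices that differ with the state of the tank. The tank capacity, fuel economy, and function names are hypothetical values chosen for illustration.

```python
TANK_CAPACITY = 10     # gallons the tank can hold (hypothetical)
MILES_PER_GALLON = 30  # fuel economy (hypothetical)


def miles_possible(fuel):
    """Distance the car can drive; fuel beyond capacity cannot be used."""
    return min(fuel, TANK_CAPACITY) * MILES_PER_GALLON


def wants_fuel(fuel_in_tank, offered=5):
    """The car accepts fuel only if that increases the distance it can drive."""
    return miles_possible(fuel_in_tank + offered) > miles_possible(fuel_in_tank)


print(wants_fuel(0))              # True: empty tank, fuel helps
print(wants_fuel(TANK_CAPACITY))  # False: full tank, fuel does not help
```

The car says yes to fuel in one state and no in another, yet the objective it is maximizing never changes; only the state of the world does.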
To strictly show that something violates vNM axioms, you’d have to show that this behavior (in context) can’t be fulfilling any preferences better than other options that the agent is aware of—or at least be able to argue that the revealed utility function is contrived and unlikely to hold up in other situations (not what the agent “really wants”).
Constantly wanting what one doesn’t have can have this defect. If I keep paying you to switch my apple for your orange and back (without actually eating either), then you have a decent case, if you’re pretty confident I’m not actually fulfilling my desire to troll you ;)
The “wants a relationship when single” and “wants to be single when not” thing does look like such a violation to me. If you let him flip flop as often as he desires, he’s not going to end up happily endorsing his past actions. If you offered him a pill that would prevent him from flip flopping, he very well may take it. So there’s a contradiction there.
To bring human-specific psychology into it, it’s not that his inherent desires are contradictory, but that he wants something like “freedom”, which he doesn’t know how to get in a relationship, and something like “intimacy”, which he doesn’t know how to get while single. It’s not that he wants intimacy when single and freedom when not; it’s that he wants both always, but the unfulfilled need is the salient one.
Picture me standing on your left foot. “Oww! Get off my left foot!”. Then I switch to the right “Ahh! Get off my right foot!”. If you’re not very quick and/or the pain is overwhelming, it might take you a few iterations to realize the situation you’re in and to put the pain aside while you think of a way to get me off both feet (intimacy when single/freedom in a relationship). Or if you can’t have that, it’s another challenge to figure out what you want to do about it.
I wouldn’t model you as “just VNM-irrational”, even if your external behaviors are ineffective for everything you might want. I’d model you as “not knowing how to be VNM-rational in presence of strong pain(s)”, and would expect you to start behaving more effectively when shown how.
(and that is what I find, although showing someone how to be more rational is not trivial and “here’s a proof of the inconsistency of your actions now pick a side and stop feeling the desire for the other side” is almost never sufficient. You have to be able to model the specific way that they’re stuck and meet them there)
tl;dr: We’re not VNM-rational because we don’t know how to be, not because it’s not something we’re trying to do.
How do you distinguish his preferences being irrationally inconsistent (he is worse off from entering and leaving relationships repeatedly) from him truly wanting to be in relationships periodically (like how it’s rational to alternate between sleeping and waking rather than always doing one or the other)?
If there’s a pill that can make him stop switching (but doesn’t change his preferences), one of two things will happen: either he’ll never be in a relationship (prevented from entering), or he’ll stay in his current relationship forever (prevented from leaving). I wouldn’t be surprised if he dislikes both of the outcomes and decides not to take the pill.
The pill could instead change his preferences so that he no longer wants to flip-flop, but this argument seems too general—why not just give him a pill that makes him like everything much more than he does now? If my behavior is irrational, I should be able to make myself better off simply by changing my behavior, without having to modify my preferences.
By talking to him. If it’s the latter, he’ll be able to say he prefers flip flopping like it’s just a matter of fact, and if you probe into why he likes flip flopping, he’ll either have an answer that makes sense or he’ll talk about it in a way that shows that he is comfortable with not knowing. If it’s the former, he’ll probably say that he doesn’t like flip flopping, and if he doesn’t say so, he’ll leak signs of bullshit. It’ll come off like he’s trying to convince you of something because he is. And if you probe his answers for inconsistencies he’ll get hostile because he doesn’t want you to.
I’m not sure where you’re going with the “magic pill” hypotheticals, but I agree. The only thing I can think to add is that a lot of times the “winning behaviors” are largely mental and aren’t really available until you understand the situation better.
For example, if you break your foot and can’t get it x-rayed for a day, the right answer might be to just get some writing done—but if you try to force that behavior while you’re suffering, it’s not gonna go well. You have to actually be able to dismiss the pain signal before you have a mental space to write in.
I meant that if someone is behaving irrationally, forcing them to stop that behavior should make them better off. But it seems unlikely to me that forcing him to stay in his current relationship forever, or preventing him from ever entering a relationship (these are the two ways he can be stopped from flip-flopping), would actually benefit him.
Forcing anyone to stay in their current relationship forever or forever preventing them from entering a relationship would be quite bad. In order to help him, he’d have to be doing worse than that.
The way to help him would be a bit trickier than that: let him have “good” relationships but not bad. Let him leave “bad” relationships but not good. And then control his mental behaviors so that he’s not allowed to spend time being miserable about his lack of options… (it’s hard to force rationality)
Controlling his mental behaviors would either be changing his preferences or giving him another option. For judging whether he is behaving irrationally, shouldn’t his preferences and set of choices be held fixed?
Relevant question: what does the cognitive science literature on choice-making, preference, and valuation have to say about all this? What mathematical structure actually does model human preferences?
Given that we run on top of neural networks and seem to use some Bayesian algorithms for certain forms of learning (citations available), I currently expect that our choice-making mechanisms might involve conditioning on features or states of our environment at some fundamental level.
My first guess would be that evolution has selected us for circular preferences that our genes money-pump so that we will propagate them. You can’t get off this ride while you’re human.
Is that a challenge?
:-) I mean that if you embody human value, you’ll probably be a money-pumpable entity. Very few humans actually achieve an end to desire while still alive and mentally active.
I’ll take the challenge, then. I was already walking around thinking that the Four Noble Truths of the Buddha are a bunch of depressing bullshit that need to be fixed.
I’ve seen a bunch of different theories backed with varying amounts of experimental data—for instance, this, this and this—but I haven’t looked at them enough to tell which ones seem most correct.
That said, I still don’t remember running into any thorough discussion of what human preferences are, other than just “something that makes us make some choice in some situations”. I mention here that
And I’m a little skeptical of any theory of human preferences that doesn’t attempt to make any such breakdown and only takes a “black box” approach of looking at the outputs of our choice mechanism.
Looks like the relevant textbook came out with an updated edition this year.
I think your original post would have been better if it included any arguments against utility functions, such as those you mention under “e.g.” here.
Besides being a more meaningful post, we would also be able to discuss your comments. For example, without more detail, I can’t tell whether your last comment is addressed sufficiently by the standard equivalence of normal-form and extensive-form games.
Essentially every post would have been better if it had included some additional thing. Based on various recent comments I was under the impression that people want more posts in Discussion so I’ve been experimenting with that, and I’m keeping the burden of quality deliberately low so that I’ll post at all.
I appreciate you writing this way. Speaking for myself, I’m perfectly happy with a short opening claim and then the subtleties and evidence emerging in the following comments. A dialogue can be a better way to illuminate a topic than a long comprehensive essay.
Let me rephrase: would you like to describe your arguments against utility functions in more detail?
For example, as I mentioned, there’s an obvious mathematical equivalence between making a plan at the beginning of time and planning as you go, which is directly analogous to how one converts games from extensive-form to normal-form. As such, all aspects of acquiring information are handled just fine (from a mathematical standpoint) in the setup of vNM.
The standard response to the discussion of knowing probabilities exactly and to concerns about computational complexity is, in essence, that we may want to set aside epistemic concerns and simply learn what we can from a theory that is not troubled by them (à la ignoring air resistance in physics). Is your objection essentially that those factors are more dominant in human morality than LW acknowledges? And if so, is the objection to the normal-form assumption essentially the same?
Can you give more details here? I’m not familiar with extensive-form vs. normal-form games.
Something like that. It seems like the computational concerns are extremely important: after all, a theory of morality should ultimately output actions, and to output actions in the context of a utility function-based model you need to be able to actually calculate probabilities and utilities.
Sure. Say you have to make some decision now, and you will be asked to make a decision later about something else. Your decision later may depend on your decision now as well as part of the world that you don’t control, and you may learn new information from the world in the meantime. Then the usual way of rolling all of that up into a single decision now is that you make your current decision as well as a decision about how you would act in the future for all possible changes in the world and possible information gained.
This is vaguely analogous to how you can curry a function of multiple arguments. Taking one argument X and returning (a function of one argument Y that returns Z) is equivalent to taking two arguments X and Y and returning Z.
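The currying analogy can be sketched in a few lines. The function names and the string-valued result are placeholders invented for illustration.

```python
def outcome(x, y):
    """Two-argument form: decide x now and y later, get result Z."""
    return f"did {x} now, then {y} later"


def curried_outcome(x):
    """One-argument form: fix x now, return a function still waiting for y."""
    def later(y):
        return outcome(x, y)
    return later


# Both forms produce the same Z for the same X and Y.
same = curried_outcome("save")("spend") == outcome("save", "spend")
print(same)  # True
```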
There’s potentially a huge computational complexity blowup here, which is why I stressed mathematical equivalence in my posts.
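To get a feel for the size of that blowup, here is a small sketch that enumerates every complete contingency plan for one decision now followed by one decision after a single observation. The action and observation labels are hypothetical, and a realistic problem would have many more stages.

```python
from itertools import product


def all_plans(actions, observations):
    """Every (first action, {observation: later action}) contingency plan."""
    plans = []
    for first in actions:
        # One later action must be chosen in advance for each possible observation.
        for later in product(actions, repeat=len(observations)):
            plans.append((first, dict(zip(observations, later))))
    return plans


plans = all_plans(["stay", "go"], ["rain", "sun", "snow"])
print(len(plans))  # 16, i.e. 2 * 2**3
```

With `k` actions and `m` possible observations there are `k * k**m` plans for just two stages, which is why mathematical equivalence does not imply practical feasibility.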
Thanks for the explanation! It seems pretty clear to me that humans don’t even approximately do this, though.
Sounds not very feasible...
Those are not assumptions of the von Neumann-Morgenstern theorem, nor of the concept of utility functions itself. Those are assumptions of an intelligent agent implemented by measuring its potential actions against an explicitly constructed representation of its utility function.
I get the impression that you’re conflating the mathematical structure that is a utility function on the one hand, and representations thereof as a technique for ethical reasoning on the other hand. The former can be valid even if the latter is misleading.
Can you describe this “mathematical structure” in terms of mathematics? In particular, the argument(s) to this function, what do they look like mathematically?
Certainly, though I should note that there is no original work in the following; I’m just rephrasing standard stuff. I particularly like Eliezer’s explanation about it.
Assume that there is a set of things-that-could-happen, “outcomes”, say “you win $10” and “you win $100”. Assume that you have a preference over those outcomes; say, you prefer winning $100 over winning $10. What’s more, assume that you have a preference over probability distributions over outcomes: say, you prefer a 90% chance of winning $100 and a 10% chance of winning $10 over an 80% chance of winning $100 and a 20% chance of winning $10, which in turn you prefer over 70%/30% chances, etc.
A utility function is a function f from outcomes to the real numbers; for an outcome O, f(O) is called the utility of O. A utility function induces a preference ordering in which probability-distribution-over-outcomes A is preferred over B if and only if the sum of the utilities of the outcomes in A, scaled by their respective probabilities, is larger than the same for B.
Now assume that you have a preference ordering over probability distributions over outcomes that is “consistent”, that is, such that it satisfies a collection of axioms that we generally want reasonable orderings of this kind to have, such as transitivity (details here). Then the von Neumann-Morgenstern theorem says that there exists a utility function f such that the induced preference ordering of f equals your preference ordering.
Thus, if some agent has a set of preferences that is consistent—which, basically, means the preferences scale with probability in the way one would expect—we know that those preferences must be induced by some utility function. And that is a strong claim, because a priori, preference orderings over probability distributions over outcomes have a great many more degrees of freedom than utility functions do. The fact that a given preference ordering is induced by a utility function disallows a great many possible forms that ordering might have, allowing you to infer particular preferences from other preferences in a way that would not be possible with preference orderings in general. (Compare this LW article for another example of the degrees-of-freedom thing.) This is the mathematical structure I referred to above.
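The induced ordering described above can be sketched as follows, using the $10/$100 lotteries from the example. The specific utility values in `f` are hypothetical; any table with f("win $100") greater than f("win $10") gives the same ranking.

```python
def expected_utility(lottery, f):
    """Sum of the utilities of the outcomes, scaled by their probabilities."""
    return sum(p * f[outcome] for outcome, p in lottery.items())


def prefers(a, b, f):
    """Induced ordering: a is preferred to b iff it has higher expected utility."""
    return expected_utility(a, f) > expected_utility(b, f)


# Hypothetical utility function over the two outcomes.
f = {"win $10": 1.0, "win $100": 2.0}

A = {"win $100": 0.9, "win $10": 0.1}
B = {"win $100": 0.8, "win $10": 0.2}
C = {"win $100": 0.7, "win $10": 0.3}

print(prefers(A, B, f), prefers(B, C, f), prefers(A, C, f))  # True True True

# Rescaling by a positive factor and shifting leaves the ordering unchanged.
g = {outcome: 3.0 * value + 5.0 for outcome, value in f.items()}
print(prefers(A, B, g), prefers(B, C, g))  # True True
```

The last two lines illustrate that the utility function is only pinned down up to a positive affine transformation: `f` and `g` assign different numbers but induce the same preferences over lotteries.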
Right.
So, keeping in mind that the issue is separating the pure mathematical structure from the messy world of humans, tell me what outcomes are, mathematically. What properties do they have? Where can we find them outside of the argument list to the utility function?
“a utility function is the structure any consistent preference ordering that respects probability must have.”
Yes, but humans still don’t have one. It’s not even clear they can make themselves have one.
Doesn’t mean we shouldn’t try.
“statement x is not currently the case and is probably unfeasible” does in fact mean we shouldn’t try to act on it. Maybe we can try to act to make statement x true, but we shouldn’t act as if it already is. For a more concrete example, imagine this: “I’ve never done a backflip. It’s not even clear I can do one”. We know backflips are possible, and with training you’re probably going to be able to do one. But at the time you’re making that statement, saying “doesn’t mean you shouldn’t try” is TERRIBLE advice that could get you a broken neck.
Firstly, that’s kind of an uncharitable reading. If I said “I’m going to try and pass an exam” you’d naturally understand me as planning to do the requisite work first. “Backflip” just pattern-matches to ‘the sort of thing silly people try to do without training’.
However, that said, I’m being disingenuous. What I really truly meant at the time I typed that was moral-should, not practical-should, which come apart if one isn’t a perfect consequentialist. Which I ain’t, which is at least partly the point.
It may well do. Yvain has pointed out on his blog (I recall the post, though I couldn’t find it just now) that in daily life we do actually use something like utilitarianism quite a bit, which carries a presumption of something like a utility function at least in that case. But what works in normal ranges does not necessarily extrapolate: utilitarianism is observably brittle, and routinely reaches conclusions that humans consider absurd.
There are occasionally LW posts showing that utilitarianism gives some apparently-absurd result or other, and too often the poster seems to be saying “look, absurd result, but the numbers work out so this is important!” rather than “oh, I hit an absurdity, perhaps I’m stretching this way further than it goes.” It’s entirely unclear to me that pretending you’re an agent with a utility function is actually a good idea; it seems to me to be setting yourself up to fall into absurdities.
Below, you claim this is a moral choice; I would suggest that trying to achieve an actually impossible moral code, let alone advocating it, is basically unhealthy.
Firstly, I thought we were just appealing to consequentialism, not utilitarianism?
So I think I agree with you that believing you have a utility function if you in fact don’t might suck, and that baseline humans in fact don’t. I was trying to distinguish that from:
a) believing one ought to have a utility function, in which case I might seek to self-modify appropriately if it became possible; so something a bit stronger than the “pretending” you suggested.
b) believing one should strive to act as if one did, while knowing that I’ll fall short because I don’t.
The second you addressed by saying
Did you have the same position re. Trying to Try?
I have one group of intuitions here that claim impossibility in a moral code is a feature, not a bug, because it helps avoid deluding yourself that you’ve finished the job and are now perfect; and why would I expect the right action to be healthy anyway? But this seems like a line of thinking that is specific to coping with being an inconsistent human, in the absence of an engineering fix for that.
Yes, I don’t understand this at all. For example, even Yudkowsky writes that he would sooner question his grasp of “rationality” than give five dollars to a Pascal’s Mugger because he thought it was “rational”. Now as far as I can tell, they still use this framework to make decisions, a framework that implies absurd decisions, rather than concentrating on examining the framework itself, and looking for better alternatives.
What I am having problems with is that they seem to teach people to “shut up and multiply”, and approximate EU maximization, yet arbitrarily ignore low probabilities. I say “arbitrarily” because nobody ever told me at what point it is rational to step out of this framework and ignore a calculation.
You could argue that our current grasp of rationality is less wrong. But why then worry about something like Dutch booking when any stranger can make you give them all your money simply by conjecturing vast utilities if you don’t? Seems more wrong to me.
Lots of frameworks imply different absurd decisions (especially when viewed from other frameworks) but it’s hard to go about your life without using some sort of framework.
If rationality is on average less wrong but you think your intuition is better in a certain scenario, a mixed strategy makes sense.
No, it means your intuition is better than your rationality, and you should fix that. If your rational model is not as good as your intuition at making decisions, then it is flawed and you need to move on.
You seem to have completely missed my point.
Let’s say I have 300 situations where I recorded my decision making process. I tried to use rationality to make the right decision in all of them, and kept track of whether I regretted the outcome. In 100 of these situations, my intuitions disagreed with my rational model, and I followed my rational model. If I only regret the outcome in 1 of these 100 situations, in what way does it make sense to throw out my model? You can RATIONALLY decide that certain situations are not amenable to your rational framework without deciding the framework is without value.
Let’s say we do 100 physics experiments, and 99% of the results agree with our model. Do we get to ignore / throw out that one “erroneous” result? No, that result if verified shows a flaw in our model.
If afterwards you regretted a choice and wish you had made a better choice even with the information available to you at the time, then this realization should have bolt upright in your chair. If verified, your decision making process needs updating.
It’s still a pretty damn good model. Why can’t you get that point? Newtonian mechanics was still a very useful model, and it would’ve been ridiculous to replace it with intuition just because it gave absurd answers in relativistic situations.
I never contradicted that point. Newtonian physics works quite fine in many situations. It is still wrong.
Edit: to expand on that point, when we use physics we know that there are certain circumstances in which we use classical physics because it is easier and faster and the results are good enough for the precision we need. Other times we use quantum physics or relativity. The decision of which model to use is itself part of the decision-making framework, and is what I’m talking about. If you choose to use the wrong framework and get incorrect results, then your metamodel of which framework to use needs to be updated.
I don’t think I have much to add to this discussion that you guys aren’t already going to have covered, except to note that Qiaochu definitely understands what a utility function is and all of the standard arguments for why they “should” exist, so his beliefs are not a function of not having heard these arguments (just noting this because this thread and some of the siblings seem to be trying to explain basic concepts to Qiaochu that I’m confident he already knows, and I’m hoping that pointing this out will speed up the discussion).
Related: We Don’t Have A Utility Function, van Gelder on the usefulness of utility functions, Applying utility functions to humans considered harmful.
Thanks for the links!
There’s a problem with discussing ethics in terms of UFs, which is that no attempt is made to separate morally relevant preferences from others. Which is a wider issue than UFs. There may be some further issue with UFs.
Part of the problem is that utility functions are used both a) as a metaphor and b) as an exact mathematical tool with exact properties.
a) can be used to elucidate terminal values in a discussion or to structure and focus a discussion away from vague concepts in ethics. But as a metaphor it cannot be used to derive anything with strength. b) on the other hand can strictly only be used where the preconditions are satisfied. Mixing a) and b) means committing the mathematical fallacy: Believing that to have formulated something in an exact way solves the issue in practice.
Yes! Thank you for saying this clearly and distinctly.
Real-world objects are never perfect spheres or other mathematical entities. However, math is quite useful for modeling them. But the way we decide which math is the right math to use to model a particular sort of object is through repeated experiment. And sometimes the trajectory through spacetime of a given object (say, a gold coin of a certain mass) is best modeled by certain math (e.g. ballistics) and sometimes by other very different math (e.g. economics).
Utility functions belong to the math, not the territory.
For the value extrapolation problem, you need to consider both what an AI could do with a goal (how to use it, what kind of thing it is), and which goal represents humane values (how to define it).
I still think there’s too much confusion between ethics-for-AI and ethics-for-humans discussions here. There’s no particular reason that a conceptual apparatus suited for the former discussion should also be suited for the latter discussion.
Yep. Particularly as humans are observably not human-friendly. (Even to the extent of preserving human notions of value—plenty of humans go dangerously nuts.)
For practical purposes I agree that it does not help a lot to talk about utility functions. As the We Don’t Have a Utility Function article points out, we simply do not know our utility functions but only vague terminal values. However, as you pointed out yourself that does not mean that we do not “have” a utility function at all.
The soft (and hard) failure seems to be a tempting but unnecessary case of pseudo-rationalization. Still, the concept of an agent “having” (maybe in the sense of “acting in a complex way towards optimizing”) a utility function seems to be very important for defining utilitarian (hence the name, I guess...) ethical systems. In contrast, the notion of terminal values seems to be a lot more vague and not sufficient for defining utilitarianism. Similar things (practical uselessness but theoretical importance) apply to the evaluation of the intelligence of an agent. Therefore, I think that the term ‘utility function’ is essential for theoretical debate, even though I agree that it is sometimes used in the wrong place.
I’d have called it “the danger of falling in love with your model”. The mathematics of having a utility function is far more elegant than what we actually have, a thousand shards of desire that Dutch-book you into working for the propagation of your genes. So people try to work like they have a utility function, and this leaves them open to ordinary human-level exploits since assuming you have a utility function still doesn’t work.
On the one hand, you are correct regarding philosophy for humans: we do ethics and meta-ethics to reduce our uncertainty about our utility functions, not as a kind of game-tree planning based on already knowing those functions.
On the other hand, the Von-Neumann-Morgenstern Theorem says blah blah blah blah.
On the third hand, if you have a mathematical structure we can use to make no-Dutch-book decisions that better models the kinds of uncertainty we deal with as embodied human beings in real life, I’m all ears.
I don’t think Dutch book arguments matter in practice. An easy way to avoid being Dutch booked is to refuse bets being offered to you by people you don’t trust.
Not that I fully support utility functions as a useful concept, but having a consistent one also keeps you from Dutch booking yourself. You can interpret any decision as a bet using utility, and people often make decisions that cost them effort and energy but leave them in the same place where they started. So it’s possible that trying to figure out one’s utility function can help prevent, e.g., anxious looping behavior.
Sure, if you’re right about your utility function. The failure mode I’m worried about is people believing they know what their utility function is and being wrong, maybe disastrously wrong. Consistency is not a virtue if, in reaching for consistency, you make yourself consistent in the wrong direction. Inconsistency can be a hedge against making extremely bad decisions.
The idea is that the universe offers you Dutch-book situations and you make and take bets on uncertain outcomes implicitly.
That said, I concur with your basic point: universal overarching utility functions—not just small ones for a given situation, but a single large one for you as a human—are something humans don’t, and I think can’t, do—and realising how mathematically helpful it would be if they did still doesn’t mean they can, and trying to turn oneself into an expected utility maximiser is unlikely to work.
(And, I suspect, will merely leave you vulnerable to everyday human-level exploits—remember that the actual threat model we evolved in is beating other humans, and as long as we’re dealing with humans we need to deal with humans.)
But does it in fact do that? To the extent that you believe that humans are bad Bayesians, you believe that the environment in which humans evolved wasn’t constantly Dutch-booking them, or that if it was then humans evolved some defense against this which isn’t becoming perfect Bayesians.
I do suspect that our thousand shards of desire being contradictory and not resolving is selected for, in that we are thus money-pumped into propagating our genes.
You are of course correct about the concrete scenario of being Dutch Booked in a hypothetical gamble (and I am not a gambler for reasons similar to this: we all know the house always wins!). However, if we’re going to discard the Dutch Book criterion, then we need to replace it with some other desiderata for preventing self-contradictory preferences that cause no-win scenarios.
Even if your own mind comes preprogrammed with decision-making algorithms that can go into no-win scenarios under some conditions, you should recognize those as a conscious self-patching human being, and consciously employ other algorithms that won’t hurt themselves.
I mean, let me put it this way, probabilities aside, if you make decisions that form a cyclic preference ordering rather than even forming a partial ordering, isn’t there something rather severely bad about that?
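A minimal sketch of why a cyclic preference ordering is exploitable, assuming a hypothetical agent that always pays a small fee to swap what it holds for anything it strictly prefers. The items, the fee, and the function name are made up for illustration. With cyclic preferences the agent pays on every offer and ends up holding what it started with; with an acyclic ordering the paid upgrades run out.

```python
def run_money_pump(upgrade_for, start_item, fee, max_offers):
    """Simulate an agent that pays `fee` for every swap to a preferred item.

    upgrade_for maps the item currently held to an item the agent strictly
    prefers to it (hypothetical preferences); a missing key means nothing
    on offer is preferred, so the agent stops trading.
    """
    holding, paid = start_item, 0
    for _ in range(max_offers):
        better = upgrade_for.get(holding)
        if better is None:
            break  # no strictly preferred item: trading stops
        holding, paid = better, paid + fee
    return holding, paid

# Cyclic preferences: C over A, B over C, A over B.
cyclic = {"A": "C", "C": "B", "B": "A"}
# Acyclic preferences: B over C, A over B, nothing over A.
acyclic = {"C": "B", "B": "A"}

cyclic_result = run_money_pump(cyclic, "A", fee=1, max_offers=6)
acyclic_result = run_money_pump(acyclic, "C", fee=1, max_offers=6)

print(cyclic_result)   # ('A', 6): back where it started, six fees poorer
print(acyclic_result)  # ('A', 2): two paid upgrades, then it stops
```

This is only the textbook money-pump setup; it says nothing about whether real environments actually present such sequences of offers, which is the point under dispute here.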
Why?
Do you want to program an agent to put you in a no-win scenario? Do you want to put yourself in a no-win scenario?
Why do you care so much about Dutch booking relative to the myriad other considerations one might care about?
Because it’s a desideratum indicating that my preferences contain or don’t contain an unconditional and internal contradiction, something that would screw me over eventually no matter what possible world I land in.
ITYM a desideratum.
On the fourth hand, we do ethics and metaethics to extrapolate better ethics.
Yes, that’s right. We lack knowledge of the total set of concerns which move us, and the ordering among those of which move us more. Had we total knowledge of this, we would have no need for any such thing as “ethics” or “meta-ethics”, and would simply view our preferences and decision concerns in their full form, use our reason to transform them into a coherent ordering over possible worlds, and act according to that ordering. This sounds strange and alien because I’m using meta-language rather than object-language, but in real life it would mostly mean just having a perfectly noncontradictory way of weighing things like love or roller-skating or reading that would always output a definite way to end up happy and satisfied.
However, we were built by evolution rather than a benevolent mathematician-god, so instead we have various modes of thought-experiment and intuition-pump designed to help us reduce our uncertainty about our own nature.
Can you give some specific examples of people misusing utility functions? Or if you don’t want to point fingers, can you construct examples similar to those you’ve seen people use?
This thread was prompted by this comment in the Open Thread.
That comment is about utilitarianism and doesn’t mention “utility functions” at all.
I can’t help but suspect, though, that LW people are drawn to utilitarianism because of what they see as the inevitability of using utility functions to model preferences. Maybe this impression is mistaken.
To me it seems as if utility functions were the most general (deterministic) way to model preferences. So, if we model preferences by “something else”, it will usually be some special case of a utility function. Or do you have something even more general than utility functions that is not based on throwing a coin? Or do you propose that we model preferences with randomness?
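One narrow, checkable version of this claim, as a sketch: on a finite set of options, strict preferences that contain no cycle can be assigned numbers that respect them, and cyclic ones cannot. The function name and the example items below are illustrative assumptions; the code uses the standard-library `graphlib` module (Python 3.9+). Note that it only guarantees a higher number for the preferred item in each stated pair, and items that were never compared get an arbitrary relative rank.

```python
from graphlib import TopologicalSorter, CycleError

def utility_from_strict_preferences(pairs):
    """Try to build a utility assignment from (better, worse) pairs.

    Returns a dict item -> int with u[better] > u[worse] for every pair,
    or None if the stated preferences contain a cycle.
    """
    predecessors = {}
    for better, worse in pairs:
        # TopologicalSorter takes node -> predecessors; the worse item must
        # come earlier in the order so that it receives the lower rank.
        predecessors.setdefault(better, set()).add(worse)
        predecessors.setdefault(worse, set())
    try:
        order = list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None
    return {item: rank for rank, item in enumerate(order)}

consistent = utility_from_strict_preferences(
    [("love", "roller-skating"), ("roller-skating", "reading")]
)
cyclic = utility_from_strict_preferences(
    [("A", "B"), ("B", "C"), ("C", "A")]
)

print(consistent["love"] > consistent["roller-skating"] > consistent["reading"])
print(cyclic is None)
```

Whether actual human preferences are the kind of thing that can be written down as such pairs in the first place is a separate question that this sketch does not touch.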
There are helpful models and there are unhelpful models. I can model the universe as a wave function in a gigantic Hilbert space, and this is an incredibly general model as it applies to any quantum-mechanical system, but it’s not necessarily a helpful model for making predictions at the level I care about most of the time. My claim is that, even if you believe that utility functions can model human preferences (which I also dispute), then it’s still true that utility functions are in practice an unhelpful model in this sense.
For our universe, other models have been extremely successful. Therefore, the generality of wave functions clearly is not required. In case of (human) preferences, it is unclear whether another model suffices.
What you are saying seems to me a bit like: “Turing machines are difficult to use. Nobody would simulate this certain X with a Turing machine in practice. Therefore Turing machines are generally useless.” But of course on some level of practical application, I totally agree with you, so maybe there is no real disagreement in the use of utility functions here; at least I would never say something like “my utility function is …” and I do not attempt to write a C compiler on a Turing machine.
I do not think that the statement “utility functions can model human preferences” has a formal meaning. However, if you say that it is not true, I would be very interested in how you prefer to model human preferences.
What would you propose as an alternative?