Terminology suggestion: Say “degrees utility” instead of “utils” to prompt affine thinking
A common mistake people make with utility functions is taking individual utility numbers as meaningful, and performing operations such as adding them or doubling them. But utility functions are only defined up to positive affine transformation.
Talking about “utils” seems like it would encourage this sort of mistake; it makes it sound like some sort of quantity of stuff, that can be meaningfully added, scaled, etc. Now the use of a unit—“utils”—instead of bare real numbers does remind us that the scale we’ve picked is arbitrary, but it doesn’t remind us that the zero we’ve picked is also arbitrary, and encourages such illegal operations as addition and scaling. It suggests linear, not affine.
But there is a common everyday quantity which we ordinarily measure with an affine scale, and that’s temperature. Now, in fact, temperatures really do have an absolute zero (and if you make sufficient use of natural units, they have an absolute scale as well), but generally we measure temperature with scales that were invented before that fact was recognized. And so while we may have Kelvins, we also have “degrees Fahrenheit” and “degrees Celsius”.
If you’ve used these scales long enough you recognize that it is meaningless to e.g. add things measured on these scales, or to multiply them by scalars. So I think it would be a helpful cognitive reminder to say something like “degrees utility” instead of “utils”, to suggest an affine scale like we use for temperature, rather than a linear scale like we use for length or time or mass.
The analogy isn’t entirely perfect, because as I’ve mentioned above, temperature actually can be measured on a linear scale (and with sufficient use of natural units, an absolute scale); but the point is just to prompt the right style of thinking, and in everyday life we usually think of temperature as an (ordered) affine thing, like utility.
As such I recommend saying “degrees utility” instead of “utils”. If there is some other familiar quantity we also tend to use an affine scale for, perhaps an analogy with that could be used instead or as well.
This post seems much more appropriate for the Discussion section.
Agreed. Moved.
Also time (there’s the Big Bang, but no-one uses it as the zero in everyday usage); for broader values of “everyday”, voltage and energy, too.
This all reminds me of torsors.
That’s true. People don’t seem to mess those up as often as “utils”. I wonder why?
Hypothesis: For energy and voltage, it’s because these are mostly only used by people who know what they’re talking about in the first place. For time, it’s because we usually measure time as “12:00”, etc.; the only people saying “the time is 5 seconds” are people who know what they’re doing.
...except that explanation doesn’t quite work, because it doesn’t explain years. But then, with years we usually use a bare number… hm, this is sounding pretty contrived.
Better hypothesis: Time is familiar enough that people know not to do that, utility isn’t.
(OTOH, people saying stuff like “X is twice as hot as Y” when X is 80 °C and Y is 40 °C aren’t totally unheard of.)
An advantage of using Fahrenheit—the zero is clearly arbitrary! :)
Another option is to replace “50 utils” or “50 degrees utility” with “50 utils”. Yes, always. The wiki link would have to be updated to address the affine caveat (as well as some others) but it might be worth it.
Edit: yet another option is to explicitly include a constant: “50+C utils”. This has a fine tradition stemming from calculus. If necessary, it could be combined with my previous suggestion.
(Moving a bit more to the concrete side for the sake of those who fall closer to the engineer perspective on the mathematician-engineer continuum):
Affine transformation == Linear transformation + translation.
It preserves ratios of distances, but not (necessarily) angles or distances themselves. Utility functions are only defined up to an affine transformation, which means that preferences are preserved, but that “doubling the utility assigned to an action” will not necessarily give the same results even if two people have the same utility function. (Just like doubling 15 degrees C == 30 degrees C, but the corresponding 59 degrees F and 86 degrees F are not doubles of each other.)
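To make the temperature example concrete, here is a minimal Python sketch. The helper name `c_to_f` and the extra 20 °C data point are illustrative assumptions, not anything from the thread. It checks that the Celsius-to-Fahrenheit map keeps the ordering and the ratios of differences, but not "doubling".

```python
def c_to_f(celsius: float) -> float:
    """Celsius to Fahrenheit: a positive affine map (scale by 9/5, shift by 32)."""
    return celsius * 9 / 5 + 32

cold_c, warm_c = 15.0, 30.0
cold_f, warm_f = c_to_f(cold_c), c_to_f(warm_c)   # 59.0 and 86.0

# Ordering survives the change of scale.
same_order = (cold_c < warm_c) == (cold_f < warm_f)

# "Doubling" does not: 2 * 15 C is 30 C, but 2 * 59 F is 118 F, not 86 F.
print(2 * cold_c == warm_c)   # True
print(2 * cold_f == warm_f)   # False

# Ratios of differences do survive (20 C is an arbitrary third reading).
mid_c = 20.0
ratio_c = (mid_c - cold_c) / (warm_c - cold_c)
ratio_f = (c_to_f(mid_c) - cold_f) / (warm_f - cold_f)
```

Both `ratio_c` and `ratio_f` come out as one third, which is the sense in which an affine map "preserves ratios of distances".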
I guess I don’t really see how “utils” encourages this mistake, nor am I sure it is that common. I mean, the idea that most, if not all, goods are not linear is 101-level stuff.
Meanwhile, “degrees of utility” is clumsier and “degree” has other meanings that don’t involve scales. I am also against replacing adequate terminology used in academia with new terms unless there is very good reason. It is a bad idea to artificially increase the apparent distance between the work done here on decision theory and the work done elsewhere. There is way too much of that already. It makes for bad press and is phyg-like.
I don’t understand this comment. I’m assuming the linearity of a good refers to whether its utility is a linear function of how many of it you have? In that sense, this is unrelated; this is a much broader issue, having nothing to do with how the utility of something varies with having multiple of it.
Although I suppose it is related in that, if “linear good” means u(kx)=ku(x) (where here u(x) means the utility of having x of the good), then no good can be linear in that strong sense, because the equation isn’t even meaningful! Edit: But, as I should have realized earlier, this is really a silly equation to consider in the first place, as it’s the difference u(x)-u(0) you really care about, not u(x) itself...
I don’t think this does increase the distance in any substantial way.
It’s not a breaking change (like, say, putting functions on the right, or declaring electrons to be positive). It’s not very-similar-but-slightly-different in a way that would cause confusion (like using tau instead of 2*pi, or using Delta(z) instead of Gamma(z+1)). It’s not replacing any key term that someone would be searching for (like using “meager” instead of “first category”, or “false hit” instead of “type I error”, or “computably enumerable” instead of “recursively enumerable”). It is a direct translation, of a term that people won’t be searching for and isn’t even strictly necessary, in a way that’s quickly transparent and nearly self-explanatory. I am honestly having trouble imagining a less obtrusive change. So I don’t think this is putting any substantial distance there, let alone approaching phyg status.
I’m worried “degrees utility” could encourage the conflation of the physical quantity ‘how closely an event corresponds to a set of individuals’ preferences’ with the metric we select for measuring that quantity. We don’t say ‘degrees temperature’; Fahrenheit and Celsius are specific ways of measuring temperature, whereas I gather you don’t have in mind a specific utility metric.
Indeed. It’s an improvement over “utils”, though, which has the same problem and also suggests linearity. I’m not sure what to do to fix this problem, but I’m also not sure it’s that important (it seems pretty clear that we have to measure it somehow, after all).
I am reluctant to accept a terminology change to something that is broken, even if the current terminology is broken as well. Accepting such incomplete solutions serves to reduce the incentive to come up with an actual workable fix to the problem and gives people the illusion that they have something that is solved.
“Degrees Utility” is not analogous to “Degrees Fahrenheit” or “Degrees Celsius”. When 34 degrees Fahrenheit is compared to 54 degrees Fahrenheit it is correct (and meaningful) to say that the latter is hotter than the former. When, using your terminology, “34 degrees Utility” is compared to “54 degrees Utility” the result is not meaningful even though it sometimes should be. For example when looking at a payoff matrix for a game involving agent A and agent B the 54 degrees Utility that B gets in some outcome cannot be compared meaningfully to “34 degrees Utility” that A gets in an outcome but can be compared to the “34 degrees Utility” that B gets in a different outcome (with the result “better”). That’s just sloppy expression with the illusion of rigour.
“34 DegreesUtility” would be viable but that sort of parametrised nomenclature is not sufficiently high status to reliably enforce as a standard just now.
...actually, now that I think about it some more, I agree that there is something to your line of thinking; I’m just not certain it leads to the conclusion you suggest.
The problem is that we don’t have any way of talking about this that intuitively prompts how it actually works, and “degrees utility” is problematic because it suggests it accounts for all the problems. OK. However, the thing is, so does “utils”. I mean, it’s possible that people see that and know to tread carefully; I don’t have any data here. I just feel like I’ve seen people try to add 1 util and 1 util often enough that I suspect that that isn’t the case, and that most people do read “utils” as indicating that it is correct to treat it as an amount of stuff.
But perhaps reverting to an even worse solution would suggest to tread carefully—namely, bare numbers. Again, this is pure speculation, I have no data; but I get the feeling that bare numbers will raise people’s hackles more than “utils”. Bare numbers suggest “something’s been left out here; tread carefully”; using a unit suggests “yes, this is a sensible way to measure it.”
So, if I’m correct about that, “utils” actually seems like the worst suggestion of the three—compared to “degrees utility”, it’s more misleading, but doesn’t come with an additional warning sign; compared to a bare number, it lacks the obvious warning sign, and isn’t that much more misleading. (Because adding and scaling will be the most tempting meaningless things to do anyway; multiplication seems a bit more exotic...)
Again: “Utils” has all the same problems, and more. For a single agent, the comparison is meaningful.
If you prefer to stick with the existing terminology despite it suggesting even worse meaningless comparisons, OK, but don’t act like you are pointing out anything that isn’t obvious, or that is specific to my suggestion.
“degrees util?” (or if it has to be named after a person, “degrees von Neumann” or “degrees Morgenstern” depending on whether you’re closer to Hungary or Germany).
Maybe “degrees VNM”?
Thought-provoking indeed. I agree with you that the scale we’ve picked for utils is arbitrary, and the zero we’ve picked is also arbitrary. After reading through the comments, I begin to wonder if we should go farther. We talk about utils as if this is a quantity we can actually measure, but is that true? Are we measuring anything at all? Is a numeric measure of any kind at all helpful here?
Let me propose a situation: Given a choice between beer and steak, John chooses the steak. Given a choice between steak and ice cream, John chooses the ice cream. Given a choice between ice cream and beer, John chooses the beer. Which item has the highest utility to John? There’s just no way to make sense of that in terms of real-valued utils because the ordering of the real numbers is transitive, and John’s preferences here are not.
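The claim that no real-valued assignment fits these three choices can be checked by brute force. The sketch below is only an illustration; the function name `consistent_rankings` is made up for it. Any utility function that agreed with all the strict choices would induce a strict ranking of the three items, so it tries every ranking and keeps the ones that agree with John.

```python
from itertools import permutations

# John's stated pairwise choices as (preferred, rejected) pairs.
choices = [("steak", "beer"), ("ice cream", "steak"), ("beer", "ice cream")]
items = sorted({item for pair in choices for item in pair})

def consistent_rankings(choices, items):
    """Return every strict ranking (best first) that agrees with all choices."""
    found = []
    for order in permutations(items):
        rank = {item: position for position, item in enumerate(order)}
        if all(rank[better] < rank[worse] for better, worse in choices):
            found.append(order)
    return found

print(consistent_rankings(choices, items))      # []
print(consistent_rankings(choices[:2], items))  # [('ice cream', 'steak', 'beer')]
```

With all three choices there is no consistent ranking; drop the last choice and exactly one ranking remains.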
If utils do make sense, I ask someone to produce an actual means of measuring them, fuzzy and approximate though it may be. I can’t figure out any mathematically consistent way to do this that doesn’t resolve to some other more easily measured quantity such as money or dopamine levels or quality-adjusted life years (QALYs). And if one of those is what we’re measuring, then we should probably just go ahead and say so.
In fact, in different problems we’re likely to want different kinds of utility. Sometimes a problem is best understood in terms of money. In others, it’s better understood in QALYs, and money may be not the measure but rather the constraint. That is, given that we have X dollars to work with, how can we maximize QALYs?
But if we just use abstract “utils” or “utilons” without connecting those to something we can measure in the non-hypothetical world, I’m not sure we get any useful information that applies outside of an axiomatic system that may not model reality.
Does this really happen? Can money be pumped out? For example, offer John the opportunity to pay $0.05 to upgrade his beer for a steak, then $0.05 to upgrade that to an ice cream cone, then $0.05 to upgrade that to a beer. Run forever. I think if you actually did this you’d find that of the three there actually is one that John would prefer to have.
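For anyone who wants to see the pump run, here is a toy simulation. It assumes, hypothetically, that John really does accept every offered upgrade; whether a real person would is exactly the open question. Amounts are in integer cents to avoid floating-point noise.

```python
# Hypothetical cyclic preferences: each key is traded away for its value.
upgrade = {"beer": "steak", "steak": "ice cream", "ice cream": "beer"}
FEE_CENTS = 5

def run_money_pump(start_item: str, rounds: int):
    """Apply `rounds` paid upgrades; return (final item, total cents paid)."""
    item, paid = start_item, 0
    for _ in range(rounds):
        item = upgrade[item]
        paid += FEE_CENTS
    return item, paid

print(run_money_pump("beer", 3))   # ('beer', 15)
```

After every three trades John holds what he started with and is 15 cents poorer.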
Yes, real human preference is routinely intransitive. That’s a topic perhaps worth its own discussion.
The parent post is still relevant, though. Inasmuch as utilitarian calculations are a useful approximation to human decisionmaking, it’s worth reminding people here that utilities don’t have a natural magnitude scale and that there’s no natural way to compare them across agents.
As far as I understand, it’s worse than that: linearity is not required. Any (continuous) monotonic rescaling would do, since the only thing that needs to be preserved is the ranking of outcomes.
Linearity is required… what’s preserved is the ranking of lotteries over outcomes. Preserving the order of “a cookie” and “two cookies but no dollar” and “three cookies but a dollar in debt” isn’t enough, you also have to preserve “40% chance of a cookie and 60% chance of two cookies but no dollar”.
There may be some confusion over terms, because economists do in fact also have use for utility functions that only express an ordering of outcomes. (Incidentally, this is also true of some of the decision theory work that has appeared on LessWrong: the utility functions in our proof-based versions of UDT only express an ordering; these models don’t have a notion of probabilities at all.) The OP and the parent comment are about the utility functions given by the von Neumann-Morgenstern theorem; these are left invariant by any affine rescaling and (by the uniqueness part of the theorem) are changed by any non-affine rescaling.
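A small numerical sketch of the point about lotteries, with made-up utility numbers and hypothetical outcome names: a positive affine rescaling leaves the ranking of two lotteries alone, while a monotonic but non-affine rescaling (here, cubing) keeps the ranking of the outcomes and still flips the ranking of the lotteries.

```python
def expected_utility(lottery, utility):
    """lottery maps outcome -> probability; utility maps outcome -> real number."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())

def prefers_first(lottery_1, lottery_2, utility):
    return expected_utility(lottery_1, utility) > expected_utility(lottery_2, utility)

u = {"worst": 0.0, "middle": 2.0, "best": 3.0}   # hypothetical numbers
sure_thing = {"middle": 1.0}
gamble = {"worst": 0.5, "best": 0.5}

affine = {o: 2 * x + 5 for o, x in u.items()}    # positive affine rescaling
cubed = {o: x ** 3 for o, x in u.items()}        # monotonic, not affine

print(prefers_first(sure_thing, gamble, u))       # True  (2.0 vs 1.5)
print(prefers_first(sure_thing, gamble, affine))  # True  (9.0 vs 8.0)
print(prefers_first(sure_thing, gamble, cubed))   # False (8.0 vs 13.5)
```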
It’s worth mentioning that all three kinds of utility functions can be constructed: ordinal scale, interval scale, and ratio scale. For an overview of ratio scale utility functions, see Peterson (2009), pp. 106-110.
Yes, to be absolutely clear, I’m talking about the sort of utility functions you get from the VNM theorem or Savage’s Theorem.
It’s not really clear to me what the use is for a utility function if all you have is ordering; why not just use an ordering? Seems that using a utility function then would just be needlessly restricting what sort of orderings you can have. Well, depending on what requirements you want that ordering to satisfy… after all if you have all of Savage’s axioms then you do get a utility function! But that requires ordering actions, not just outcomes...
The paradigmatic economic application I recall is consumer choice theory: You have a certain amount of money, m, and two goods you can buy. These goods have fixed prices p and q. Your choices are pairs (x, y) saying how much of each good you buy; the “feasible set” of choices is {(x,y) : x,y >= 0 and xp + yq <= m}. What’s your best choice in this set? We want to use calculus to solve this, so we’ll express your preferences as a differentiable utility function. The reason VNM or Savage doesn’t enter into it is that actions lead to outcomes deterministically.
In UDT, we don’t even start with a natural definition of “outcome”; in principle, we need to specify (1) a set of outcomes, (2) an ordering on these outcomes, and (3) a deterministic, no-input program which does some complicated computation and returns one of these outcomes. (The intuition is that the complicated computation computes everything that happens in our universe, then picks out the morally relevant parts and prints them out.) It’s just simpler to skip parts (1) and (2) in the formal specification and say that the program (3) returns a number. Since the proof-based models have no notion of probability (even implicitly like in Savage’s theorem), this makes the program an order-only “utility function.”
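The consumer choice setup can be sketched in a few lines. Everything specific here is an assumption for illustration: the utility u(x, y) = x*y, the numbers m = 12, p = 2, q = 1, and the restriction to whole-number bundles (so a grid search stands in for the calculus). It shows that, with deterministic outcomes, any increasing transformation of the utility function picks the same bundle.

```python
def feasible_set(m, p, q):
    """Integer bundles (x, y) with x, y >= 0 and x*p + y*q <= m."""
    return [(x, y) for x in range(m // p + 1) for y in range((m - x * p) // q + 1)]

def best_bundle(utility, m, p, q):
    """Bundle in the feasible set with the highest utility."""
    return max(feasible_set(m, p, q), key=lambda bundle: utility(*bundle))

def product_utility(x, y):
    return x * y                      # hypothetical utility function

def rescaled_utility(x, y):
    return (x * y) ** 3 + 7           # increasing, non-affine transform of it

print(best_bundle(product_utility, 12, 2, 1))    # (3, 6)
print(best_bundle(rescaled_utility, 12, 2, 1))   # (3, 6)
```

Only the ordering of bundles matters here, which is why this use of "utility function" is order-only.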
(Thanks for adding the point about Savage’s theorem!)
Yes, but “we want to use calculus to solve this” isn’t a very natural constraint on the set of orderings. :) It’s a “we want to make the math easier” constraint, not a “we have reason to believe that any rational agent should act this way” constraint.
Not that it’s necessarily inappropriate in the example you give—it probably makes sense there. Just a bit surprising that UDT would restrict itself in such a way.
In the UDT case, the set of outcomes is finite (well, or at least the set of equivalence classes of outcomes under the preference relation is finite) and the utility functions don’t have any particular properties, so every possible preference relation the model can treat at all can be represented by a utility function!
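The finite case is easy to make explicit. This is a generic construction, not the UDT formalism itself, and the four outcomes and their relation are invented for the example: give each outcome a score equal to the number of outcomes it is at least as good as. For any complete, transitive relation on a finite set, that score represents the relation.

```python
outcomes = ["a", "b", "c", "d"]
# (x, y) in the set means "x is at least as good as y"; here a ~ b > c > d.
weakly_prefers = {(x, x) for x in outcomes} | {
    ("a", "b"), ("b", "a"), ("a", "c"), ("b", "c"),
    ("a", "d"), ("b", "d"), ("c", "d"),
}

def utility_from_preorder(outcomes, weakly_prefers):
    """u(x) = number of outcomes that x is at least as good as."""
    return {x: sum((x, y) in weakly_prefers for y in outcomes) for x in outcomes}

u = utility_from_preorder(outcomes, weakly_prefers)
represents = all(
    (u[x] >= u[y]) == ((x, y) in weakly_prefers)
    for x in outcomes for y in outcomes
)
print(u)           # {'a': 4, 'b': 4, 'c': 2, 'd': 1}
print(represents)  # True
```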
(I should note that this is not UDT as such we’re talking about here, but one particular formal way of implementing some of the ideas of UDT.)
Oh, OK then!
No, you don’t. Risk-aversion is legal.
Hm. I initially posted this in Discussion before quickly moving it to Main, and now it’s not showing up in the “recent posts” sidebar. Do I just have to wait or is this a bug?
Um, you can add and double utilities. Utilities are typically represented as real numbers—and they are pretty meaningful, the “affine transformation” business notwithstanding.
You can double the real numbers representing them, but the results of this won’t be preserved under affine transformations. So you can have two people whose utility functions are the same, tell them both “double your utility assigned to X” and get different results.
I think it would be very useful to explicitly state what the consequences of utility functions being only defined up to an “affine transformation” are in the grandparent post instead of assuming that everyone knows this at a 5-second level. My immediate reaction to the parent post was to look at the wikipedia article for affine transformations without much enlightenment.
Temperature is pretty useless as an analogy, because everyone knows that 2*40 degrees = 80 degrees; you have to think about moving between Celsius and Fahrenheit to actually get something useful out of the analogy. Voltage is even less helpful, because all it depends on is having a fixed reference point (i.e., differences between two voltages are always preserved, even if you change zero points), while distances aren’t preserved in general under affine transformations.
The downvoted post and your response are the most valuable in this entire thread, as they were the only ones that clearly communicated what the actual ramifications of utility functions only being defined up to an affine transformation were.
Ugh. Sorry if this came across as snarky, but the grandparent post came across as “If this young man expresses himself in terms too deep for me,/Why, what a very singularly deep young man/this deep young man must be!”
Except 2*40 degrees isn’t 80 degrees; the operation “2*40 degrees” is simply meaningless in the first place. (I mean, unless that’s a temperature difference of 40 degrees.)
(I mean, OK, you can strictly speaking double a temperature, but in order to do so you need to know what absolute zero is.)
Adding utilities is a totally routine operation. It is not necessarily a mistake.
Sure, you can make mistakes by adding utilities, but that’s quite a different topic.
It looks to me like you haven’t gotten your head around the notion of an affine space.
In short: Utilities are not being added there. They are being linearly combined with coefficients summing to 1. I.e. you are taking an affine combination of them. Not a general linear combination, such as adding them (if you were adding two of them, the coefficients would sum to 2.)
If x and y are utilities, (x+y)/2 is meaningful, as is x/3+2y/3, as is 2x-y, etc. x+y is not meaningful, nor is 2x, or -x, or 3x-y.
Edit: To be clear, by “meaningful” here I mean meaningful as utilities, not meaningful as absolute numbers. Obviously none of these are meaningful as absolute numbers—to accomplish that, you need something like (x-y)/|z-w|.
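One way to test which combinations are "meaningful as utilities" is to ask whether combining and then rescaling gives the same answer as rescaling and then combining. The sketch below does that with exact fractions; the function names and the particular numbers are illustrative assumptions. It also checks that a ratio of differences like (x-y)/|z-w| is unchanged by a positive affine rescaling.

```python
from fractions import Fraction as F

def combine(coeffs, values):
    """Linear combination sum(c_i * v_i)."""
    return sum(c * v for c, v in zip(coeffs, values))

def commutes_with_rescaling(coeffs, values, a, b):
    """True if combining then rescaling equals rescaling then combining."""
    rescale = lambda t: a * t + b
    return rescale(combine(coeffs, values)) == combine(coeffs, [rescale(v) for v in values])

def difference_ratio(x, y, z, w):
    return (x - y) / abs(z - w)

x, y = F(3), F(7)          # two utilities on some arbitrary scale
a, b = F(2), F(5)          # rescaling t -> 2t + 5

for coeffs in [(F(1, 2), F(1, 2)), (F(1, 3), F(2, 3)), (F(2), F(-1)),
               (F(1), F(1)), (F(2), F(0)), (F(-1), F(0)), (F(3), F(-1))]:
    print(sum(coeffs), commutes_with_rescaling(coeffs, (x, y), a, b))
```

The first three coefficient pairs sum to 1 and commute with the rescaling; the last four (x+y, 2x, -x, 3x-y) do not.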
Utilities are elements of an (ordered) affine space—not a vector space. Hence why (if represented by real numbers) they are only unique up to affine transformation.
I am not going to explain why, because this is bog-standard VNM. I am just hoping that by presenting a missing concept (that of an affine space) I might clear up some confusion.
You might also want to read John Baez’s thing on torsors that satt linked.
My goodness, this is getting ridiculous! Do I have to explain the concept of adding now? See those “plus” symbols? They represent the standard mathematical operation of addition. Utilities are being added there. For another example of adding utilities (with slightly less potentially-confusing math nearby) see the very next section of the same document. As I already explained, adding utilities is totally routine and is not necessarily a mistake.
...I wasn’t going to reply to this—I can’t expect to quickly correct the macro-mistake you’re making—but in this case the micro-mistake you’re making is a very simple one, so I can point it out.
Yes, the plus sign represents the operation of addition. But it isn’t utilities being added; pay attention to what the summands are! The summands are not utilities, but probabilities multiplied by utilities. At no point in anything you have pointed to are two utilities added without first being multiplied by probabilities (because that would be a meaningless operation).
Now, it’s true that if x and y are utilities, then x+y is the same as 1x+1y, and so this is a special case of “multiplying by probabilities and then summing”. But in fact the general operation of “multiply by probabilities and sum” is not meaningful; it is meaningful only in the special case when the probabilities in question sum to 1. (Though as I said above, it’s slightly more general than that, in that they don’t have to be probabilities; they can be any real numbers.)
Every “sum of utilities” you’ve pointed me to (which, as I’ve said, have not been sums of utilities, but rather sums of utilities scaled by real numbers) has taken this restricted form. Which is good, because otherwise the result would be meaningless. (Well, meaningless as a utility, anyway; we could consider something like x/3 + y/3 + z/3, where x, y, and z are utilities. Then you could point out that this contains the subsum x/3+y/3, which is not of the right form. But while meaningless as a utility, it’s a perfectly valid 2/3-of-a-utility. You could multiply it by 3/2, or add another third-of-a-utility, or halve it and then add a 2/3-of-a-utility, and get a meaningful utility.)
(If I really wanted to pick nits, I could point out that “+” only really stands for addition if we first embed the affine space of utilities in a vector space, or assign particular real numbers to utilities; otherwise it’s just a notational shorthand. But in the case that we do assign real numbers to utilities, yes, it’s standard addition. Just not utilities that are being added.)
Your mistake is even simpler. Probabilities are unit-free. They are numbers between 0 and 1. As such, they are dimensionless. So: a utility multiplied by a probability is still a utility.
If it helps any, I have a degree in mathematics. I do actually know what I am talking about here.
That would be a sensible inference, yes, if utilities were elements of a vector space. This is exactly why I said the terminology “utils” was so misleading—it suggests that they work like meters, seconds, etc; that we can think in terms of units and dimensions. But that doesn’t work here!
Dimensional analysis is basically analysis of scaling symmetries. It works when the only symmetries are scaling symmetries. But utilities, being an affine thing rather than a linear thing, have more symmetries than that! They have translation symmetries too! Units and dimensions are a very useful tool but they are not universal and they can’t handle something like this, unless you make sure you carefully distinguish between utilities and utility differences.
(That last one seems to be the usual solution to this sort of problem, not just in the case of utilities but more generally. E.g. we don’t hesitate to talk about points in time as quantities of seconds, even though time is translation-invariant, but really it’s only durations that are quantities of seconds. When we describe a point in time as being at “5 seconds”, we really mean “5 seconds after some agreed upon starting point”. And so while adding or halving durations is meaningful, adding or halving positions in time is not, because what’s going on with the starting point? (By contrast, averaging positions in time is meaningful, as is 2*t_1 - t_2, etc.) But time is familiar, so people don’t tend to make that sort of mistake, of forgetting that times are measured relative to some implicit arbitrary baseline; whereas utility is not so familiar, and, well, you’re making that mistake right now.)
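Python's standard `datetime` module happens to enforce this distinction between positions in time and durations, so it makes a handy illustration (the particular timestamps are arbitrary). Subtracting two points gives a duration, averaging two points works if written as a point plus a duration, and adding two points is rejected outright.

```python
from datetime import datetime, timedelta

t1 = datetime(2020, 1, 1, 12, 0, 0)
t2 = datetime(2020, 1, 1, 12, 0, 10)

duration = t2 - t1                 # point - point = duration (a timedelta)
midpoint = t1 + (t2 - t1) / 2      # the average of two points in time
reflected = t1 + (t1 - t2)         # 2*t_1 - t_2, written as point + duration

try:
    t1 + t2                        # point + point has no meaning
    sum_allowed = True
except TypeError:
    sum_allowed = False

print(duration)      # 0:00:10
print(midpoint)      # 2020-01-01 12:00:05
print(reflected)     # 2020-01-01 11:59:50
print(sum_allowed)   # False
```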
If you like, you can imagine—as I’ve essentially done in my post above—that this affine space is embedded in some larger vector space, like the line x+y=1 embedded in R^2, and that elements of x+y=k have type “k*utility”.
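That embedding can be sketched as a tiny bookkeeping type. The class name `Weighted` and the helper `utility` are hypothetical, invented for this example. Each value carries a weight k, read as "k-of-a-utility", and only weight-1 values may be read back as utilities.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Weighted:
    """A number tagged with a weight k, read as a 'k-of-a-utility'."""
    weight: Fraction
    value: Fraction

    def __add__(self, other):
        return Weighted(self.weight + other.weight, self.value + other.value)

    def scale(self, factor):
        factor = Fraction(factor)
        return Weighted(self.weight * factor, self.value * factor)

    def as_utility(self):
        if self.weight != 1:
            raise ValueError(f"weight is {self.weight}, not 1: not a utility")
        return self.value

def utility(value):
    """Wrap a plain number as a genuine (weight-1) utility."""
    return Weighted(Fraction(1), Fraction(value))

x, y, z = utility(3), utility(6), utility(9)
third = Fraction(1, 3)
partial = x.scale(third) + y.scale(third)   # x/3 + y/3: a 2/3-of-a-utility
full = partial + z.scale(third)             # x/3 + y/3 + z/3: weight 1

print(full.as_utility())                            # 6
print(partial.scale(Fraction(3, 2)).as_utility())   # 9/2
```

Calling `partial.as_utility()` raises `ValueError`, which is the type-level version of "x/3 + y/3 is not a utility".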
But this is becoming stupid. This is a hell of a lot of words; the fact of the matter is that if you were right, then we could take two outcomes a and b, with u(a)=1, u(b)=2, and observe that 3u(a) > u(b); then define an equivalent utility function v by v(x)=u(x)-2, and observe that now 3v(a) < v(b), so apparently in fact v was not equivalent. I.e. if you were right, then utility functions would only be unique up to positive scaling, not up to general positive affine transformations.
Only one of the following can be true: 1) It is meaningful to take non-affine combinations of utility functions 2) Two utility functions related by a positive affine transformation are equivalent
And it’s the latter. Why? Because if you look at the definition of what it means for a function u to be the utility function of a given agent, you’ll notice it only involves comparisons of affine combinations of values of u, not general linear combinations. Hence, applying any positive affine transformation will not change the comparisons, and the result will again be a utility function for the given agent.
And this is why everyone says that they’re only unique up to positive affine transformation, and correct in saying so. If the definition of a utility function relied on more general linear combinations of utilities, then that would restrict the symmetries further, and it would probably result in there only being scaling symmetries, in which case you would be right.
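The counterexample with u(a)=1, u(b)=2 and v(x)=u(x)-2 is quick to verify. The sketch below checks it, and also checks one comparison of affine combinations (the 50/50 mix against b, chosen arbitrarily) to show that this kind of comparison does survive the shift.

```python
u = {"a": 1, "b": 2}
v = {outcome: value - 2 for outcome, value in u.items()}   # v(x) = u(x) - 2

# A non-affine combination: the comparison flips under the shift.
print(3 * u["a"] > u["b"])   # True   (3 > 2)
print(3 * v["a"] > v["b"])   # False  (-3 < 0)

# An affine combination (coefficients sum to 1): the comparison is unchanged.
def mix_is_worse_than_b(w):
    return 0.5 * w["a"] + 0.5 * w["b"] < w["b"]

print(mix_is_worse_than_b(u))   # True  (1.5 < 2)
print(mix_is_worse_than_b(v))   # True  (-0.5 < 0)
```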
It seems as though you would agree that it is possible to add utility differences. The thing is, whenever anyone discusses utilities at all they are normally discussing utility differences. It’s the utility of having a banana over the utility of not having a banana. Or the utility of taking a medicine over not taking it. Such differences are the things that are being added together by those who add utilities.
Describing utility differences by using the term “utility” is like calling an elapsed time a “time”—both are commonplace. You can add utilities in much the same way that you can add times and distances.
Yes, of course you can add utility differences. Utilities form an affine space, their differences lie in the vector space acting on this affine space.
I disagree that discussion of utilities is normally discussion of utility differences, but, whatever. I’m not going to spend any more karma arguing over this. Regardless, it is important to recognize the difference between the two and keep the distinction clear, just as it is with positions in time vs. durations, positions vs. displacements, etc.
If people are only going to talk about utility differences rather than utilities, then sure, “utils” is fine. I feel like I’ve seen enough cases of trying to add utilities (not utility differences) that I think this is a bad idea, but, whatever; I’m not going to argue about that. (And it is possible I misunderstood what they were saying because it didn’t occur to me that maybe they meant utility differences and I wasn’t trying to read charitably. If that is the case, that might explain why some people thought my suggestion was so unnecessary...)