Value/Utility: A History

[ I am writing this post because, while many people on LessWrong know something about the history of value and utility, it’s vanishingly rare for someone to be confident enough in their understanding of the whole history to speak authoritatively about what these concepts ‘really mean’. This often blocks important discussions. I had a faint memory suggesting that the canonical history is too short and understandable to really warrant this, so I looked up the whole thing, and as far as I can tell, that’s true.
[One relevant Wikipedia section; I’m basically trying to write this up more tersely and straightforwardly and with less left up to assumed shared reader background.]
As a sometime hobbyist historian, I do wince at publishing something so simplified. I know my picture of what the important events were would radically shift if I knew a little more. However, I think LessWrong working off this painfully simplified consensus would be a night-and-day improvement.
This post contains extensive quotes/cribbing from “Do Dice Play God?” by the mathematician Ian Stewart. IMO it’s a great book on probability theory, although not mathematically sophisticated or ideologically Bayesian, because it gives the surprising historical/cultural motivations for a comprehensive breadth of “accepted procedures” in statistics usually taken at face value. [ Word of caution: Stewart conspicuously leaves out such figures as ET Jaynes, and Karl Friston [in his chapter about the brain as a Bayesian dynamical system!], suggesting he isn’t aware of everything. ] ]
I. Cardano
The history books say: everybody was confused about chance until Girolamo Cardano, an Italian algebraist and gambler who wrote “Book on Games of Chance” in 1564 [ it was not published until 1663, long after he’d died ]. “At first sight [most Roman dice] look like cubes, but nine tenths of them have rectangular faces, not square ones. They lack [ . . . ] symmetry [ . . . ], so some numbers would have turned up more frequently than others. [ . . . ] So why didn’t Roman gamblers object when they were asked to play with biased dice? [ . . . ] a belief in fate, rather than physics, might be the explanation. If you thought your destiny was in the hands of the gods, then you’d win when they wanted you to win and lose when they didn’t. The shape of the dice would be irrelevant.” [Stewart 26]
Cardano “used dice to illustrate some basic concepts, and wrote: ‘To the extent to which you depart from … equity, if it is in your opponent’s favor, you are a fool, and if in your own, you are unjust.’ This is his definition of ‘fair’ [ i.e. a «fair game» ].” [Stewart 28]
Cardano was the first to analyze dice roll outcomes in terms of combinatorics. “Gamblers had long known from experience that when throwing three dice, a total of 10 is more likely than 9. This puzzled them, however, because there are six ways to get a total of 10:
[ 1 + 4 + 5 ], [ 1 + 3 + 6 ], [ 2 + 4 + 4 ], [ 2 + 2 + 6 ], [ 2 + 3 + 5 ], [ 3 + 3 + 4 ]
but also six ways to get a total of 9:
[ 1 + 2 + 6 ], [ 1 + 3 + 5 ], [ 1 + 4 + 4 ], [ 2 + 2 + 5 ], [ 2 + 3 + 4 ], [ 3 + 3 + 3 ]
So why does 10 occur more often? Cardano pointed out that there are 27 ordered triples totalling 10, but only 25 totalling 9.” [Stewart 30]
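Cardano’s count is easy to verify by brute force. A quick sketch in Python (mine, not Cardano’s or Stewart’s):

```python
from itertools import product

# Count ordered triples of three fair dice by their total, as Cardano did.
counts = {}
for triple in product(range(1, 7), repeat=3):
    counts[sum(triple)] = counts.get(sum(triple), 0) + 1

print(counts[10], counts[9])  # 27 and 25: 10 beats 9 in the long run
```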
“He also discussed throwing dice many times repeatedly, and there he made his most important discoveries. The first was that the probability of the event is the proportion of occasions on which it happens, in the long run. This is now known as the ‘frequency’ definition of probability. The second was that the probability of an event occurring every time in n trials is p^n if the probability of a single event is p. It took him a while to get the right formula, and his book included the mistakes he made along the way.” [Stewart 30]
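Both of Cardano’s claims can be sanity-checked numerically. A minimal Monte Carlo sketch (the event, seed, and group sizes are mine, for illustration):

```python
import random

random.seed(0)   # arbitrary seed, for reproducibility
p, n = 1 / 6, 4  # single-event probability (rolling a six) and trials per group

# Frequency definition: the fraction of n-roll groups that are all
# sixes should settle near p**n in the long run.
groups = 200_000
hits = sum(
    all(random.randint(1, 6) == 6 for _ in range(n))
    for _ in range(groups)
)
print(hits / groups, p ** n)  # both near 1/1296, i.e. about 0.00077
```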
Cardano’s major work dealt with expected value only indirectly, as the implicit unnamed payout from bets at which the gambler could cheat—or just make wise decisions—by knowing Cardano’s laws of probability. In 1654, Blaise Pascal and Pierre de Fermat engaged in a correspondence which was the first work to deal with expected value directly.
II. Pascal and Fermat
Pascal and Fermat were trying to solve the so-called “problem of points”: if we’ve both put money into a pot, and we’ve agreed to pay the whole pot out to whichever of us wins a game of chance, and the outcome of this game of chance is decided by who reaches a certain total number of points first, then how do we divide up the pot if the game has to end before the total is reached?
The correspondence goes on for a while; we have all of it except for Pascal’s first letter, in which he apparently proposed an incorrect solution [Stewart 31].
“Fermat responded with a different calculation, urging Pascal to reply and say whether he agreed with the theory. The answer was as he’d hoped:
’Monsieur,
Impatience has seized me as well as it has you, and although I am still abed, I cannot refrain from telling you [...] you have found the two divisions of the points and of the dice with perfect justice.′
Pascal admitted his previous attempt had been wrong, and the two of them batted the problem to and fro [...] Their key insight is that what matters is not the past history of the play [...] but what might happen over the remaining rounds. If the agreed target is 20 wins and the game is interrupted with the score 17 to 14, the money ought to be divided in exactly the same way as it would be for a target of 10 and scores 7 to 3.” [Stewart 31]
This division was based on the percentage of futures, extrapolated forward from the current game state, in which Player A rather than Player B wins the game [calculated with pure combinatorics, so assuming no skill, cheating, or other concealed advantage for either player]. In this way, Pascal and Fermat calculated a kind of “expected value” for a given player and game state.
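That division can be computed directly. A sketch in Python, assuming fair 50-50 rounds and using Stewart’s example of a target of 10 interrupted at 7 to 3 (the function name is mine):

```python
from math import comb

def pot_share(needed_a, needed_b):
    """Probability that player A reaches the target first, assuming
    fair 50-50 rounds: play out at most needed_a + needed_b - 1 more
    rounds; A takes the pot in every future with >= needed_a wins."""
    rounds = needed_a + needed_b - 1
    return sum(comb(rounds, k) for k in range(needed_a, rounds + 1)) / 2 ** rounds

# Stewart's example: a target of 20 interrupted at 17 to 14 divides
# the pot exactly like a target of 10 interrupted at 7 to 3.
print(pot_share(20 - 17, 20 - 14))  # 0.91015625
print(pot_share(10 - 7, 10 - 3))    # 0.91015625
```

Only the wins each player still needs enter the calculation, which is exactly Pascal and Fermat’s insight that the past history of the play doesn’t matter.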
III. Huygens
In 1657, Christiaan Huygens explicated a version of expected value, crystallized down from Pascal and Fermat’s, that referred not just to the expectation of winning one’s current game from a certain game state, but to the expected value earned from any particular decision to stake resources on particular outcomes of events modeled as random.
“Suppose you play, many times, a dice game where your wins or losses are:
lose £4 if you throw 1 or 2
lose £3 if you throw 3
win £2 if you throw 4 or 5
win £6 if you throw 6
It’s not immediately clear whether you have an advantage in the long run. To find out, calculate:
the probability of losing £4 is [ [1/6] + [1/6] = ] [2/6] = [1/3]
the probability of losing £3 is [1/6]
the probability of winning £2 is [ [1/6] + [1/6] = ] [2/6] = [1/3]
the probability of winning £6 is [1/6]” [Stewart 32]
[ This comes out to
-£4*[1/3] - £3*[1/6] + £2*[1/3] + £6*[1/6]
= -£4/3 - £1/2 + £2/3 + £1
= -£8/6 - £3/6 + £4/6 + £6/6
= -£1/6
].
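The same arithmetic as a quick Python check, using exact fractions:

```python
from fractions import Fraction

# Stewart's dice game: signed payoff for each face, each face with probability 1/6.
payoff = {1: -4, 2: -4, 3: -3, 4: 2, 5: 2, 6: 6}
expected = sum(Fraction(1, 6) * a for a in payoff.values())
print(expected)  # -1/6: on average you lose £1/6 per play
```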
IV. Jakob Bernoulli
Huygens’s original formula for expected value was
E = [ p1∗a1 + p2∗a2 + p3∗a3 + ⋯ ] / [ p1 + p2 + p3 + ⋯ ]
[ with pi being the probability of the ith outcome, and ai being the payout from the ith outcome ].
In Ars conjectandi, written between 1684 and 1689 and published posthumously by his nephew Nicolaus Bernoulli in 1713, Jakob Bernoulli pointed out that, since you can only expect exactly 1 thing to happen to you, the probabilities p1,p2,p3... in Huygens’s formula should be normalized, to all sum to 1. Bernoulli’s new formula was
E = p1∗a1 + p2∗a2 + p3∗a3 + ⋯,
with normalized probabilities.
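The two formulas agree whenever the weights are normalized; Huygens’s ratio just does the normalization implicitly. A small sketch (reusing Stewart’s dice game, with weights counting favourable faces):

```python
from fractions import Fraction

# Stewart's dice game, with each distinct payout weighted by the
# number of favourable faces (the weights sum to 6, not 1).
weights = [2, 1, 2, 1]       # faces {1,2}, {3}, {4,5}, {6}
payouts = [-4, -3, 2, 6]

# Huygens: divide the weighted payout total by the total weight.
huygens = Fraction(sum(w * a for w, a in zip(weights, payouts)), sum(weights))

# Jakob Bernoulli: normalize the weights into probabilities first.
probs = [Fraction(w, sum(weights)) for w in weights]
bernoulli = sum(p * a for p, a in zip(probs, payouts))

print(huygens, bernoulli)  # both -1/6
```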
V. Nicolaus Bernoulli and Daniel Bernoulli
In the same year as the publication of his uncle’s Ars conjectandi [1713], Nicolaus Bernoulli posed a puzzle in a letter to Pierre Raymond de Montmort which called into question the practical applicability of expected value calculations, particularly when agents are assumed to value payouts linearly in the nominal amount of currency.
Suppose, wrote Nicolaus, you are offered a lottery. The terms of the lottery are as follows:
You pay down initial stakes $N.
Then a fair coin is flipped, until it comes up T.
However many times «num_runs» the coin comes up H before that, you are awarded $2^«num_runs».
So if the coin comes up T on the 1st throw, you’re awarded $2^0, or $1. If it comes up T on the 2nd throw, you get $2^1, or $2. If it doesn’t come up T until the 4th throw, you’re awarded $2^3, or $8.
Nicolaus’s question was: what $N initial stakes should you be willing to offer in this lottery?
If you just multiply out the expected values here according to Jakob’s formula, to try and get your expected value, you get

$(p1=1/2)∗(a1=1)
+$(p2=1/4)∗(a2=2)
+$(p3=1/8)∗(a3=4)
+$(p4=1/16)∗(a4=8)

. . . and so on. As you get closer and closer to evaluating “all” the possibilities, your largest payout grows just as quickly as the probability of the event you’re currently computing shrinks—that is, your payout goes by a factor of 2 every time, just as the probability of your event goes by a factor of 1⁄2 every time. So each term contributes exactly 1⁄2, the sum goes on forever, 1⁄2 + 1⁄2 + 1⁄2 + ⋯, and never converges on a finite expected value.
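A few partial sums make the divergence concrete (a sketch in exact arithmetic):

```python
from fractions import Fraction

# Partial sums of Nicolaus's lottery: the k-th outcome pays 2**(k - 1)
# with probability 1/2**k, so every term adds exactly 1/2.
partial = Fraction(0)
for k in range(1, 11):
    partial += Fraction(1, 2 ** k) * 2 ** (k - 1)
print(partial)  # after 10 terms the "expected value" is already $5, and climbing
```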
In 1738, Daniel Bernoulli solved this problem by proposing that what an agent maximizes is not expected money but expected utility, with the utility of money growing only logarithmically in its amount. If the utility you assign to an outcome grows slowly enough that the outcome’s shrinking probability eventually outpaces it, you can’t be suckered into paying unbounded amounts for vanishingly probable payouts the way Nicolaus’s thought experiment [ later called the St. Petersburg paradox because of how Daniel framed it ] suggests.
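Daniel’s fix can be sketched numerically. Below I simplify his proposal (he used the logarithm of the agent’s total wealth) to utility = log2 of the payout alone, so the k-th outcome’s utility is k − 1:

```python
# The k-th outcome pays 2**(k - 1) with probability 1/2**k, so its
# log2-utility is k - 1, and the series (k - 1)/2**k sums to a
# finite value (the infinite sum is exactly 1).
expected_utility = sum((k - 1) / 2 ** k for k in range(1, 200))
print(expected_utility)       # about 1
print(2 ** expected_utility)  # certainty equivalent: about $2
```

Under this (simplified) utility function, the lottery is worth about $2 of certain money, not an unbounded stake.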
VI. von Neumann and Morgenstern
In 1944, John von Neumann [with Oskar Morgenstern] wrote in Theory of Games and Economic Behavior about why he thought we shouldn’t dispense with utility when trying to predict the economy, even though we can’t measure cross-individual utility like a physical observable:
“The [ . . . ] difficulties of the notion of utility, and particularly of the attempts to describe it as a number, are well known [ . . . ] It is sometimes claimed in economic literature that discussions of the notions of utility and preference are altogether unnecessary, since these are purely verbal definitions with no empirically observable consequences, i.e., entirely tautological. It does not seem to us that these notions are qualitatively inferior to certain well established and indispensable notions in physics, like force, mass, charge, etc. That is, while they are in their immediate form merely definitions, they become subject to empirical control through the theories which are built upon them” [von Neumann and Morgenstern, 8-9]
“[W]e wish to describe the fundamental concept of individual preferences by the use of a rather far-reaching notion of utility. Many economists will feel that we are assuming far too much [ by purporting to treat utility synthetically ], and that our standpoint is a retrogression from the more cautious modern technique of ‘indifference curves’ [ . . . ]” [von Neumann and Morgenstern, 16]
Von Neumann and Morgenstern went on to argue that, if individuals have a single binary preference over not only every pair of events, but also every pair of combinations of events with stated probabilities, this implies that each individual [taken as static] must have a single coherent ordering on preferences [since having to express a binary relationship between expected-combinations-of-events, constrains the binary relationships it is sensible to express over their atomic components]:
“[A] numerical utility is dependent upon the possibility of comparing differences in utilities [ note: in the previous section, von Neumann was apologizing for the present inability to compare differences in assigned utilities across individuals [by some mechanism more direct than dollar-value market prices] ]. This may seem—and indeed is—a more far-reaching assumption than that of a mere ability to state [subjective] preferences. [ . . . ]
Let us for the moment accept the picture of an individual whose system of preferences is all-embracing and complete, i.e. who, for any two objects or rather any two imagined events, possesses a clear intuition of preference.
More precisely we expect him, for any two alternative events which are put before him as possibilities, to be able to tell which of the two he prefers.
It is a very natural extension of this picture to permit such an individual to compare not only events, but even combinations of events with stated probabilities.
By a combination of two events we mean this: Let the two events be denoted by B and C [which are] 50%-50%. Then the “combination” is the prospect of seeing B occur with a probability of 50% and (if B does not occur) C with the (remaining) probability of 50%. [ . . . ]
It is clear that if [the individual] prefers A to B and also to C, then he will prefer [A] to the above combination as well; similarly, if he prefers [both] B [and] C to A, then he will prefer the combination [ 50% B, 50% C ] [ to A ], too. But if he should prefer A to, say, B, but at the same time C to A, then any assertion about his preference of A against the combination contains fundamentally new information. Specifically: if he now prefers A to the 50-50 combination of B and C, this provides a plausible base for the numerical estimate that his preference of A over B is in excess of his preference of C over A.” [von Neumann and Morgenstern, 18] [emphases mine]
These arguments form the basic verbal backbone of the 4 mathematical axioms von Neumann and Morgenstern assumed later in the book to construct their notion of a utility-maximizing individual, as having a single universal scale of preferences, which they always act to push reality upward through.
The four axioms are as follows [quoting Wikipedia]:
[1] Completeness: For any lotteries L and M, either L⪰M or M⪰L
[2] Transitivity: If L⪰M and M⪰N, then L⪰N
[3] Continuity: If L⪯M⪯N, then there exists a probability p∈[0,1] such that pL+(1−p)N~M
[ i.e., if we prefer lottery N to lottery M to lottery L, then there must exist complementary probabilities p and (1−p) of L and N, respectively, occurring, such that they combine into a lottery which we assess to be exactly as good as M - and this is the sense in which lottery M can be said to be pinned down between the less-preferable L and the more-preferable N ]
[4] Independence: For any M and p∈[0,1) (the pM term is the “irrelevant” part of each lottery): L⪯N if and only if (1−p)L+pM ⪯ (1−p)N+pM
[ meaning, we don’t have to think about everything that will happen in every branch of possibility [ in this case lottery M ] to make local preference comparisons over what will happen in one specific branch of possibility, as long as the preferences we are comparing don’t build on any preferences over events in the excluded branch[es]. ]
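The last two axioms can be made concrete with toy numbers, assuming lotteries are valued by expected utility (all utility values below are hypothetical, chosen for illustration):

```python
# Toy illustration of axioms [3] and [4] under expected-utility
# valuation. The utility numbers are hypothetical.
u_L, u_M, u_N = 0.0, 0.6, 1.0  # so L ⪯ M ⪯ N

# [3] Continuity: solve p*u_L + (1 - p)*u_N = u_M for p.
p = (u_N - u_M) / (u_N - u_L)
print(p)  # 0.4: a 40% L / 60% N mixture is valued exactly like M

# [4] Independence: mixing both sides with the same irrelevant M,
# at any weight q < 1, never flips the preference L ⪯ N.
for q in [0.0, 0.25, 0.5, 0.9]:
    assert (1 - q) * u_L + q * u_M <= (1 - q) * u_N + q * u_M
```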
Von Neumann and Morgenstern proved that, for any agent that assigns utilities [later called VNM-utilities] consistently with these 4 axioms, preferences over complex combinations-of-events-with-stated-probabilities [ lotteries/combinations ] are uniquely determined by preferences over simple combinations.
[ The last two sections here, Sections V and VI, describe why we on LessWrong generally tend to speak in terms of utility, rather than value or valence.
The phrase “utility function” itself is more memetically traceable to von Neumann and Morgenstern than to Daniel Bernoulli, but von Neumann and Morgenstern made no claim that agents in full generality must assign utility in the manner they described in order to avoid being Dutch-booked—only agents who assign preferences over combinations of outcomes according to certain axioms. By contrast, Daniel Bernoulli argued that all agents must diminish the utility they assign to an outcome faster than its probability diminishes, in order to avoid vulnerability to some exploit [the St. Petersburg paradox].
So when we talk about a “utility function” as something that we must have, we’re primarily talking about avoiding the St. Petersburg paradox, rather than VNM-coherence—unless we want to introduce the now-nonstandard firm assumption that you and I do indeed assign probabilities over combinations of outcomes in a regular way. ]