The ‘individual rationality condition’ is about the payoffs in equilibrium, not about the strategies. It says that the equilibrium payoff profile must yield to each player at least their minmax payoff. Here, the minmax payoff for a given player is −99.3 (which comes from the player best responding with 30 forever to everyone else setting their dials to 100 forever). The equilibrium payoff is −99 (which comes from everyone setting their dials to 99 forever). Since −99 > −99.3, the individual rationality condition of the Folk Theorem is satisfied.
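For concreteness, here's the arithmetic as a quick sketch (assuming the setup from the post: 100 players, dials settable between 30 and 100, temperature equal to the average dial, and each player's payoff equal to minus the temperature):

```python
N = 100  # number of players (assumed from the original setup)

def payoff(dials):
    """Each player's stage-game payoff: minus the average dial setting (the temperature)."""
    return -sum(dials) / len(dials)

# Minmax payoff for a player: the other 99 punish with 100, and the player
# best-responds; since the payoff is decreasing in one's own dial, the best
# response is the minimum setting, 30.
minmax_payoff = payoff([30] + [100] * (N - 1))   # -99.3

# Payoff in the equilibrium under discussion: everyone sets 99 forever.
equilibrium_payoff = payoff([99] * N)            # -99.0

# Individual rationality: the equilibrium payoff is at least the minmax payoff.
print(minmax_payoff, equilibrium_payoff, equilibrium_payoff >= minmax_payoff)
# -99.3 -99.0 True
```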
Because the meaning of statements does not, in general, consist entirely in observations/anticipated experiences, and it makes sense for people to have various attitudes (centrally, beliefs and desires) towards propositions that refer to unobservable-in-principle things.
Accepting that beliefs should pay rent in anticipated experience does not mean accepting that the meanings of sentences are determined entirely by observables/anticipated experiences. We can hold that the meanings of sentences are the propositions they express, and that the truth-conditions of propositions are generally states-of-affairs-in-the-world and not just observations/anticipated experiences. Eliezer himself puts it nicely here: “The meaning of a statement is not the future experimental predictions that it brings about, nor isomorphic up to those predictions [...] you can have meaningful statements with no experimental consequences, for example: ‘Galaxies continue to exist after the expanding universe carries them over the horizon of observation from us.’”
As to how to choose one belief over another when both are observationally equivalent in some sense, there are many such considerations. One is what our best theories predict: if our best cosmological theories imply that objects do not cease to exist the moment they exit our lightcone, then we should assign higher probability to the statement “objects continue to exist outside our lightcone” than to the statement “objects vanish at the boundary of our lightcone”. Another is simplicity-based priors: the many-worlds interpretation of quantum mechanics is strictly simpler/has a shorter description length than the Copenhagen interpretation (many-worlds = wave function + Schrödinger evolution; Copenhagen = wave function + Schrödinger evolution + collapse postulate), so we should assign a higher prior to many-worlds than to Copenhagen.
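(Purely to illustrate the mechanics of a simplicity-based prior, here's a toy sketch; the description lengths are made-up placeholders, not real estimates of either interpretation's complexity:)

```python
# Toy simplicity prior: weight each hypothesis by 2^(-description length in bits).
# The bit counts are hypothetical placeholders, chosen only to reflect that
# Copenhagen = many-worlds + an extra collapse postulate.
hypothetical_bits = {
    "many-worlds": 40,   # wave function + Schrödinger evolution
    "copenhagen": 45,    # the same, plus a collapse postulate
}

weights = {h: 2.0 ** -bits for h, bits in hypothetical_bits.items()}
total = sum(weights.values())
priors = {h: w / total for h, w in weights.items()}
print(priors)   # many-worlds gets the higher prior (here, 32x higher)
```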
If your concern is instead that attitudes towards such propositions have no behavioural implications and thus cannot in principle be elicited from agents, then the response is to point to the various decision-theoretic representation theorems available in the literature. Take the Jeffrey framework: as long as your preferences over propositions satisfy certain conditions (e.g. Ordering, Averaging), I can derive both a quantitative desirability measure and a probability measure, characterising your desire and belief attitudes (respectively) towards the propositions you are considering. The actual procedure to elicit this preference relation looks like asking people to consider and compare actualising various propositions, which we can think of as gambles. For example, a gamble might look like “If the coin lands Heads, then one person comes into existence outside of our future lightcone and experiences bliss; if the coin lands Tails, then one person comes into existence outside of our future lightcone and experiences suffering”. Note that the propositions here can refer to unobservables. Also, it seems reasonable to prefer the above gamble involving a fair coin to the same gamble but with the coin biased towards Tails. Moreover, the procedure to elicit an agent’s attitudes to such propositions merely consists in the agent considering what they would do if they were choosing which of various propositions to bring about, and does not cash out in terms of observations/anticipated experiences.
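To make that last comparison concrete, here's a toy sketch of the Jeffrey-style calculation, with made-up desirability numbers; the point is just that the outcomes can be in-principle unobservable and the comparison still goes through:

```python
# Hypothetical desirabilities for the two outcomes; both refer to events
# outside our future lightcone, i.e. unobservable in principle.
des_bliss_outside_lightcone = 10.0
des_suffering_outside_lightcone = -10.0

def desirability(p_heads):
    """Jeffrey desirability of the gamble: the probability-weighted average of
    the desirabilities of its mutually exclusive, exhaustive outcomes."""
    return (p_heads * des_bliss_outside_lightcone
            + (1 - p_heads) * des_suffering_outside_lightcone)

fair_coin = desirability(0.5)      #  0.0
tails_biased = desirability(0.25)  # -5.0
print(fair_coin > tails_biased)    # True: preferring the fair-coin gamble is coherent here
```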
(As an aside, doing acausal reasoning in general requires agents to have beliefs and desires towards unobservable-in-principle stuff in, e.g., distant parts of our universe or other Everett branches.)
What’s this probability you’re reporting?
Same as Sylvester, though my credence in consciousness-collapse interpretations of quantum mechanics has moved from 0.00001 to 0.000001.
Yeah great point, thanks. We tried but couldn’t really get a set-up where she just learns a phenomenal fact. If you have a way of having the only difference in the ‘Tails, Tuesday’ case be that Mary learns a phenomenal fact, we will edit it in!
Some Variants of Sleeping Beauty
Thanks, the clarification of UDT vs. “updateless” is helpful.
But now I’m a bit confused as to why you would still regard UDT as “EU maximisation, where the thing you’re choosing is policies”. If I have a preference ordering over lotteries that violates independence, the vNM theorem implies that I cannot be represented as maximising EU.
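(The standard concrete illustration is the Allais pattern; here's a quick sketch, with the textbook prizes, showing that no assignment of utilities to the three prizes reproduces the independence-violating preferences as EU maximisation:)

```python
import random

# Lotteries over the prizes ($0, $1M, $5M), written as probability triples.
A = (0.00, 1.00, 0.00)   # $1M for sure
B = (0.01, 0.89, 0.10)   # 1% $0, 89% $1M, 10% $5M
C = (0.89, 0.11, 0.00)   # 89% $0, 11% $1M
D = (0.90, 0.00, 0.10)   # 90% $0, 10% $5M

def eu(lottery, u):
    return sum(p * ui for p, ui in zip(lottery, u))

# The common independence-violating pattern is A ≻ B together with D ≻ C.
# Search for any utility assignment over the three prizes that rationalises both.
found = any(
    eu(A, u) > eu(B, u) and eu(D, u) > eu(C, u)
    for u in ([random.uniform(-100, 100) for _ in range(3)] for _ in range(100_000))
)
print(found)  # False: the two strict inequalities reduce to opposite sides of
              # 0.11*u($1M) vs 0.01*u($0) + 0.10*u($5M), so no u can satisfy both.
```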
In fact, after reading Vladimir_Nesov’s comment, it doesn’t even seem fully accurate to view UDT as taking in a preference ordering over lotteries. Here’s the way I’m thinking of UDT: your prior over possible worlds uniquely determines the probabilities of a single lottery L, and selecting a global policy is equivalent to choosing the outcomes of this lottery L. Now different UDT agents may prefer different lotteries, but this is in no sense expected utility maximisation. This is simply: some UDT agents think one lottery is best, others might think another is best. There is nothing in this story that resembles a cardinal utility function over outcomes that the agents multiply by their prior probabilities to maximise EU with respect to.
It seems that to get an EU representation of UDT, you need to impose coherence on the preference ordering over lotteries (i.e. over different prior distributions), but since UDT agents come with some fixed prior over worlds which is not updated, it’s not at all clear why rationality would demand coherence in your preference between lotteries (let alone coherence that satisfies independence).
Okay this is very clarifying, thanks!
If the preference ordering over lotteries violates independence, then the agent will not be representable as maximising EU with respect to the probabilities in the lotteries (by the vNM theorem). Do you think it’s a mistake then to think of UDT as “EU maximisation, where the thing you’re choosing is policies”? If so, then given that this is (I believe) the most common way UDT is framed in LW discussions, it would be a pretty important point for you to make more visibly (unless you’ve already made this point before in a post, in which case I’d love to read it).
Yeah by “having a utility function” I just mean “being representable as trying to maximise expected utility”.
Ah okay, interesting. Do you think that updateless agents need not accept any separability axiom at all? And if not, what justifies using the EU framework for discussing UDT agents?
In many discussions on LW about UDT, it seems that a starting point is that the agent is maximising some notion of expected utility, and the updatelessness comes in via the EU formula iterating over policies rather than actions. But if we give up on some separability axiom, it seems that this EU starting point is not warranted, since every major EU representation theorem needs some version of separability.
Don’t updateless agents with suitably coherent preferences still have utility functions?
That’s a coherent utility function, but it seems bizarre. When you’re undergoing extreme suffering, in that moment you’d presumably prefer death to continuing to exist in suffering, almost by the nature of what extreme suffering is. Why defer to your current preferences rather than your preferences in such moments?
Also, are you claiming this is just your actual preferences or is this some ethical claim about axiology?
What’s the countervailing good that makes you indifferent between tortured lives and nonexistence? Presumably the extreme suffering is a bad that adds negative value to their lives. Do you think just existing or being conscious (regardless of the valence) is intrinsically very good?
In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes).
Unfinished sentence?
I think there’s a typo in the last paragraph of Section I?
And let’s use “≻” to mean “better than” (or, “preferred to,” or “chosen over,” or whatever), “≺” to mean “at least as good as,”
“≺” should be “≽”
Yeah, by “actual utility” I mean the sum of the utilities you get from the outcomes of each decision problem you face. You’re right that if my utility function were defined over lifetime trajectories, then this would amount to quite a substantive assumption, i.e. the utility of each iteration contributes equally to the overall utility and what not.
And I think I get what you mean now, and I agree that for the iterated decisions argument to be internally motivating for an agent, it does require stronger assumptions than the representation theorem arguments. In the standard ‘iterated decisions’ argument, my utility function is defined over outcomes, which are the prizes in the lotteries I choose from in each iterated decision. It simply underspecifies what my preferences over trajectories of decision problems might look like (or whether I even have such preferences). In this sense, the ‘iterated decisions’ argument is less self-contained than (i.e., requires stronger assumptions than) the ‘representation theorem’ arguments: representation theorems justify EUM entirely by reference to the agent’s existing attitudes, whereas the ‘iterated decisions’ argument relies on external considerations that are not fixed by the agent’s attitudes.
Does this get at the point you were making?
The assumption is that you want to maximize your actual utility. Then, if you expect to face arbitrarily many i.i.d. iterations of a choice among lotteries over outcomes with certain utilities, picking the lottery with the highest expected utility each time gives you the highest total (“actual”) utility with probability approaching 1, by the law of large numbers.
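Here's the toy version of that claim as a simulation, with made-up lotteries (the law-of-large-numbers point is all that's doing the work):

```python
import random

# Each lottery is a list of (probability, utility) pairs; the numbers are made up.
lottery_X = [(0.5, 10.0), (0.5, 0.0)]   # expected utility 5.0
lottery_Y = [(1.0, 4.0)]                # expected utility 4.0

def expected_utility(lottery):
    return sum(p * u for p, u in lottery)

def sample(lottery):
    r, cumulative = random.random(), 0.0
    for p, u in lottery:
        cumulative += p
        if r < cumulative:
            return u
    return lottery[-1][1]

iterations = 100_000
best = max([lottery_X, lottery_Y], key=expected_utility)   # pick the higher-EU lottery
total_if_eu_maximising = sum(sample(best) for _ in range(iterations))
total_otherwise = sum(sample(lottery_Y) for _ in range(iterations))

# With arbitrarily many i.i.d. repetitions, the EU-maximising policy wins on
# total ("actual") utility with probability approaching 1.
print(total_if_eu_maximising > total_otherwise)   # True (with overwhelming probability)
```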
It’s really not that interesting of an argument, nor is it very compelling as a general argument for EUM. In practice, you will almost never face the exact same decision problem, with the same options, same outcomes, same probabilities, and same utilities, over and over again.
Yeah, that’s a good argument that if your utility is monotonically increasing in some good X (e.g. wealth), then the type of iterated decision you expect to face, involving lotteries over that good, can determine that the best way to maximize your utility is to maximize a particular function (e.g. a linear one) of that good.
But this is not what the ‘iterated decisions’ argument for EUM amounts to. In a sense, it’s quite a bit less interesting. The ‘iterated decisions’ argument does not start with some weak assumption on your utility function and then attempt to impose more structure on it in iterated choice situations. It doesn’t assume anything about your utility function, other than that you have one (or can be represented as having one).
All it’s saying is that, if you expect to face arbitrarily many i.i.d. iterations of a choice among lotteries (i.e. known probability distributions) over outcomes that you have already assigned utilities to, you should pick the lottery that has the highest expected utility. Note that the utility assignments do not have to be linear or monotonically increasing in any particular feature of the outcomes (such as the amount of money I gain if that outcome obtains), and that the utility function is basically assumed to be there from the start.
The ‘iterated decisions’-type arguments support EUM in a given decision problem if you expect to face the exact same decision problem over and over again. The ‘representation theorem’ arguments support EUM for a given decision problem, without qualification.
In either case, your utility function is meant to be constructed from your underlying preference relation over the set of alternatives for the given problem. Whether the function is linear in some things or not is determined by your preference relation, not by the arguments for EUM.
Oh yeah, the Folk Theorem is totally consistent with ‘everyone plays 30 forever’ being a Nash equilibrium of the repeated game here, since the payoff profile ‘−30 for everyone’ is feasible and individually rational. In fact, ‘everyone plays 30’ is the unique NE of the stage game and also the unique subgame-perfect NE of any finitely repeated version of the game.
To sustain ‘-30 for everyone forever’, I don’t even need a punishment for off-equilibrium deviations. The strategy for everyone can just be ‘unconditionally play 30 forever’ and there is no profitable unilateral deviation for anyone here.
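Quick check of that no-profitable-deviation claim, assuming the same setup as before (100 players, dials in [30, 100], payoff equal to minus the average dial):

```python
N = 100

def payoff(dials):
    """A player's stage-game payoff: minus the average dial setting."""
    return -sum(dials) / len(dials)

baseline = payoff([30] * N)   # everyone plays 30: payoff -30.0 each

# A unilateral deviator only raises the average, so every deviation is strictly worse.
deviations = [payoff([d] + [30] * (N - 1)) for d in range(31, 101)]
print(all(dev < baseline for dev in deviations))   # True
```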
The relevant Folk Theorem here just says that any feasible and individually rational payoff profile of the stage game (i.e. the game of setting dials at a given time) can be sustained as a Nash equilibrium payoff profile of the infinitely repeated game (for sufficiently patient players). Here, that’s anything in the interval [−99.3, −30] for a given player. The theorem itself doesn’t really help constrain our expectations about which of the possible Nash equilibria will in fact be played.