My take is that the concept of expected utility maximization is a mistake. In Eliezer’s Coherent decisions imply consistent utilities, you can see the mistake where he writes:

“From your perspective, you are now in Scenario 1B. Having observed the coin and updated on its state, you now think you have a 90% chance of getting $5 million and a 10% chance of getting nothing.”
Reflectively stable agents are updateless. When they make an observation, they do not limit their caring as though all the possible worlds where their observation differs do not exist.
As far as I know, every argument for utility assumes (or implies) that whenever you make an observation, you stop caring about the possible worlds where that observation went differently.
The original Timeless Decision Theory was not updateless. Nor were any of the more traditional ways of thinking about decisions. Updateless Decision Theory and subsequent decision theories corrected this mistake.
Von Neumann did not notice this mistake because he was too busy inventing the entire field. The point where we discover updatelessness is the point where we are supposed to realize that all of utility theory is wrong. I think we failed to notice.
Ironically the community that was the birthplace of updatelessness became the flag for taking utility seriously. (To be fair, this probably is the birthplace of updatelessness because we took utility seriously.)
Unfortunately, because utility theory is so simple, and so obviously correct if you haven’t thought about updatelessness, it ended up being assumed all over the place, without tracking the dependency. I think we use a lot of concepts that are built on the foundation of utility without us even realizing it.
(Note that I am saying here that utility theory is a theoretical mistake! This is much stronger than just saying that humans don’t have utility functions.)
What should I read to learn about propositions like “Reflectively stable agents are updateless” and “utility theory is a theoretical mistake”?
Did you end up finding any resources related to this?
No.
re: #2, maybe https://www.lesswrong.com/posts/A8iGaZ3uHNNGgJeaD/an-orthodox-case-against-utility-functions
but it doesn’t seem like it’s quite what Scott is talking about here. I’d love to hear more from him.
I notice that I’m confused. I’ve recently read the paper “Functional decision theory...” and it’s formulated explicitly in terms of expected utility maximization.
FDT and UDT are formulated in terms of expected utility. I am saying that they advocate for a way of thinking about the world that makes it so that you don’t just Bayesian-update on your observations and forget about the other possible worlds.
Once you take on this worldview, the Dutch books that made you believe in expected utility in the first place are less convincing, so maybe we want to rethink utility.
I don’t know what the FDT authors were thinking, but it seems like they did not propagate the consequences of the worldview into reevaluating what preferences over outcomes look like.
Don’t updateless agents with suitably coherent preferences still have utility functions?
That depends on what you mean by “suitably coherent.” If you mean they need to satisfy the vNM independence axiom, then yes. But the point is that I don’t see any good argument why updateless agents should satisfy that axiom. The argument for that axiom passes through wanting to have a certain relationship with Bayesian updating.
Also, if by “have a utility function” you mean something other than “try to maximize expected utility,” I don’t know what you mean. To me, the cardinal (as opposed to ordinal) structure of preferences that makes me want to call something a “utility function” is about how to choose between lotteries.
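For reference, the independence axiom mentioned above, in its standard textbook form (notation mine, not a quotation from the thread), for lotteries A, B, C and mixing weight p:

```latex
% vNM independence: mixing both sides with the same third lottery C,
% at the same weight p, must not reverse a strict preference.
\[
  \forall\, C,\ \forall\, p \in (0,1]:\qquad
  A \succ B \iff p\,A + (1-p)\,C \succ p\,B + (1-p)\,C
\]
```

Together with completeness, transitivity, and continuity, this is the axiom that yields the vNM expected-utility representation; it is exactly the one whose motivation is being questioned here.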
Yeah by “having a utility function” I just mean “being representable as trying to maximise expected utility”.
Ah okay, interesting. Do you think that updateless agents need not accept any separability axiom at all? And if not, what justifies using the EU framework for discussing UDT agents?
In many discussions on LW about UDT, it seems that a starting point is that the agent is maximising some notion of expected utility, and the updatelessness comes in via the EU formula iterating over policies rather than actions. But if we give up on some separability axiom, it seems that this EU starting point is not warranted, since every major EU representation theorem needs some version of separability.
You could take as an input parameter to UDT a preference ordering over lotteries that does not satisfy the independence axiom, but is a total order (or total preorder if you want ties). Each policy you can take results in a lottery over outcomes, and you take the policy that gives your favorite lottery. There is no need for the assumption that your preferences over lotteries satisfy the vNM axioms.
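As a toy illustration of this recipe (a minimal sketch with names of my own choosing, not a proposal from the thread): enumerate policies, map each to the lottery it induces under the fixed prior, and pick the top element of an arbitrary total preorder.

```python
from itertools import product
from collections import defaultdict

def toy_preference_key(lottery):
    """A toy total preorder over lotteries: rank by worst possible outcome,
    then by best possible outcome. This violates the independence axiom,
    so it has no expected-utility representation."""
    support = [outcome for outcome, prob in lottery.items() if prob > 0]
    return (min(support), max(support))

def lottery_of_policy(policy, worlds):
    """A policy plus the fixed prior over worlds induces one lottery over outcomes.
    `worlds` is a list of (prior_probability, outcome_fn) pairs; outcome_fn maps
    the whole policy to a numeric outcome, so an outcome may depend on what the
    policy would do under other observations (as in counterfactual mugging)."""
    lottery = defaultdict(float)
    for prior_probability, outcome_fn in worlds:
        lottery[outcome_fn(policy)] += prior_probability
    return dict(lottery)

def choose_policy(observations, actions, worlds, preference_key=toy_preference_key):
    """Enumerate every policy (function from observations to actions) and return
    the one whose induced lottery is most preferred. Nothing here assumes the
    preference over lotteries is vNM; any total preorder works."""
    best = None
    for assignment in product(actions, repeat=len(observations)):
        policy = dict(zip(observations, assignment))
        lottery = lottery_of_policy(policy, worlds)
        if best is None or preference_key(lottery) > preference_key(best[1]):
            best = (policy, lottery)
    return best
```

Swapping `toy_preference_key` for an expected-utility score recovers the usual “EU over policies” picture, which is the sense in which EU is one special case of the selection rule rather than a requirement.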
Note that I don’t think that we really understand decision theory or have a coherent proposal. The only thing I feel like I can say confidently is that if you are convinced by the style of argument that is used to argue for the independence axiom, then you should probably also be convinced by arguments that cause you to be updateful and thus not reflectively stable.
Okay this is very clarifying, thanks!
If the preference ordering over lotteries violates independence, then it will not be representable as maximising EU with respect to the probabilities in the lotteries (by the vNM theorem). Do you think it’s a mistake then to think of UDT as “EU maximisation, where the thing you’re choosing is policies”? If so, I believe this is the most common way UDT is framed in LW discussions, and so this would be a pretty important point for you to make more visibly (unless you’ve already made this point before in a post, in which case I’d love to read it).
I think UDT is as you say. I think it is also important to clarify that you are not updating on your observations when you decide on a policy. (If you did, you wouldn’t really be choosing a function from observations to actions, so this is already implicit; but it is important to emphasize in UDT.)
Note that I am using “updateless” differently than “UDT”. By updateless, I mostly mean anything that is not performing Bayesian updates and forgetting the other possible worlds when it makes observations. UDT is more of a specific proposal. “Updateless” is more of a negative property, defined by the lack of updating.
I have been trying to write a big post on utility, and haven’t yet, and decided it would be good to give a quick argument here because of the question. The only posts I remember making against utility are in the geometric rationality sequence, especially this post.
Thanks, the clarification of UDT vs. “updateless” is helpful.
But now I’m a bit confused as to why you would still regard UDT as “EU maximisation, where the thing you’re choosing is policies”. If I have a preference ordering over lotteries that violates independence, the vNM theorem implies that I cannot be represented as maximising EU.
In fact, after reading Vladimir_Nesov’s comment, it doesn’t even seem fully accurate to view UDT as taking in a preference ordering over lotteries. Here’s the way I’m thinking of UDT: your prior over possible worlds uniquely determines the probabilities of a single lottery L, and selecting a global policy is equivalent to choosing the outcomes of this lottery L. Now different UDT agents may prefer different lotteries, but this is in no sense expected utility maximisation. It is simply that some UDT agents think one lottery is best, and others might think another is best. There is nothing in this story that resembles a cardinal utility function over outcomes that the agents are multiplying with their prior probabilities to maximise EU with respect to.
It seems that to get an EU representation of UDT, you need to impose coherence on the preference ordering over lotteries (i.e. over different prior distributions), but since UDT agents come with some fixed prior over worlds which is not updated, it’s not at all clear why rationality would demand coherence in your preference between lotteries (let alone coherence that satisfies independence).
Yeah, I don’t have a specific UDT proposal in mind. Maybe instead of “updateless” I should say “the kind of mind that might get counterfactually mugged” as in this example.
To ask for decisions to be coherent, there need to be multiple possible situations in which decisions could be made, coherently across these situations or not. A UDT agent that picks a policy faces a single decision in a single possible situation. There is nothing else out there for the decision in this situation to be coherent with.
The options offered for the decision could be interpreted as lotteries over outcomes, but there is still only one decision to pick one lottery among them all, instead of many situations where the decision is to pick among a particular smaller selection of lotteries, different in each situation. So asking for coherence means asking what the updateless agent would do if most policies could be suddenly prohibited just before the decision (but after its preference is settled), if it were to update on the fact that only particular policies remained as options, which is not what actually happens.
I am not sure if there is any disagreement in this comment. What you say sounds right to me. I agree that UDT does not really set us up to want to talk about “coherence” in the first place, which makes it weird to have it be formalized in terms of expected utility maximization.
This does not make me think intelligent/rational agents will/should converge to having utility.
I think coherence of unclear kind is an important principle that needs a place in any decision theory, and it motivates something other than pure updatelessness. I’m not sure how your argument should survive this. The perspective of expected utility and the perspective of updatelessness both have glaring flaws, respectively unwarranted updatefulness and lack of a coherence concept. They can’t argue against each other in their incomplete forms. Expected utility is no more a mistake than updatelessness.
Do you expect learned ML systems to be updateless?
It seems plausible to me that the updatelessness of agents is just as “disconnected from reality” for actual systems as EU maximization is. Would you disagree?
No, at least probably not at the time that we lose all control.
However, I expect that systems that are self-transparent and can easily self-modify might quickly converge to reflective stability (and thus updatelessness). They might not, but I think the same arguments that might make you think they would develop a utility function can also be used to argue that they would develop updatelessness (and thus possibly also not develop a utility function).
I’m confused about the example you give. In the paragraph, Eliezer is trying to show that you ought to accept the independence axiom, because you can be Dutch booked if you don’t. I’d think that if you’re updateless, that means you already accept the independence axiom (because you wouldn’t be time-consistent otherwise).
And in that sense it seems reasonable to assume that someone who doesn’t already accept the independence axiom is also not updateless.
I haven’t followed this very closely, so I’m kinda out of the loop… Which part of UDT/updatelessness says “don’t go for the most utility” (no-maximization) and/or “utility cannot be measured / doesn’t exist” (no “foundation of utility”, debatably no-consequentialism)? Or maybe “utility” here means something else?
Note that I am not saying here that rational agents can’t have a utility function. I am only saying that they don’t have to.
“As far as I know, every argument for utility assumes (or implies) that whenever you make an observation, you stop caring about the possible worlds where that observation went differently.”

Are you just referring to the VNM theorems or are there other theorems you have in mind?
Note to self: It seems like the independence condition breaks for counterfactual mugging, assuming you think we should pay. Assume P is paying $50, N is not paying, and M is receiving $1 million if you would have paid in the counterfactual and zero otherwise. We have N > P, but 0.5P + 0.5M > 0.5N + 0.5M, in contradiction to independence. The issue is that the value of M is not independent of the choice between P and N.
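A quick numeric version of this note, using the numbers already in it (fair 50/50 coin, a $50 payment, a $1 million counterfactual reward); the function name is just for illustration:

```python
def expected_dollars(would_pay):
    """Counterfactual mugging with a fair coin: on one branch you are asked to
    pay $50; on the other you receive $1,000,000 only if you would have paid."""
    pay_branch = -50 if would_pay else 0           # the P-vs-N choice
    reward_branch = 1_000_000 if would_pay else 0  # M depends on that choice
    return 0.5 * pay_branch + 0.5 * reward_branch

print(expected_dollars(True))   # 499975.0 for the paying policy
print(expected_dollars(False))  # 0.0 for the non-paying policy

# Looking only at the payment branch, N > P (0 beats -50), yet the 50/50
# mixture favors paying, because the value of M is not independent of the
# choice between P and N -- the point of the note above.
```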
“Reflectively stable agents are updateless. When they make an observation, they do not limit their caring as though all the possible worlds where their observation differs do not exist.”

This is very surprising to me! Perhaps I misunderstand what you mean by “caring,” but: an agent who’s made one observation is utterly unable[1] to interact with the other possible-worlds where the observation differed; and it seems crazy[1] to choose your actions based on something they can’t affect; and “not choosing my actions based on X” is how I would define “not caring about X.”
[1] Aside from “my decisions might be logically-correlated with decisions that agents in those worlds make (e.g. clone-prisoner’s-dilemma),” or “I am locked into certain decisions that a CDT agent would call suboptimal, because of a precommitment I made (e.g. Newcomb)” or other fancy decision-theoretic stuff. But that doesn’t seem relevant to Eliezer’s lever-coin-flip scenario you link to?
Here is a situation where you make an “observation” and can still interact with the other possible worlds. Maybe you do not want to call this an observation, but if you don’t call it an observation, then true observations probably never really happen in practice.
I was not trying to say that it is relevant to the coin flip directly. I was trying to say that the move used to justify the coin flip is the same move that is rejected in other contexts, and so we should be open to the idea of agents that refuse to make that move, and thus might not have utility.
Ah, that’s the crucial bit I was missing! Thanks for spelling it out.