Replying to one of Will’s edits on account of my comments to the earlier draft:
Finally, in a comment on a draft of this note, Abram Demski said that: “The notion of expected utility for which FDT is supposed to do well (at least, according to me) is expected utility with respect to the prior for the decision problem under consideration.” If that’s correct, it’s striking that this criterion isn’t mentioned in the paper. But it also doesn’t seem compelling as a principle by which to evaluate between decision theories, nor does it seem FDT even does well by it. To see both points: suppose I’m choosing between an avocado sandwich and a hummus sandwich, and my prior was that I prefer avocado, but I’ve since tasted them both and gotten evidence that I prefer hummus. The choice that does best in terms of expected utility with respect to my prior for the decision problem under consideration is the avocado sandwich (and FDT, as I understood it in the paper, would agree). But, uncontroversially, I should choose the hummus sandwich, because I prefer hummus to avocado.
Yeah, the thing is, the FDT paper focused on examples where “expected utility according to the prior” becomes an unclear notion due to logical uncertainty issues. It wouldn’t have made sense for the FDT paper to focus on that criterion, given the desire to put the most difficult issues into focus. However, FDT is supposed to accomplish similar things to UDT, and UDT provides the more concrete illustration.
The policy that does best in expected utility according to the prior is the policy of taking whatever you like. In games of partial information, decisions are defined as functions of information states; and in the situation as described, there are separate information states for liking hummus and liking avocado. Choosing the one you like achieves a higher expected utility according to the prior than just choosing avocado no matter what. In this situation, optimizing the decision in this way is equivalent to updating on the information; but the two do not always coincide (as in Transparent Newcomb, Bomb, and other such problems).
To re-state that a different way: in a given information state, UDT is choosing what to do as a function of the information available, and judging the utility of that choice according to the prior. So, in this scenario, we judge the expected utility of selecting avocado in response to liking hummus. This is worse (according to the prior!) than selecting hummus in response to liking hummus.
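To make the policy-level comparison concrete, here is a minimal sketch. The prior probabilities, the 0/1 payoffs, and all names (`PRIOR`, `utility`, `prior_expected_utility`) are illustrative assumptions, not anything specified in the original example; the point is only that policies are functions from information states to actions, and that they are scored under the prior.

```python
from itertools import product

# Hypothetical prior over information states (what tasting reveals).
# The 0.6 / 0.4 split is an assumption chosen purely for illustration.
PRIOR = {"likes_avocado": 0.6, "likes_hummus": 0.4}
ACTIONS = ("avocado", "hummus")


def utility(state, action):
    """Illustrative payoff: 1 for the sandwich you turn out to like, else 0."""
    return 1.0 if state == "likes_" + action else 0.0


def prior_expected_utility(policy):
    """Score a policy (information state -> action) under the prior."""
    return sum(p * utility(state, policy[state]) for state, p in PRIOR.items())


def all_policies():
    """Enumerate every function from information states to actions."""
    states = list(PRIOR)
    for choice in product(ACTIONS, repeat=len(states)):
        yield dict(zip(states, choice))


# The policy that does best according to the prior.
best_policy = max(all_policies(), key=prior_expected_utility)

# The policy of choosing avocado no matter what was observed.
always_avocado = {state: "avocado" for state in PRIOR}

print(best_policy, prior_expected_utility(best_policy))
print(always_avocado, prior_expected_utility(always_avocado))
```

Under these assumed numbers, the best policy picks whichever sandwich you turn out to like, and it scores strictly higher under the prior than "always avocado". In this toy case that coincides with updating on the information; it is not meant to capture the cases where the two come apart.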
So, this is an interesting one. I could make the argument that UDT would actually suggest taking the opposite of the one you like currently.
It depends on how far you think the future (and yourself) will extend. You can reason that if you were to like both hummus and avocado, you should take both. The problem as stated doesn’t appear to exclude this.
If your prior includes the observation that humans tend to get used to what they do repeatedly, then you can predict that you will come to like whichever of avocado or hummus you don’t currently like, if you repeatedly choose to consume it.
Then, since there’s no particular reason why doing this would make you later like the other option less (and indeed, a certain amount of delayed gratification can increase later enjoyment), the way to achieve the most total utility is as follows. If you predict you would enjoy having both together more at the immediate decision point, take both. If instead you are indifferent between taking both and taking only the unappealing one, take only the unappealing one, because doing that more often will allow you to obtain more utility later.
I think this would be the recommendation of UDT if the prior were to say that you would face similar choices to this one “sufficiently often”.
This is why, for example, I almost always eat salads/greens, or whichever part of a meal is less appealing, before the later, more enjoyable part: you get more utility both immediately (over the course of the meal) and in the long term, because you come to dislike the unappealing food option less.