I saw an earlier draft of this, and hope to write an extensive response at some point. For now, the short version:
As I understand it, FDT was intended as an umbrella term for MIRI-style decision theories, one that illustrates the critical points without making too many commitments. So the vagueness of FDT was partly by design.
I think UDT is a more concrete illustration of the most important points relevant to this discussion.
The optimality notion of UDT is clear. “UDT gets the most utility” means “UDT gets the highest expected value with respect to its own prior”. This seems quite well-defined, hopefully addressing your (VII).
There are problems applying UDT to realistic situations, but UDT makes perfect sense and is optimal in a straightforward sense for the case of single-player extensive-form games. That doesn’t address multi-player games or logical uncertainty, but it is enough for much of Will’s discussion.
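To make that concrete, here is a minimal brute-force sketch, my own illustration rather than anything from the FDT paper, of what “optimal with respect to the prior” means for a finite single-player game of partial information: a policy is a function from information states to actions, and we simply pick the policy whose expected utility under the prior is highest.

```python
from itertools import product

def udt_optimal_policy(prior, observe, actions, utility):
    """Brute-force UDT for a one-shot single-player game of partial information.

    prior   : dict mapping world-state -> probability
    observe : function world-state -> observation (the agent's information state)
    actions : list of available actions
    utility : function (world-state, action) -> payoff

    A policy is a function from observations to actions; return the policy
    (as a dict) with the highest expected utility under the prior.
    """
    infos = sorted({observe(w) for w in prior})
    best = None
    for choice in product(actions, repeat=len(infos)):
        policy = dict(zip(infos, choice))
        eu = sum(p * utility(w, policy[observe(w)]) for w, p in prior.items())
        if best is None or eu > best[1]:
            best = (policy, eu)
    return best
```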
FDT focused on the weird logical cases, which is in fact a major part of the motivation for MIRI-style decision theory. However, UDT for single-player extensive-form games actually gets at a lot of what MIRI-style decision theory wants, without broaching the topic of logical counterfactuals or proving-your-own-action directly.
The problems which create a deep indeterminacy seem, to me, to be problems for decision theories other than FDT as well. FDT is trying to face them head-on. For example, there are big problems with applying EDT to agents who are physically instantiated as computer programs and can prove too much about their own actions.
This also hopefully clarifies the sense in which I don’t think the decisions pointed out in (III) are bizarre. The decisions are optimal according to the very probability distribution used to define the decision problem.
There’s a subtle point here, though, since Will describes the decision problem from an updated perspective: you already know the bomb is in front of you. So UDT “changes the problem” by evaluating “according to the prior”. From my perspective, because the very statement of the Bomb problem suggests that there were also other possible outcomes, we can rightly insist on evaluating expected utility in terms of those chances.
Perhaps this sounds like an unprincipled rejection of the Bomb problem as you state it. My principle is as follows: you should not state a decision problem without having in mind a well-specified way to predictably put agents into that scenario. Let’s call the way-you-put-agents-into-the-scenario the “construction”. We then evaluate agents on how well they deal with the construction.
For examples like Bomb, the construction gives us the overall probability distribution, which is then used for the expected value in terms of which UDT’s optimality notion is stated.
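As a concrete sketch of that claim, here is one way to score the two contested policies in a Bomb-like construction. This is my own simplified model, and the error rate and payoff numbers are placeholders rather than figures from Will’s statement of the problem.

```python
# A simplified Bomb-like construction, for illustration only.
# The predictor puts a bomb in Left iff it predicts the agent will take Right;
# the numbers below are placeholders, not taken from the original problem.
ERROR = 1e-24          # probability the predictor mispredicts the policy
PAY_RIGHT = -100       # taking Right costs $100
BURN = -1_000_000      # taking Left when the bomb is actually there

def prior_expected_utility(policy):
    """policy is 'Left' or 'Right': what the agent does, even on seeing the bomb."""
    if policy == "Left":
        # Predicted correctly: no bomb, Left is free. Mispredicted: bomb, agent burns.
        return (1 - ERROR) * 0 + ERROR * BURN
    else:
        # Taking Right costs $100 whether or not the bomb was placed.
        return PAY_RIGHT

for p in ("Left", "Right"):
    print(p, prior_expected_utility(p))
# With these numbers, 'Left' scores about -1e-18 and 'Right' scores -100,
# which is the sense in which taking Left is optimal according to the prior.
```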
For other examples, as discussed in “Decisions are for making bad outcomes inconsistent”, the construction simply breaks when you try to put certain decision theories into it. This can also be a good thing; it means the decision theory makes certain scenarios altogether impossible.
The point about “constructions” is possibly a bit subtle (and hastily made); maybe a lot of the disagreement will turn out to be there. But I do hope that the basic idea of UDT’s optimality criterion is actually clear—“evaluate expected utility of policies according to the prior”—and clarifies the situation with FDT as well.
Replying to one of Will’s edits on account of my comments to the earlier draft:
Finally, in a comment on a draft of this note, Abram Demski said that: “The notion of expected utility for which FDT is supposed to do well (at least, according to me) is expected utility with respect to the prior for the decision problem under consideration.” If that’s correct, it’s striking that this criterion isn’t mentioned in the paper. But it also doesn’t seem compelling as a principle by which to evaluate between decision theories, nor does it seem FDT even does well by it. To see both points: suppose I’m choosing between an avocado sandwich and a hummus sandwich, and my prior was that I prefer avocado, but I’ve since tasted them both and gotten evidence that I prefer hummus. The choice that does best in terms of expected utility with respect to my prior for the decision problem under consideration is the avocado sandwich (and FDT, as I understood it in the paper, would agree). But, uncontroversially, I should choose the hummus sandwich, because I prefer hummus to avocado.
Yeah, the thing is, the FDT paper focused on examples where “expected utility according to the prior” becomes an unclear notion due to logical uncertainty issues. It wouldn’t have made sense for the FDT paper to emphasize that criterion, given the desire to put the most difficult issues into focus. However, FDT is supposed to accomplish similar things to UDT, and UDT provides the more concrete illustration.
The policy that does best in expected utility according to the prior is the policy of taking whatever you like. In games of partial information, decisions are defined as functions of information states; and in the situation as described, there are separate information states for liking hummus and liking avocado. Choosing the one you like achieves a higher expected utility according to the prior than just choosing avocado no matter what. In this situation, optimizing the decision in this way is equivalent to updating on the information; but not always (as in Transparent Newcomb, Bomb, and other such problems).
To re-state that a different way: in a given information state, UDT is choosing what to do as a function of the information available, and judging the utility of that choice according to the prior. So, in this scenario, we judge the expected utility of selecting avocado in response to liking hummus. This is worse (according to the prior!) than selecting hummus in response to liking hummus.
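A quick sketch of that comparison, with an arbitrary placeholder prior (the numbers are mine, not from Will’s example):

```python
# Placeholder prior: before tasting, you think you probably prefer avocado.
prior = {"likes_avocado": 0.7, "likes_hummus": 0.3}

def utility(world, sandwich):
    # 1 util for eating the sandwich you actually like, 0 otherwise.
    return 1 if world == "likes_" + sandwich else 0

def expected_utility(policy):
    """policy maps the observed taste (which here just reveals the world-state)
    to a sandwich; score it by expected utility under the prior."""
    return sum(p * utility(w, policy[w]) for w, p in prior.items())

take_what_you_like = {"likes_avocado": "avocado", "likes_hummus": "hummus"}
avocado_no_matter_what = {"likes_avocado": "avocado", "likes_hummus": "avocado"}

print(expected_utility(take_what_you_like))      # 1.0
print(expected_utility(avocado_no_matter_what))  # 0.7
# Judged by the prior, responding to the information beats ignoring it, so in
# this problem UDT's recommendation coincides with updating.
```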
So, this is an interesting one. I could make the argument that UDT would actually suggest taking the opposite of the one you currently like.
It depends on how far you think the future (and you yourself) will extend. You can reason that if you were to like both hummus and avocado, you should take both. The problem as stated doesn’t appear to exclude this.
If your prior includes the observation about humans that we tend to come to like what we do repeatedly, then you can predict that you will come to like whichever of avocado or hummus you don’t currently like, if you repeatedly choose to consume it.
Then, since there’s no particular reason why doing this would make you later prefer the other option less (and indeed, a certain amount of delayed gratification can increase later enjoyment), the way to get the most total utility is: take both together if you predict that you would like that more at the immediate decision point; or, if you are indifferent between taking both and taking only the unappealing one, take only the unappealing one, because doing that more often will allow you to obtain more utility later.
I think this would be the recommendation of UDT if the prior were to say that you would face similar choices to this one “sufficiently often”.
This is why, for example, I almost always eat salads/greens, or whichever part of a meal is less appealing, before the later, more enjoyable part: you get more utility both immediately (over the course of the meal) and in the long term, because you come to dislike the unappealing option less.