Third-person counterfactuals

If you’re thinking about the counterfactual world where you do X in the process of deciding whether to do X, let’s call that a first-person counterfactual. If you’re thinking about it in the process of deciding whether another agent A should have done X instead of Y, let’s call that a third-person counterfactual. The definition of, e.g., modal UDT uses first-person counterfactuals, but when we try to prove a theorem showing that modal UDT is “optimal” in some sense, then we need to use third-person counterfactuals.
UDT’s first-person counterfactuals are logical counterfactuals, but our optimality result evaluates UDT by using physical third-person counterfactuals: it asks whether another agent would have done better, not whether a different action by the same agent would have led to a better outcome. The former is easier to analyze, but the latter seems to be what we really care about. Nate’s recent post on “global UDT” points towards turning UDT into a notion of third-person counterfactuals, and describes some problems. In this post, I’ll give a fuller UDT-based notion of logical third-person counterfactuals, which at least fails visibly (returns an error) in the kinds of cases Nate describes. However, in a follow-up post I’ll give an example where this definition returns a non-error value which intuitively seems wrong.
Before I start, a historical side note: When Kenny Easwaran visited us for two days and we proved the UDT optimality result, the reason we decided to use physical counterfactuals wasn’t actually that we thought these were the better kind of counterfactuals. Rather, we thought explicitly about the problem of physical vs. logical third-person counterfactuals on the first morning of Kenny’s visit, and decided to look at the physical case because it seemed easier to reason about. That turned out to be a great decision, because—to our surprise—we very quickly ended up proving the first version of what later became the modal UDT optimality result!
But today, let’s talk about logical counterfactuals. As Nate points out in his Global UDT post, there’s a sort of duality between first-person and third-person counterfactuals—given a good third-person notion of counterfactuals, you can try to turn it into a first-person notion by writing an agent that evaluates actions according to it, and given a first-person notion you can try to turn it into a third-person notion. So is there a way to turn, say, the first-person counterfactuals of modal UDT into a way to evaluate what would have happened in a certain universe if a certain agent had taken a different action?
Nate’s post describes an algorithm, GlobalUDT(U,A), which tries to tell you what agent A() should have done in order to achieve the best outcome in universe U(). Here, I want to ask an intermediate question: What would have happened if A() had chosen a different action? Of course, we can then say that the agent should have taken the action that leads to the best possible outcome in this sense, but one advantage of my proposal is that it sometimes says, “I don’t know what would have happened in that case”; in particular, in the cases Nate discusses in his post, my proposal would say that it doesn’t have an answer, rather than giving a wrong answer. (However, in a follow-up post I’ll show that there are cases in which my proposal gives an intuitively incorrect answer.)
So here’s my proposal. Suppose that →A is an m-action agent, that is, a “provably mutually exclusive and exhaustive” (p.m.e.e.) sequence of m closed modal formulas (A1,…,Am), where Ai is interpreted as “the agent takes action i”. “P.m.e.e.” means that it’s provable that exactly one of the m formulas is true. Similarly, →U is an n-outcome universe, i.e., a p.m.e.e. sequence (U1,…,Un) where Uj means “the j’th-best outcome obtains”.
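To spell out “p.m.e.e.” a bit more explicitly (this is just a restatement of the definition above, written in standard notation), for the agent →A it amounts to

\[ \mathrm{GL} \vdash A_1 \lor \dots \lor A_m \quad\text{and}\quad \mathrm{GL} \vdash \lnot(A_i \land A_{i'}) \;\text{ for all } i \neq i', \]

and similarly for the universe →U, with the Uj in place of the Ai.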
We say that, according to this notion of counterfactuals, action i leads to outcome j if (i) GL⊢Ai→Uj, and (ii) GL⊬¬Ai. So for every i, there are three possible cases (a code sketch follows the list below):
If there’s exactly one j such that GL⊢Ai→Uj, then we say that action i leads to outcome j.
If GL⊢¬Ai, then we don’t know what would have happened if the agent had taken action i, because we “don’t have enough counterfactuals”: there is no model of PA in which Ai is true (we can think of the models of PA as the “impossible possible worlds” we use to evaluate the impact of different actions). In particular, this is the case if we have both GL⊢Ai→Uj and GL⊢Ai→Uj′, for j≠j′, since this implies GL⊢¬Ai by the assumption that Uj and Uj′ are provably mutually exclusive.
If there’s no j such that GL⊢Ai→Uj, then we don’t know what would have happened if the agent had taken action i, because we have “ambiguous counterfactuals”: there are some distinct j and j′ such that there’s a model of PA in which Ai∧Uj, and a different model in which Ai∧Uj′. (We know that there is a model in which Ai is true, because otherwise we’d have GL⊢¬Ai, which would imply GL⊢Ai→Uj for every j.)
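Here is the promised sketch, putting the three cases together. It assumes a hypothetical decision procedure gl_proves for provability in GL (GL is decidable, so such an oracle exists in principle) together with hypothetical formula constructors Not and Implies; none of these are from the original post, and indices are 0-based.

```python
def leads_to(A, U, i):
    """Classify action i under the notion of counterfactuals defined above.

    Returns ('leads_to', j) if action i leads to outcome j,
    'not_enough_counterfactuals' if GL proves ¬A_i, and
    'ambiguous_counterfactuals' otherwise.
    """
    if gl_proves(Not(A[i])):
        # There is no model of PA in which the agent takes action i.
        return 'not_enough_counterfactuals'
    js = [j for j in range(len(U)) if gl_proves(Implies(A[i], U[j]))]
    if len(js) == 1:
        return ('leads_to', js[0])
    # js is empty: action i is consistent with more than one outcome.
    # (Two or more hits would force GL ⊢ ¬A_i, which is handled above.)
    return 'ambiguous_counterfactuals'
```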
Now, for example, if we consider Nate’s example of an agent that has three possible actions, but always takes the third one (i.e., →A≡(A1,A2,A3)≡(⊥,⊥,⊤)), then it’s clear that our third-person counterfactuals will not fail silently, but rather give the reasonable answer that it’s hard to say what outcome the agent would have achieved if it had returned a different value: for example, say that U14≡⊤∧(⊥∨¬⊤); are some of the ⊤‘s and ⊥’s in the definition of this universe invocations of the agent? Which ones? We might hope that there’s a notion of third-person counterfactuals which can answer questions like this about the real world, but presumably it would need to make more use of the more complicated structure of the real universe; as posed, the question doesn’t seem to have a good answer.
But when we apply this notion to modal UDT, it returns a non-error answer sufficiently often to allow us to show an at least superficially sensible (if rather trivial!) optimality result.
Let’s say that a pair (→A,→U) is “fully informative” if every i leads to some j according to our notion of counterfactuals. Then, given a fully informative pair, we can say that →A is optimal (according to our notion of counterfactuals!) iff the outcome that →A’s actual action leads to is optimal among the outcomes achievable by any of the available actions.
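In the same hypothetical sketch as before, the “fully informative” condition and the resulting notion of optimality might look like this; actual_i stands in for the index of the action that is actually taken (i.e., the unique Ai true in the standard model), which the code takes as given rather than computes.

```python
def fully_informative(A, U):
    """Every action leads to some outcome under this notion of counterfactuals."""
    return all(isinstance(leads_to(A, U, i), tuple) for i in range(len(A)))

def optimal(A, U, actual_i):
    """Optimality in the fully informative case: the outcome the actual
    action leads to is at least as good as any achievable outcome.
    Outcomes are ordered best-first, so a smaller index is better."""
    assert fully_informative(A, U)
    best_j = min(leads_to(A, U, i)[1] for i in range(len(A)))
    return leads_to(A, U, actual_i)[1] == best_j
```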
Now it’s rather straightforward to see that modal UDT is optimal, in this sense, on a universe →U whenever the pair (→UDT(→U),→U) is fully informative. Recall the way that modal UDT works:
For every outcome j=1 to n (from best to worst):
    For every action i=1 to m:
        If □(Ai→Uj), then take action i.
If you’re still here, take some default action.
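As a sketch only, here is that search order in code, using the same hypothetical gl_proves oracle as above. (Real modal UDT is defined by a self-referential fixed point, with the provability operator □ appearing inside the agent’s own definition, rather than by literally running this loop from the outside.)

```python
def modal_udt_search(A, U, default=0):
    """Sketch of modal UDT's proof search: for each outcome from best to
    worst, for each action, take the first action i such that GL proves
    A_i → U_j; if no proof is found, take a default action."""
    for j in range(len(U)):          # outcomes, best to worst
        for i in range(len(A)):      # actions
            if gl_proves(Implies(A[i], U[j])):
                return i
    return default
```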
Clearly, in the fully informative case, this algorithm will take the optimal action (in the sense we use here): Suppose that j is optimal, and i leads to j. The search will not find a proof of an implication Ai′→Uj′ with j′<j, because then j wouldn’t be optimal according to our definition; and the search will terminate when considering the pair (j,i) at the latest; so modal UDT will return some action i∗ for which GL⊢Ai∗→Uj.
I’d like to say that this covers all the cases in which we would expect modal UDT to be optimal, but unfortunately that’s not quite the case. Suppose that there are two actions, A1 and A2, and two outcomes, U1 and U2. In this case, it’s consistent that i=1 leads to j=1, but we don’t have enough counterfactuals about i=2, that is, GL⊢¬A2 (implying that (→A,→U) isn’t fully informative). This is because modal UDT doesn’t have an explicit “playing chicken” step that would make it take action A2 if it can prove that it doesn’t take this action. Now, if we did not have GL⊢A1→U1, then GL⊢¬A2 would imply that the agent would take action 2 (because ¬A2 implies A2→U1), which would lead to a contradiction (the agent takes an action that it provably doesn’t take), so we can rule out that case; but the case of GL⊢A1→U1 plus GL⊢¬A2 is consistent.
So let’s say that a pair (→A,→U) is “sufficiently informative” if either it’s fully informative or if there is some action i such that GL⊢Ai→U1. Then we can say that →A is optimal if either (i) (→A,→U) is fully informative and →A is optimal in the sense discussed earlier, or (ii) N⊨U1, that is, the agent actually obtains the best possible outcome. With these definitions, we can show that modal UDT is optimal whenever (→UDT(→U),→U) is sufficiently informative.
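Continuing the earlier sketch, the extended definitions might be written as follows; as before, actual_i and best_outcome_obtains (standing in for N⊨U1) are supplied from the outside rather than computed.

```python
def sufficiently_informative(A, U):
    """Fully informative, or some action provably yields the best outcome U_1."""
    return fully_informative(A, U) or any(
        gl_proves(Implies(A[i], U[0])) for i in range(len(A)))

def optimal_extended(A, U, actual_i, best_outcome_obtains):
    """Optimality in the extended sense: either optimal in the earlier,
    fully informative sense, or the best outcome actually obtains."""
    return ((fully_informative(A, U) and optimal(A, U, actual_i))
            or best_outcome_obtains)
```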
The reasoning is simple. In the fully informative case, our earlier proof works. In the other case, there’s some i such that GL⊢Ai→U1, so the agent’s search is certainly going to stop when it considers Ai→U1 at the latest; in other words, it’s going to stop at some i∗≤i such that GL⊢Ai∗→U1, and the agent is going to output that action i∗; i.e., we’ll have N⊨Ai∗. But since GL is sound, we also have N⊨Ai∗→U1, and hence N⊨U1, showing optimality in the extended sense.
It’s not surprising that modal UDT is “optimal” in this sense, of course! Nevertheless, as a conceptual tool, it seems useful to have this definition of logical third-person counterfactuals, to complement the first-person notion of modal UDT.
However, my not-so-secret agenda for going through this in detail is that in a follow-up post, I’ll show that there are universes →U such that (→UDT(→U),→U) is fully informative, but UDT still does the intuitively incorrect thing—because the notion of counterfactuals (and hence the notion of optimality) I’ve defined in this post doesn’t agree with intuition as well as we’d like. This failure turns out to be clearer in the context of the third-person counterfactuals described in this post than in modal UDT’s first-person ones.