I think the assumption that multiple actions have nonzero probability, in the context of a deterministic decision theory, is a pretty big problem. Once you pin down a model of where these nonzero probabilities are coming from, I don’t think your argument is going to work.
For instance, your argument fails if these nonzero probabilities come from epsilon exploration. If the agent is forced to take every action with probability epsilon, and merely chooses which action to assign the remaining probability to, then the agent will indeed purchase the contract for some sufficiently small price 2d if cdt(a) ≠ edt(a), even if a is not the optimal action (let’s say b is the optimal action). When the time comes to take an action, the agent’s best bet is b′ (where the prime means the agent also sells the contract back for price d). The way I described the set-up, the agent doesn’t choose between a and a′, because actions other than the top choice all happen with probability epsilon. The fact that the agent sells the contract back as part of its top choice isn’t a Dutch book, because the case where the agent’s top choice goes through is exactly the case in which the contract is worthless; the contract’s value is derived from the other cases.
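To illustrate with toy numbers (my own illustration; the prices 2d and d come from the description above, while the contract’s payoff V and the specific values are assumptions I’m making for the sake of the example):

```python
# Toy sketch of the epsilon-exploration scenario above (assumed numbers).
# Assumed contract: it pays V to whoever holds it if action a ends up being
# taken, and 0 otherwise.  The agent buys it for 2d and can resell it for d.
# Actions come in pairs: x means "take x and keep the contract", x' means
# "take x and also sell the contract back".  The agent's top choice is b',
# and every other action is forced with probability epsilon.
epsilon = 0.01
d = 0.002
V = 1.0

branches = {
    "b' (top choice)":  (1 - 3 * epsilon, -2 * d + d),  # resold; contract pays 0
    "b  (exploration)": (epsilon,         -2 * d),      # kept; contract pays 0
    "a  (exploration)": (epsilon,         -2 * d + V),  # kept; contract pays V
    "a' (exploration)": (epsilon,         -2 * d + d),  # resold, so no payout
}

expected = sum(p * net for p, net in branches.values())
print(f"agent's expected net gain from buying the contract: {expected:.4f}")

# The expected gain is positive for small enough d, and the bookie only
# profits in branches where the contract was worthless to the agent anyway
# (a not taken, or the contract already resold).  In particular, the resale
# in the top-choice branch is not a sure loss for the agent, so this trade
# pattern is not a Dutch book.
```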
We could modify the epsilon exploration assumption so that the agent also chooses between a and a′ even while its top choice is b′. That is, there’s a lower bound on the probability with which the agent takes an action in {a,a′}, but even if that bound is achieved, the agent still has some flexibility in distributing probability between a and a′. In this case, contrary to your argument, the agent will prefer a over a′, i.e., it will not get Dutch booked. This is because the agent is still choosing b′ as the only action with high probability, and cdt(a) refers to the expected consequence of the agent choosing a as its intended action; the agent therefore cannot use cdt(a) when deciding which of a or a′ to pick as a fallback in case its attempt to implement its intended action b′ fails.
Another source of uncertainty the agent could have about its own actions is if it believes it could gain information in the future, before it has to make a decision, and that this information could be relevant to which decision it makes. Say that cdt_t(a) and edt_t(a) are the agent’s expectations at time t of the utility that taking action a would cause it to get, and of the utility it would get conditional on taking action a, respectively. Suppose the bookie offers the deal at time 0, and the agent must act at time 1. If the possibility of gaining future knowledge is the only source of the agent’s uncertainty about its own decisions, then at time 1 it knows what action it is taking, and edt_1 is undefined on actions not taken. cdt_0 and cdt_1 should both be well-defined, but they could differ, so the problem description should disambiguate between them.

Suppose that every time you say cdt and edt in the description of the contract, this means cdt_0 and edt_0, respectively. Then the agent purchases the contract, and, when it comes time to act, it evaluates consequences by cdt_1, not cdt_0, so the argument for why the agent will inevitably resell the contract fails. If the cdt appearing in the description of the contract instead means cdt_1 (since the agent doesn’t yet know what that is, this means the contract references what the agent will believe in the future, rather than stating numerical payoffs), then the agent won’t purchase it in the first place: it knows the contract will only have value if a seems suboptimal at time 1 and it takes action a anyway, which it knows won’t happen, so the contract is worthless.
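To make the time-indexing explicit, one way to write these definitions out (this is just my notation for the quantities described above):

```latex
% Time-indexed counterfactual and evidential expectations, one possible
% formalization of the definitions in the paragraph above.  E_t is
% expectation under the agent's beliefs at time t, U is its utility, and
% do(a) denotes causally intervening to take action a.
\[
  \mathrm{cdt}_t(a) \;=\; \mathbb{E}_t\!\left[\,U \mid \operatorname{do}(a)\,\right],
  \qquad
  \mathrm{edt}_t(a) \;=\; \mathbb{E}_t\!\left[\,U \mid a\,\right].
\]
% Under the "future information only" story: edt_1(a) is undefined whenever
% the agent already knows at time 1 that it will not take a, while cdt_0(a)
% and cdt_1(a) are both defined but need not agree.
```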
OK, here’s my position.

As I said in the post, the real answer is that this argument simply does not apply if the agent knows its action. More generally: the argument applies precisely to those actions to which the agent ascribes positive probability (directly before deciding). So it is possible for agents to maintain a difference between counterfactual and evidential expectations. However, I think it’s rarely normatively correct for an agent to be in such a position.
Even though the decision procedure of CDT is deterministic, this does not mean that agents described by CDT know what they will do in the future. We can think of this in terms of logical induction: the market is not 100% certain of its own beliefs, and in particular, doesn’t typically know precisely which action has the maximum expectation.
One way of seeing the importance of this is to point out that CDT is a normative theory, not a descriptive one. CDT is supposed to tell you what arbitrary agents should do. The recommendations are supposed to apply even to, say, epsilon-exploring agents (who are not described by CDT, strictly speaking). But here we see that CDT recommends being Dutch booked! Therefore, CDT is not a very good normative theory, at least for epsilon-explorers. (So I’m addressing your epsilon-exploration example by differentiating between the agent’s algorithm and the CDT decision theory. The agent isn’t Dutch booked, but CDT recommends a Dutch book.)
Granted, we could argue via Dutch book that agents should know their own actions, if those actions are deterministic consequences of a known agent architecture. However, theories of logical uncertainty tell us that this is not (always) realistic. In particular, we can adapt the bounded-resource Dutch book idea from logical induction. According to this idea, some Dutch-book-ability is OK, but agents should not be boundlessly exploitable by resource-bounded bookies.
This idea leads me to think that for efficiently computable sequences of actions whose probability (just before each decision) remains bounded away from zero, the CDT expectations should converge to the EDT expectations.
(Probably there’s a stronger version, based on density-zero exploration type intuitions, where we can reach this conclusion even if the probability is not bounded away from zero, because the total probability is still unbounded.)
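A rough way to state the weaker version of this (my attempt at a precise formulation, not a proven result):

```latex
% Conjectured convergence (my phrasing).  Let (a_n) be an efficiently
% computable sequence of actions, one per decision, and let P_n, cdt_n,
% edt_n be the agent's action probabilities and expectations just before
% decision n.
\[
  \text{If there is } \varepsilon > 0 \text{ with } P_n(a_n) \ge \varepsilon
  \text{ for all } n, \text{ then }
  \bigl|\,\mathrm{cdt}_n(a_n) - \mathrm{edt}_n(a_n)\,\bigr| \;\longrightarrow\; 0 .
\]
% The stronger, density-zero-exploration-style version would weaken the
% hypothesis to "the sum of the P_n(a_n) diverges".
```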
One conjecture which my more recent post was supposed to communicate: in learnable environments, this amounts to saying that all counterfactual expectations converge to evidential expectations (provided the agent is sufficiently farsighted). For example, suppose the agent knows the environment is trap-free. If counterfactual and evidential expectations continue to differ severely for some (efficiently enumerable) sequence of actions, then there will be a hypothesis which says “the evidential expectations are actually correct”. The agent will want to check that hypothesis, because the VOI of significantly updating its counterfactual expectations is high. Therefore, these actions will not become sufficiently rare (unless the evidential and counterfactual expectations do indeed converge).
In other words, the divergence between evidential and counterfactual expectations is itself a reason why the action probability should be high, provided that the agent is not shortsighted and doesn’t expect the action to be a trap.
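As a toy value-of-information calculation (the two-hypothesis setup and all numbers here are assumptions of mine, purely to illustrate the farsightedness point):

```python
# Toy VOI sketch (assumed numbers).  The agent's counterfactual expectation
# for action a is low, but one live hypothesis says the evidential
# expectation is the correct one; taking a once would settle the question.
gamma = 0.99              # discount factor (farsightedness)
horizon = 500             # future decisions affected by what is learned
p_hyp = 0.2               # credence that "evidential expectations are correct"
cdt_a, edt_a = 0.0, 1.0   # counterfactual vs. evidential expectation for a
best_other = 0.8          # counterfactual expectation of the current best action

# One-step cost of exploring: give up best_other now in exchange for the
# credence-weighted expected value of taking a.
exploration_cost = best_other - (p_hyp * edt_a + (1 - p_hyp) * cdt_a)

# Gain if the hypothesis is confirmed: from then on, take a instead.
per_step_gain = edt_a - best_other
future_gain = p_hyp * sum(gamma**t * per_step_gain for t in range(1, horizon))

print(f"cost now: {exploration_cost:.2f}, discounted future gain: {future_gain:.2f}")
# With gamma near 1 the future gain dwarfs the one-step cost, so the very
# divergence between cdt and edt pushes the action probability up.  With a
# shortsighted gamma (say 0.5) the future gain is tiny and deliberate
# exploration stops paying, which is the situation in the next paragraph.
```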
If the agent is shortsighted and/or expects traps, then it normatively should not learn anyway (at least, not by deliberate exploration steps). In that case, counterfactual and evidential expectations may forever differ. OTOH, in that case, there’s no reason to expect evidential expectations to be well-informed, so it kind of makes sense that the agent has little motive to adjust its counterfactual expectations towards them.
(But I’ll still give the agent a skeptical look when it asserts that the two differ, since I know that highly informed positions never look like this. The belief that the two differ seems “potentially rational but never defensible”, if that makes sense. I’m tempted to bake the counterfactual/evidential equivalence into the prior, on the general principle that priors should not contain possibilities which we know will be eliminated if sufficient evidence comes in. Yet, doing so might make us vulnerable to Troll Bridge.)
I thought about these things in writing this, but I’ll have to think about them again before making a full reply.
> We could modify the epsilon exploration assumption so that the agent also chooses between a and a′ even while its top choice is b′. That is, there’s a lower bound on the probability with which the agent takes an action in {a,a′}, but even if that bound is achieved, the agent still has some flexibility in distributing probability between a and a′.
Another similar scenario would be: we assume the probability of an action is small if it’s sub-optimal, but smaller the worse it is.
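One concrete way to instantiate that (my example; a softmax/Boltzmann distribution is just one choice, not something the scenario commits to):

```python
import math

# A softmax / Boltzmann distribution over the agent's expected utilities:
# sub-optimal actions keep nonzero probability, and the worse an action is,
# the smaller its probability (unlike uniform epsilon exploration, where all
# non-top actions get the same epsilon).
def action_probabilities(expected_utilities, temperature=0.1):
    weights = [math.exp(u / temperature) for u in expected_utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Example: three actions with expected utilities 1.0 (best), 0.8, and 0.2.
print(action_probabilities([1.0, 0.8, 0.2]))
# -> roughly [0.88, 0.12, 0.0003]
```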