Decision theories map world models into actions. If you ever make a claim like “This decision-theory agent can never learn X and is therefore flawed”, you’re either misphrasing something or you’re wrong. The capacity to learn a good world-model is outside the scope of what decision theory is[1]. In this case, I think you’re wrong.
For example, suppose the CDT agent estimates the prediction will be “zero” with probability p, and “one” with probability 1-p. Then if p≥1/2, they can say “one”, and have a probability p≥1/2 of winning, in their own view. If p<1/2, they can say “zero”, and have a subjective probability 1−p>1/2 of winning.
This is not what a CDT agent would do. Here is what a CDT agent would do:
1. The CDT agent makes an initial estimate that the prediction will be “zero” with probability 0.9 and “one” with probability 0.1.
2. The CDT agent considers making the decision to say “one” but notices that Omega’s prediction aligns with its actions.
3. Given that the CDT agent was just considering saying “one”, the agent updates its initial estimate by reversing it. It declares “I planned on guessing one before but the last time I planned that, the predictor also guessed one. Therefore I will reverse and consider guessing zero.”
4. Given that the CDT agent was just considering saying “zero”, the agent updates its initial estimate by reversing it. It declares “I planned on guessing zero before but the last time I planned that, the predictor also guessed zero. Therefore I will reverse and consider guessing one.”
5. The CDT agent realizes that, given the predictor’s capabilities, its own prediction will be undefined.
6. The CDT agent walks away, not wanting to waste the computational power.
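For concreteness, here is a toy sketch of that loop in Python (the function names and the 1,000-step budget are mine, purely for illustration; nothing here is a canonical statement of CDT):

```python
# Toy model of the deliberation loop in steps 2-5.
# credence_zero = the agent's current credence that the prediction was "zero".

def plan(credence_zero: float) -> str:
    """CDT's plan: say the opposite of whichever prediction looks more likely."""
    return "one" if credence_zero >= 0.5 else "zero"

def update_on_own_plan(planned_guess: str) -> float:
    """If the predictor models my deliberation, my plan is evidence about its
    prediction: planning to say X leads me to expect the prediction X."""
    return 1.0 if planned_guess == "zero" else 0.0

credence_zero = 0.9            # step 1: initial estimate
for _ in range(1000):          # a finite deliberation budget
    planned = plan(credence_zero)               # steps 2 and 4: pick a tentative guess
    new_credence = update_on_own_plan(planned)  # steps 3 and 4: the estimate reverses
    if new_credence == credence_zero:           # a stable estimate would end deliberation
        break
    credence_zero = new_credence
else:
    credence_zero = float("nan")  # step 5: no stable estimate exists; step 6: walk away
```

Running this, the credence just flips between 0 and 1 until the budget runs out; the agent never reaches a stable estimate to act on.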
The longer the predictor stays accurate, the higher the CDT agent’s credence becomes that its own thought process is causally affecting the estimate[2]. Since the CDT agent is embedded, it cannot reason outside its own thought process, and there’s no use in it nonsensically refusing to leave the game.
Furthermore, any good decision-theorist knows that you should never go up against a Sicilian when death is on the line[3].
[1] This is not to say that world-modeling isn’t relevant to evaluating a decision theory. But in this case, we should be fully discussing things that may/may not happen in the actual world we’re in and picking the most appropriate decision theory for this one. Isolated thought experiments do not serve this purpose.
[2] Note that, in cases where this isn’t true, the predictor should get worse over time. The predictor is trying to model the CDT agent’s predictions (which depend on how the CDT agent’s actions affect its thought-process) without accounting for the way the CDT agent is changing as it makes decisions. As a result, a persevering CDT agent will ultimately beat the predictor here and gain infinite utility by playing the game forever.
[3] The Battle of Wits from the Princess Bride is isomorphic to the problem in this post.
Since when does CDT include backtracking on noticing other people’s predictive inconsistency? And, I’m not sure that any such explicitly iterative algorithm would be stable.
The CDT agent considers making the decision to say “one” but notices that Omega’s prediction aligns with its actions.
This is the key. You’re not playing CDT here, you’re playing “human-style hacky decision theory.” CDT cannot notice that Omega’s prediction aligns with its hypothetical decision because Omega’s prediction is causally “before” CDT’s decision, so any causal decision graph cannot condition on it. This is why post-TDT decision theories are also called “acausal.”
Since when does CDT include backtracking on noticing other people’s predictive inconsistency?
I agree that CDT does not include backtracking on noticing other people’s predictive inconsistency. My assumption is that decision theories (including CDT) take a world-map and output an action. I’m claiming that this post is conflating an error in constructing an accurate world-map with an error in the decision theory.
CDT cannot notice that Omega’s prediction aligns with its hypothetical decision because Omega’s prediction is causally “before” CDT’s decision, so any causal decision graph cannot condition on it. This is why post-TDT decision theories are also called “acausal.”
Here is a more explicit version of what I’m talking about. CDT makes a decision to act based on the expected value of its action. To produce such an action, we need to estimate an expected value. In the original post, there are two parts to this:
Part 1 (Building a World Model):
I believe that the predictor modeled my reasoning process and has made a prediction based on that model. This prediction happens before I actually instantiate my reasoning process
I believe this model to be accurate/quasi-accurate
I start unaware of what my causal reasoning process is so I have no idea what the predictor will do. In any case, the causal reasoning process must continue because I’m thinking.
As I think, I get more information about my causal reasoning process. Because I know that the predictor is modeling my reasoning process, this lets me update my prediction of the predictor’s prediction.
Because the above step was part of my causal reasoning process and information about my causal reasoning process affects my model of the predictor’s model of me, I must update on the above step as well
[The Dubious Step] Because I am modeling myself as CDT, I will make a statement intended to invert the predictor’s prediction. Because I believe the predictor is modeling me, this requires me to invert myself. That is to say, every update my causal reasoning process makes to my probabilities reverses the previous update.
Note that this only works if I believe my reasoning process (but not necessarily the ultimate action) gives me information about the predictor’s prediction.
The above leads to infinite regress
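To spell out why the regress can’t terminate (this formalization is mine, not the post’s): let p be my credence that the prediction is “zero”. The CDT plan is to say the opposite of whichever prediction is more likely, and I believe the predictor matches whatever I plan, so each round of deliberation maps p to

$$U(p) = \begin{cases} 0 & \text{if } p \ge 1/2 \ \text{(I plan to say “one”, so I now expect the prediction “one”)} \\ 1 & \text{if } p < 1/2 \ \text{(I plan to say “zero”, so I now expect the prediction “zero”)} \end{cases}$$

U has no fixed point, since U(p) ≥ 1/2 exactly when p < 1/2; the credence just flips between 0 and 1 forever.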
Part 2 (CDT):
Ask the world model what the odds are that the predictor said “one” or “zero”
Find the one with the higher likelihood and say the opposite
I believe Part 1 fails and that this isn’t the fault of CDT. For instance, imagine the above problem with zero stakes such that decision theory is irrelevant. If you ask any agent to give the inverse of its probabilities that Omega will say “one” or “zero” with the added information that Omega will perfectly predict those inverses and align with them, that agent won’t be able to give you probabilities. Hence, the failure occurs in building a world model rather than in implementing a decision theory.
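As a sketch of that division of labour (again, the function names are mine and purely illustrative): the world-model step is the thing that fails, and the decision-theory step never even gets to run.

```python
import math

def world_model_credence_zero() -> float:
    """Part 1: try to settle on a credence that the prediction was 'zero'.
    The self-referential update above never converges, so the honest
    output is NaN rather than a default of 0.5."""
    return float("nan")

def cdt_choose(credence_zero: float) -> str:
    """Part 2 (CDT proper): take the more likely prediction and say the opposite."""
    if math.isnan(credence_zero):
        raise ValueError("no well-defined credences; nothing for CDT to act on")
    return "one" if credence_zero >= 0.5 else "zero"

try:
    print(cdt_choose(world_model_credence_zero()))
except ValueError as reason:
    print(f"Walking away: {reason}")  # the failure happened before CDT was consulted
```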
-------------------------------- Original version
Since when does CDT include backtracking on noticing other people’s predictive inconsistency?
Ever since the process of updating a causal model of the world based on new information was considered an epistemic question outside the scope of decision theory.
To see how this is true, imagine the exact same situation as described in the post with zero stakes. Then ask any agent with any decision theory about the inverse of the prediction it expects the predictor to make. The answer will always be “I don’t know”, independent of decision theory. Ask that same agent if it can assign probabilities to the answers and it will say “I don’t know; every time I try to come up with one, the answer reverses.”
All I’m trying to do is compute the probability that the predictor will guess “one” or “zero” and failing. The output of failing here isn’t “well, I guess I’ll default to fifty-fifty so I should pick at random”[1], it’s NaN.
Here’s a causal explanation:
I believe the predictor modeled my reasoning process and has made a prediction based on that model.
I believe this model to be accurate/quasi-accurate
I start unaware of what my causal reasoning process is so I have no idea what the predictor will do. But my prediction of the predictor depends on my causal reasoning process
Because my causal reasoning process is contingent on my prediction and my prediction is contingent on my causal reasoning process, I end up in an infinite loop where my causal reasoning process cannot converge on an actual answer. Every time it tries, it just keeps updating.
I quit the game because my prediction is incomputable
I’m claiming that this post is conflating an error in constructing an accurate world-map with an error in the decision theory.
The problem is not that CDT has an inaccurate world-map; the problem is that CDT has an accurate world map, and then breaks it. CDT would work much better with an inaccurate world-map, one in which its decision causally affects the prediction.
See this post for how you can hack that: https://www.lesswrong.com/posts/9m2fzjNSJmd3yxxKG/acdt-a-hack-y-acausal-decision-theory
Having done some research, it turns out the thing I was actually pointing to was ratifiability and the stance that any reasonable separation of world-modeling and decision-selection should put ratifiability in the former rather than the latter. This specific claim isn’t new: From “Regret and Instability in causal decision theory”:
Second, while I agree that deliberative equilibrium is central to rational decision making, I disagree with Arntzenius that CDT needs to be amended in any way to make it appropriately deliberational. In cases like Murder Lesion a deliberational perspective is forced on us by what CDT says. It says this: A rational agent should base her decisions on her best information about the outcomes her acts are likely to causally promote, and she should ignore information about what her acts merely indicate. In other words, as I have argued, the theory asks agents to conform to Full Information, which requires them to reason themselves into a state of equilibrium before they act. The deliberational perspective is thus already a part of CDT.
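For concreteness, the condition I have in mind is the standard one (my paraphrase of the Jeffrey/Joyce formulation, so treat the exact notation as illustrative): an option A is ratifiable just in case it still maximizes expected utility on the supposition that A is the option finally decided on,

$$U\big(A \mid \mathrm{dec}(A)\big) \;\ge\; U\big(B \mid \mathrm{dec}(A)\big) \quad \text{for every alternative } B,$$

where, in the causal version, U is causal expected utility computed with the credences the agent would have upon deciding on A. Requiring choices to be ratifiable is the formal counterpart of “reason yourself into a state of equilibrium before you act” in the quote above.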
However, it’s clear to me now that you were discussing an older, more conventional version of CDT[1] which does not have that property. With respect to that version, the thought experiment goes through; with respect to the version I believe to be sensible, it doesn’t[2].
[1] I’m actually kind of surprised that the conventional version of CDT is that dumb; I had to check a bunch of papers to verify that this was actually happening. Maybe if my memory had complied at the time, it would’ve flagged your distinguishing between CDT and EDT here, given past LessWrong articles I’ve read like CDT=EDT. But this wasn’t meant to be, so I didn’t notice you were talking about something different.
[2] I am now confident it does not apply to the thing I’m referring to: the linked paper brings up “Death in Damascus” specifically as a place where ratifiable CDT does not fail.
Have they successfully formalised the newer CDT?
Can you clarify what you mean by “successfully formalised”? I’m not sure if I can answer that question but I can say the following:
The Stanford Encyclopedia of Philosophy has a discussion of ratifiability dating back to the 1960s, and by the 1980s it had been applied to both EDT and CDT (which I’d expect, given that constraints on having an accurate world model should be independent of decision theory). This gives me confidence that it’s not just a random Less Wrong thing.
Abram Demski from MIRI has a whole sequence on when CDT=EDT which leverages ratifiability as a sub-assumption. This gives me confidence that ratifiability is actually onto something (the Less Wrong stamp of approval is important!)
Whether any of this means that it’s been “successfully formalised”, I can’t really say. From the outside-view POV, I literally did not know about the conventional version of CDT until yesterday. Thus, I do not really view myself as someone currently capable of verifying the extent to which a decision theory has been successfully formalised. Still, I consider this version of CDT old enough historically and well-enough-discussed on Less Wrong by Known Smart People that I have high confidence in it.