Approach #1 seems to be naive EDT, which is pretty nonstandard. I’d expect more typical reasoning to look like a causal model with two nodes, Money->Agent, where considering different hypothetical strategies changes the behavior of the Agent node.
The observation counterfactuals thing is pretty interesting. But I think it might end up duplicating causal reasoning if you poke at it enough.
Approach #1 is supposed to be a naive updateless-EDT, yeah. What do you think an updateless-CDT approach would be? Perhaps, whereas regular CDT would causally condition on the action, updateless-CDT would change the conditional probabilities in the causal network? That would be the same as the earlier conditioning-on-conditionals approach, in this case. (So it would two-box in transparent Newcomb.) It could differ from that approach if the causal network doesn’t make the observation set equal the set of parents, although it’s unclear how you’d define updateless-CDT in that case.
I would expect something called updateless-CDT to have a causal model of the world, with nodes that it’s picked out (by some magical process) as nodes controlled by the agent, and then it maximizes a utility function over histories of the causal model by following the utility-maximizing strategy, which is a function from states of knowledge at a controlled node (state of some magically-labeled agent nodes that are parents of the controlled node?) to actions (setting the state of the controlled node).
If the magical labeling process has labeled no nodes inside Omega as controlled, then this will probably two-box even on standard Newcomb. On the other hand, if Omega is known to fully simulate the agent, then we might suppose that updateless-CDT plans as if its strategy controls Omega’s prediction, and will always one-box even with transparent boxes.
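(For concreteness, here is a minimal sketch of how that dependence on the labeling step could play out in standard Newcomb. The payoff numbers, the fifty-fifty prior over the box contents, and all the names below are stipulated for illustration; nothing here is taken from the discussion itself.)

```python
# Expected value of one-boxing vs. two-boxing under two modeling choices:
# either the strategy is treated as controlling Omega's prediction node,
# or that node is held fixed at a prior. Payoffs are the usual Newcomb ones.
PAYOFF = {("one-box", "full"): 1_000_000,
          ("one-box", "empty"): 0,
          ("two-box", "full"): 1_001_000,
          ("two-box", "empty"): 1_000}

def expected_value(action, prediction_controlled):
    if prediction_controlled:
        # Omega simulates the agent, so the big box tracks the strategy.
        box_state = "full" if action == "one-box" else "empty"
        return PAYOFF[(action, box_state)]
    # Prediction node not controlled: hold it at an arbitrary 50/50 prior.
    return 0.5 * PAYOFF[(action, "full")] + 0.5 * PAYOFF[(action, "empty")]

for controlled in (False, True):
    best = max(("one-box", "two-box"),
               key=lambda a: expected_value(a, controlled))
    print(f"prediction controlled={controlled}: best strategy = {best}")
```

With the prediction node held fixed, two-boxing dominates; letting the strategy control the prediction flips the answer to one-boxing, which is exactly the split described above.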
I haven’t read Conditioning on Conditionals yet. I am doing so now, but could you explain more about the similarities you were thinking of?
Yeah, I agree that updateless-CDT needs to somehow label which nodes it controls.
You’re glossing over a second magical part, though:
and then it maximizes a utility function over histories of the causal model by following the utility-maximizing strategy,
How do you calculate the expected utility of following a strategy? How do you condition on following a strategy? That’s the whole point here. You obviously can’t just condition on taking certain values of the nodes you control, since a strategy takes different actions in different worlds; so, regular causal conditioning is out. You can try conditioning on the material conditionals specifying the strategy, which falls on its face as mentioned.
That’s why I jumped to the idea that UCDT would use the conditioning-on-conditionals approach. It seems like what you want to do, to condition on a strategy, is change the conditional probabilities of actions given their parent nodes.
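(A rough sketch of that reading, with a toy model and numbers made up purely for illustration: “conditioning on a strategy” is implemented by overwriting the action node’s conditional distribution with a deterministic function of its parents, and then taking the expectation over the rest of the model.)

```python
# Score a strategy by replacing P(action | parents) with a delta on the
# strategy's chosen action, rather than clamping the action node to one value.
import itertools

def expected_utility(strategy, parent_prior, utility):
    """strategy: dict mapping each parent state to an action.
    parent_prior: dict mapping each parent state to its probability.
    utility: function of (parent state, action)."""
    return sum(p * utility(state, strategy[state])
               for state, p in parent_prior.items())

# Toy model: one binary "observation" parent, two possible actions.
parent_prior = {"obs0": 0.5, "obs1": 0.5}
utility = lambda obs, act: {("obs0", "a"): 3, ("obs0", "b"): 0,
                            ("obs1", "a"): 0, ("obs1", "b"): 5}[(obs, act)]

# Enumerate every strategy (every function from observations to actions).
strategies = [dict(zip(parent_prior, acts))
              for acts in itertools.product("ab", repeat=len(parent_prior))]
best = max(strategies, key=lambda s: expected_utility(s, parent_prior, utility))
print(best)  # {'obs0': 'a', 'obs1': 'b'}
```

Brute-force enumeration like this only works for tiny models, but it makes the contrast with ordinary causal conditioning (fixing the action node to a single value across all worlds) easy to see.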
Also, I agree that conditioning-on-conditionals can work fine if combined with a magical locate-which-nodes-you-control step. Observation-counterfactuals are supposed to be a less magical way of dealing with the problem.
Yeah, I agree that observation-counterfactuals are what you’d like the UCDT agent to be thinking of as a strategy—a mapping between information-states and actions.
The reason I used weird language like “state of magically labeled nodes that are parents of the controlled nodes” is just because it’s nontrivial to translate the idea of “information available to the agent” into a naturalized causal model. But if that’s what the agent is using to predict the world, I think that’s what things have to get cashed out into.
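(To make the strategy-as-mapping picture concrete, here is a toy scoring of observation-counterfactual strategies in transparent Newcomb. The perfect-predictor assumption and the payoff numbers are stipulated for illustration, not taken from the exchange above.)

```python
# A strategy is a map from the information-state ("box looks full" /
# "box looks empty") to an action. Omega, assumed to be a perfect simulator,
# fills the box iff the strategy one-boxes upon seeing it full.
import itertools

OBSERVATIONS = ("full", "empty")
ACTIONS = ("one-box", "two-box")

def payoff(strategy):
    # Omega's prediction depends on the whole strategy, so compute it first.
    box_full = strategy["full"] == "one-box"
    observation = "full" if box_full else "empty"
    action = strategy[observation]
    big = 1_000_000 if box_full else 0
    small = 1_000 if action == "two-box" else 0
    return big + small

strategies = [dict(zip(OBSERVATIONS, acts))
              for acts in itertools.product(ACTIONS, repeat=len(OBSERVATIONS))]
for s in sorted(strategies, key=payoff, reverse=True):
    print(s, payoff(s))
```

Evaluating whole strategies this way favors one-boxing on a full box, even though two-boxing looks better once the full box has already been observed, which is the transparent-Newcomb behavior mentioned earlier.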