My intuition here is that it should be possible to see causal networks as arising naturally out of Bayesian considerations.
You disagree, then, with Pearl’s dictum that causality is a primitive concept, not reducible to any statistical construction?
The Smoker’s Lesion problem is completely dissolved by using the causal information about the lesion. Without that information it cannot be. The correlations among Smoking, Lesion, and Cancer, on their own, are compatible not only with the stated structure (the Lesion causing both Smoking and Cancer) but also with the alternatives that Smoking causes Lesion, which causes Cancer, or that Cancer causes Lesion, which causes Smoking (even in the presence of the usual causal assumptions of DAG, Markov, and Faithfulness). These three causal graphs cannot be distinguished by the observational statistics. The causal information given in the problem is an essential part of its statement, and no decision theory which ignores causation can solve it.
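A concrete way to see the indistinguishability claim, with made-up numbers: build the joint distribution from the fork Smoking ← Lesion → Cancer, then check that the chain factorization Smoking → Lesion → Cancer reproduces exactly the same observational joint (and by symmetry so does the reversed chain). This is only an illustrative sketch; the parameters are hypothetical.

```python
# Made-up parameters for the fork graph  Smoking <- Lesion -> Cancer.
from itertools import product

p_L = 0.3                        # P(Lesion)
p_S_given_L = {1: 0.8, 0: 0.2}   # P(Smoke | Lesion)
p_C_given_L = {1: 0.6, 0: 0.1}   # P(Cancer | Lesion)

def bern(p1, x):
    return p1 if x == 1 else 1 - p1

# Joint distribution over (Smoke, Lesion, Cancer) implied by the fork graph.
joint = {}
for s, l, c in product([0, 1], repeat=3):
    joint[(s, l, c)] = bern(p_L, l) * bern(p_S_given_L[l], s) * bern(p_C_given_L[l], c)

def marginal(indices):
    # Marginal over the chosen positions of (s, l, c).
    out = {}
    for (s, l, c), p in joint.items():
        key = tuple((s, l, c)[i] for i in indices)
        out[key] = out.get(key, 0.0) + p
    return out

p_S = marginal([0])          # P(S)
p_SL = marginal([0, 1])      # P(S, L)
p_LC = marginal([1, 2])      # P(L, C)
p_Lm = marginal([1])         # P(L)

# Refactor along the chain S -> L -> C:  P(S) * P(L|S) * P(C|L).
chain = {}
for s, l, c in product([0, 1], repeat=3):
    chain[(s, l, c)] = p_S[(s,)] * (p_SL[(s, l)] / p_S[(s,)]) * (p_LC[(l, c)] / p_Lm[(l,)])

# The chain reproduces the fork's observational distribution exactly,
# because Smoking and Cancer are independent given Lesion in all three graphs.
assert all(abs(joint[k] - chain[k]) < 1e-12 for k in joint)
print("Same joint distribution; the graphs differ only causally.")
```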
EDT recommends the action “which, conditional on your having chosen it, gives you the best expectations for the outcome.” That formulation glosses over whether that conditional expectation is based on the statistical correlations observed in the population (i.e. ignoring causation), or the correlations resulting from considering the actions as interventions in a causal network. It is generally understood as the former; attempts to fix it consist of changing it to use the latter. The only difference among these various attempts is how willing their proposers are to simply say “do causal reasoning”.
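To make the two readings concrete, here is a toy calculation (all payoffs and probabilities are hypothetical) in the Smoker’s Lesion fork Smoking ← Lesion → Cancer: the “population correlation” reading conditions on Smoke in the observational distribution, while the “intervention” reading severs the Lesion → Smoking arrow before conditioning.

```python
# Toy comparison of the evidential and interventional readings of
# "conditional on your having chosen it", with made-up numbers.

p_L = 0.3                        # P(Lesion)
p_S_given_L = {1: 0.8, 0: 0.2}   # P(Smoke | Lesion)
p_C_given_L = {1: 0.6, 0: 0.1}   # P(Cancer | Lesion)

def utility(smoke, cancer):
    return 1.0 * smoke - 10.0 * cancer   # made-up payoffs

def p_lesion_given_smoke(s):
    # Bayes: observing your own smoking is evidence about the lesion.
    num = {1: p_L * (p_S_given_L[1] if s else 1 - p_S_given_L[1]),
           0: (1 - p_L) * (p_S_given_L[0] if s else 1 - p_S_given_L[0])}
    z = num[0] + num[1]
    return {l: num[l] / z for l in (0, 1)}

def edt_value(s):
    # Evidential reading: condition on Smoke = s in the observational distribution.
    pl = p_lesion_given_smoke(s)
    p_cancer = sum(pl[l] * p_C_given_L[l] for l in (0, 1))
    return p_cancer * utility(s, 1) + (1 - p_cancer) * utility(s, 0)

def cdt_value(s):
    # Interventional reading: do(Smoke = s) cuts the Lesion -> Smoking arrow,
    # so the lesion probability stays at its prior.
    p_cancer = p_L * p_C_given_L[1] + (1 - p_L) * p_C_given_L[0]
    return p_cancer * utility(s, 1) + (1 - p_cancer) * utility(s, 0)

for s in (0, 1):
    print(f"smoke={s}: evidential value {edt_value(s):.3f}, interventional value {cdt_value(s):.3f}")

# With these numbers the evidential reading prefers not smoking (smoking is bad
# news about the lesion), while the interventional reading prefers smoking
# (the intervention cannot cause the lesion).
```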
When you talk about selection bias, you talk about counterfactuals (do-actions, in Pearl’s notation, a causal concept). The Tickle defence introduces a causal hypothesis (the tickle prompting, i.e. causing smoking). I don’t follow the reference class part, but it doesn’t seem to cover the situation of an EDT reasoner advising someone else who professes an inclination to smoke. That is just as much a problem for EDT as the original version. It is also a problem that AIXI can be set to solving. What might its answer be?
You disagree, then, with Pearl’s dictum that causality is a primitive concept, not reducible to any statistical construction?
No. For example, AIXI is what I would regard as essentially a Bayesian agent, but it has a notion of causality because it has a notion of the environment taking its actions as an input. What I mean is more like wondering if AIXI would invent causal networks.
It is generally understood as the former; attempts to fix it consist of changing it to use the latter.
I think this is too narrow a way to describe the mistake that naive EDT is making. First, I hope you agree that even naive EDT wouldn’t use statistical correlations in a population of agents completely unrelated to it (for example, agents who make their decisions randomly). But naive EDT may be in the position of existing in a world where it is the only naive EDT agent, although there may be many agents which are similar but not completely identical to it. How should it update in this situation? It might try to pick a population of agents sufficiently similar to itself, but then it’s unclear how the fact that they’re similar but not identical should be taken into account.
AIXI, by contrast, would do something more sophisticated. Namely, its observations about the environment, including other agents similar to itself, would all update its model of the environment.
I don’t follow the reference class part, but it doesn’t seem to cover the situation of an EDT reasoner advising someone else who professes an inclination to smoke.
It seems like some variant of the tickle defense covers this. Once the other agent professes their inclination to smoke, that screens off any further information obtained by the other agent smoking or not smoking.
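A quick check of that screening-off claim, under the assumption (with made-up numbers) that the lesion influences smoking only via the professed inclination, i.e. Lesion → Inclination → Smoking together with Lesion → Cancer:

```python
# If smoking depends on the lesion only through the inclination, then once we
# condition on the inclination, the smoking decision carries no further
# information about the lesion.  All parameters are hypothetical.
from itertools import product

p_L = 0.3                        # P(Lesion)
p_I_given_L = {1: 0.9, 0: 0.2}   # P(Inclination | Lesion)
p_S_given_I = {1: 0.7, 0: 0.05}  # P(Smoke | Inclination)

def bern(p1, x):
    return p1 if x == 1 else 1 - p1

joint = {}
for l, i, s in product([0, 1], repeat=3):
    joint[(l, i, s)] = bern(p_L, l) * bern(p_I_given_L[l], i) * bern(p_S_given_I[i], s)

def p_lesion(cond):
    # P(Lesion = 1 | cond), where cond fixes some of {"i": inclination, "s": smoke}.
    num = sum(p for (l, i, s), p in joint.items()
              if l == 1 and all({"i": i, "s": s}[k] == v for k, v in cond.items()))
    den = sum(p for (l, i, s), p in joint.items()
              if all({"i": i, "s": s}[k] == v for k, v in cond.items()))
    return num / den

print(p_lesion({"i": 1}))            # P(Lesion | professed inclination)
print(p_lesion({"i": 1, "s": 1}))    # same value: smoking adds nothing
print(p_lesion({"i": 1, "s": 0}))    # same value again
```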
It is also a problem that AIXI can be set to solving. What might its answer be?
I guess AIXI could do something like start with a prior over possible models of how various actions, including smoking, could affect the other agent, update, then use the posterior distribution over models to predict the effect of interventions like smoking. But this requires a lot more data than is usually given in the smoking lesion problem.
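A minimal sketch of that move, nothing like AIXI itself: keep a prior over hypotheses about what the act of smoking does to the other agent’s cancer risk, update on observed (smoke, cancer) pairs, and predict the intervention from the posterior. The hypotheses, numbers, and data are all invented for illustration, and the data is assumed to be effectively interventional from the model’s point of view.

```python
# Toy Bayesian model averaging over hypotheses about the effect of smoking.
# Each hypothesis specifies P(cancer | do(smoke)) and P(cancer | do(not smoke)).
hypotheses = {
    "smoking_causes_cancer": {"prior": 0.5, 1: 0.50, 0: 0.10},
    "common_cause_only":     {"prior": 0.5, 1: 0.25, 0: 0.25},
}

# Hypothetical (smoked, got_cancer) observations of the other agent's situation.
data = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 0), (1, 1), (0, 1), (1, 0)]

def likelihood(h, smoked, cancer):
    p = hypotheses[h][smoked]        # P(cancer = 1 | do(smoke = smoked)) under h
    return p if cancer else 1 - p

# Bayesian update over hypotheses.
posterior = {h: hypotheses[h]["prior"] for h in hypotheses}
for smoked, cancer in data:
    for h in posterior:
        posterior[h] *= likelihood(h, smoked, cancer)
z = sum(posterior.values())
posterior = {h: w / z for h, w in posterior.items()}

# Posterior-predictive effect of the intervention do(smoke = 1).
p_cancer_if_smoke = sum(posterior[h] * hypotheses[h][1] for h in hypotheses)
print(posterior)
print(f"P(cancer | do(smoke)) ~ {p_cancer_if_smoke:.3f}")
```

As the last line of the comment notes, with only the handful of data points usually given in the smoking lesion problem, the posterior would barely move and the prediction would mostly reflect the prior.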
No. For example, AIXI is what I would regard as essentially a Bayesian agent, but it has a notion of causality because it has a notion of the environment taking its actions as an input.
This looks like a symptom of AIXI’s inability to self-model. Of course causality is going to look fundamental when you think you can magically intervene from outside the system.
Do you share the intuition I mention in my other comment? I feel that the way this post reframes CDT and TDT as attempts to clarify bad self-modelling by naive EDT is very similar to the way I would reframe Pearl’s positions as an attempt to clarify bad self-modelling by naive probability theory a la AIXI.
So your intuition is that causality isn’t fundamental but should fall out of correct self-modeling? I guess that’s also my intuition, and I also don’t know how to make that precise.
These three causal graphs cannot be distinguished by the observational statistics. The causal information given in the problem is an essential part of its statement, and no decision theory which ignores causation can solve it.
I think this isn’t actually compatible with the thought experiment. Our hypothetical agent knows that it is an agent. I can’t yet formalize what I mean by this, but I think that it requires probability distributions corresponding to a certain causal structure, which would allow us to distinguish it from the other graphs. I don’t know how to write down a probability distribution that contains myself as I write it, but it seems that such a thing would encode the interventional information about the system that I am interacting with on a purely probabilistic level. If this is correct, you wouldn’t need a separate representation of causality to decide correctly.
Our hypothetical agent knows that it is an agent. I can’t yet formalize what I mean by this, but I think that it requires probability distributions corresponding to a certain causal structure, which would allow us to distinguish it from the other graphs
How about: an agent, relative to a given situation described by a causal graph G, is an entity that can perform do-actions on G.
No, that’s not what I meant at all. In what you said, the agent needs to be separate from the system in order to perform do-actions. I want an agent that knows it’s an agent, so it has to have a self-model and, in particular, has to be inside the system that is modelled by our causal graph.
One of the guiding heuristics in FAI theory is that an agent should model itself the same way it models other things. Roughly, the agent isn’t actually tagged as different from nonagent things in reality, so any desired behaviour that depends on correctly making this distinction cannot be regulated with evidence as to whether it is actually making the distinction the way we want it to. A common example of this is the distinction between self-modification and creating a successor AI; an FAI should not need to distinguish these, since they’re functionally the same. These sorts of ideas are why I want the agent to be modelled within its own causal graph.