Causality: A Brief Introduction
Post 2 of Towards Causal Foundations of Safe AGI, see also Post 1 Introduction.
By Lewis Hammond, Tom Everitt, Jon Richens, Francis Rhys Ward, Ryan Carey, Sebastian Benthall, and James Fox, representing the Causal Incentives Working Group. Thanks also to Alexis Bellot, Toby Shevlane, and Aliya Ahmad.
Causal models are the foundations of our work. In this post, we provide a succinct but accessible explanation of causal models that can handle interventions, counterfactuals, and agents, which will be the building blocks of future posts in the sequence. Basic familiarity with (conditional) probabilities will be assumed.
What is causality?
What does it mean for the rain to cause the grass to become green? Causality is a philosophically intriguing topic that underlies many other concepts of human importance. In particular, many concepts relevant to safe AGI, like influence, response, agency, intent, fairness, harm, and manipulation, cannot be grasped without a causal model of the world, as we mentioned in the intro post and will discuss further in subsequent posts.
We follow Pearl and adopt an interventionist definition of causality: the sprinkler today causally influences the greenness of the grass tomorrow, because if someone intervened and turned on the sprinkler, then the greenness of the grass would be different. In contrast, making the grass green tomorrow has no effect on the sprinkler today (assuming no one predicts the intervention). So the sprinkler today causally influences the grass tomorrow, but not vice versa, as we would intuitively expect.
Interventions
Causal Bayesian Networks (CBNs) represent causal dependencies between aspects of reality using a directed acyclic graph. An arrow from a variable A to a variable B means that A influences B under some fixed setting of the other variables. For example, we draw an arrow from sprinkler (S) to grass greenness (G):
For each node in the graph, a causal mechanism of how the node is influenced by its parents is specified with a conditional probability distribution. For the sprinkler, a distribution p(S) specifies how commonly it is turned on, e.g. P(S=on)=30%. For the grass, a conditional distribution p(G∣S) specifies how likely it is that the grass becomes green when the sprinkler is on, e.g. p(G=green∣S=on)=100%, and how likely it is that the grass becomes green when the sprinkler is off, e.g. p(G=green∣S=off)=30%.
By multiplying the distributions together, we get a joint probability distribution p(S,G)=p(S)p(G∣S) that describes the likelihood of any combination of outcomes. Joint probability distributions are the foundation of standard probability theory, and can be used to answer questions such as “what is the likelihood that the sprinkler is on, given that I observe that the grass is green?”
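As a minimal sketch of this kind of observational query, the snippet below builds the joint distribution from the example numbers above and conditions on green grass using Bayes' rule. The dictionary layout and the label `not_green` are illustrative choices, not part of the original model.

```python
# Mechanisms from the text: p(S) and p(G | S).
p_s = {"on": 0.3, "off": 0.7}
p_g_given_s = {
    "on": {"green": 1.0, "not_green": 0.0},
    "off": {"green": 0.3, "not_green": 0.7},
}

# Joint distribution p(S, G) = p(S) * p(G | S).
joint = {
    (s, g): p_s[s] * p_g_given_s[s][g]
    for s in p_s
    for g in ("green", "not_green")
}


def prob_sprinkler_on_given_green(joint_dist):
    """Condition the joint on G=green: p(S=on | G=green)."""
    p_green = sum(p for (_, g), p in joint_dist.items() if g == "green")
    return joint_dist[("on", "green")] / p_green


posterior = prob_sprinkler_on_given_green(joint)
print(round(posterior, 3))  # 0.3 / 0.51, roughly 0.588
```

Observing green grass raises the probability that the sprinkler is on from 30% to roughly 59%, without saying anything yet about what would happen under an intervention.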
An intervention on a system changes one or more causal mechanisms. For example, an intervention that turns the sprinkler on corresponds to replacing the causal mechanism p(S) for the sprinkler with a new mechanism 1(S=on) that always has the sprinkler on. The effects of the intervention can be computed from the updated joint distribution p(S,G∣do(S=on)) = 1(S=on) p(G∣S), where 1(S=on) is the intervened mechanism and do(S=on) denotes the intervention.
Note that it would not be possible to compute the effect of the intervention from just the joint probability distribution p(S,G), as without the causal graph, there’d be no way to tell whether a mechanism should be changed in the factorisation p(S)p(G∣S) or in p(G)p(S∣G).
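The following sketch illustrates the difference between intervening and conditioning under the graph S → G with the example numbers above. The helper names (`do_sprinkler`, `do_grass`, `marginal`) are hypothetical and chosen only for this illustration.

```python
# Mechanisms from the text, for the causal graph S -> G.
P_S = {"on": 0.3, "off": 0.7}
P_G_GIVEN_S = {
    "on": {"green": 1.0, "not_green": 0.0},
    "off": {"green": 0.3, "not_green": 0.7},
}
G_VALUES = ("green", "not_green")


def joint(p_s, p_g_given_s):
    """Joint p(S, G) = p(S) * p(G | S) for the graph S -> G."""
    return {(s, g): p_s[s] * p_g_given_s[s][g] for s in p_s for g in G_VALUES}


def do_sprinkler(value):
    """do(S=value): replace p(S) with a point mass, keep p(G | S)."""
    point_mass = {s: 1.0 if s == value else 0.0 for s in P_S}
    return joint(point_mass, P_G_GIVEN_S)


def do_grass(value):
    """do(G=value): replace p(G | S) with a point mass, keep p(S)."""
    point_mass = {s: {g: 1.0 if g == value else 0.0 for g in G_VALUES} for s in P_S}
    return joint(P_S, point_mass)


def marginal(dist, index, value):
    """Sum the joint over outcomes whose component `index` equals `value`."""
    return sum(p for outcome, p in dist.items() if outcome[index] == value)


p_green_do_on = marginal(do_sprinkler("on"), 1, "green")
p_on_do_green = marginal(do_grass("green"), 0, "on")
observational = joint(P_S, P_G_GIVEN_S)
p_on_given_green = observational[("on", "green")] / marginal(observational, 1, "green")

print(p_green_do_on, p_on_do_green, round(p_on_given_green, 3))
```

Forcing the grass to be green leaves the sprinkler probability at its prior of 30%, whereas merely observing green grass raises it. The joint distribution alone cannot tell these two cases apart; the direction of the arrow does.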
Ultimately, all statistical correlations are due to causal influences. Hence, for a set of variables there is always some CBN that represents the underlying causal structure of the data generating process, though extra variables may be needed to explain e.g. unmeasured confounders.
Counterfactuals
Suppose that the sprinkler is on and the grass is green. Would the grass have been green had the sprinkler not been on? Questions about counterfactuals like these are harder than questions about interventions, because they involve reasoning across multiple worlds. Counterfactuals are key to defining e.g. harm, intent, fairness, and impact measures, as they all depend on comparing outcomes across hypothetical worlds.
To handle such reasoning, structural causal models (SCMs) refine CBNs in three important ways. First, background context that is shared across hypothetical worlds is explicitly separated from variables that can be intervened on and vary across the worlds. The former are called exogenous variables, and the latter endogenous. For our question, it will be useful to introduce an exogenous variable R for whether it rains or not. The sprinkler and the grass are endogenous variables.
The relationship between hypothetical worlds can be represented with a twin-graph, where there are two copies of the endogenous variables for actual and hypothetical worlds, and the exogenous variable(s) provide shared context:
Second, SCMs introduce notation to distinguish endogenous variables in different hypothetical worlds. For example, G_{S=off} denotes grass greenness in the hypothetical world where the sprinkler is off. It can be read as shorthand for “G∣do(S=off)”, and has the benefit that it can occur in expressions involving variables from other worlds. For example, our question can be expressed as p(G_{S=off}=green ∣ S=on, G=green), where G_{S=off}=green refers to the hypothetical world and S=on, G=green are the actual observations.
Third, SCMs require all endogenous variables to have deterministic causal mechanisms. This is satisfied in our case if we assume that the sprinkler is on whenever it’s not raining, and the grass becomes green (only) if it rains or the sprinkler is on.
The determinism means that conditioning is as simple as updating the distribution over exogenous variables, e.g. P(R) updates to P(R∣S=on,G=green). In our case, the probability for rain decreases from 30% to 0%, since the sprinkler is never on if it’s raining.
This means our question is answered by the following reasoning steps:
Abduction: update P(R) to P(R∣S=on,G=green)
Intervention: intervene to turn the sprinkler off, do(S=off)
Prediction: compute the value of G in the updated model.
Equivalently, in one formula:
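$$\underbrace{p(G_{S=\text{off}}=\text{green} \mid S=\text{on}, G=\text{green})}_{\text{original question}} = \sum_{\substack{r \in \{\text{yes},\text{no}\} \\ s \in \{\text{on},\text{off}\}}} \underbrace{p(R=r \mid S=\text{on}, G=\text{green})}_{\text{update exogenous}} \; \underbrace{\mathbb{1}(s=\text{off})}_{\text{intervene}} \; \underbrace{p(G=\text{green} \mid R=r, S=s)}_{\text{predict}} = 0.$$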
That is, we can say that the grass would not have been green if the sprinkler had been off (under the assumption we’ve made about the specific relationships).
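The three steps can be sketched in code as follows. This is an illustrative implementation of the example SCM under the assumptions stated above (30% prior on rain, sprinkler on exactly when it is not raining, grass green exactly when it rains or the sprinkler is on); the function names are hypothetical.

```python
# Prior over the exogenous variable R (rain), as in the running example.
P_RAIN = {"yes": 0.3, "no": 0.7}


def sprinkler(rain):
    """Deterministic mechanism: the sprinkler is on whenever it is not raining."""
    return "off" if rain == "yes" else "on"


def grass(rain, sprinkler_state):
    """Deterministic mechanism: green iff it rains or the sprinkler is on."""
    return "green" if rain == "yes" or sprinkler_state == "on" else "not_green"


def abduct(observed_s, observed_g):
    """Abduction: update P(R) to P(R | S=observed_s, G=observed_g)."""
    weights = {
        r: p
        for r, p in P_RAIN.items()
        if sprinkler(r) == observed_s and grass(r, sprinkler(r)) == observed_g
    }
    total = sum(weights.values())
    return {r: weights.get(r, 0.0) / total for r in P_RAIN}


def counterfactual_green(observed_s, observed_g, forced_s):
    """Intervention and prediction: force S=forced_s, then evaluate G."""
    posterior = abduct(observed_s, observed_g)
    return sum(p for r, p in posterior.items() if grass(r, forced_s) == "green")


posterior_rain = abduct("on", "green")
answer = counterfactual_green("on", "green", forced_s="off")
print(posterior_rain, answer)
```

Abduction rules out rain, so once the sprinkler is forced off nothing remains that could make the grass green, and the counterfactual probability is 0.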
SCMs are strictly more powerful than CBNs. Their primary drawback is that they require deterministic relationships between endogenous variables, which are often hard to determine in practice. They’re also limited to non-backtracking counterfactuals, where hypothetical worlds are distinguished by interventions.
One agent
To infer Mr Jones’ intentions or incentives, or predict how his behaviour would adapt to changes in his model of the world, we need a causal influence diagram (CID) that labels variables as chance, decision, or utility nodes. In our example, rain would be a chance node, the sprinkler a decision, and grass greenness a utility. Since rain is a parent of the sprinkler, Mr Jones observes it before making his decision. Graphically, chance nodes are rounded as before, decisions are rectangles, utilities are diamonds, and dashed edges denote observations:
The agent specifies causal mechanisms for its decisions, i.e. a policy, with the goal of maximising the sum of its utility nodes. In our example, an optimal policy would be to turn the sprinkler on when it’s not raining (the decision when it is raining doesn’t matter). Once a policy is specified, the CID defines a CBN.
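To make the notion of an optimal policy concrete, the sketch below enumerates every deterministic policy (a mapping from the observed rain value to a sprinkler action) and keeps those with maximal expected utility. The utility of 1 for green grass and 0 otherwise is an assumption made for illustration, as are the function names.

```python
from itertools import product

# Chance node: rain, with the 30% prior from the running example.
P_RAIN = {"yes": 0.3, "no": 0.7}
ACTIONS = ("on", "off")


def utility(rain, sprinkler_state):
    """Assumed utility: 1 if the grass ends up green, else 0."""
    return 1.0 if rain == "yes" or sprinkler_state == "on" else 0.0


def expected_utility(policy):
    """Expected utility of a policy mapping each rain value to an action."""
    return sum(p * utility(r, policy[r]) for r, p in P_RAIN.items())


# Every deterministic policy: one action per observable rain value.
policies = [dict(zip(P_RAIN, choice)) for choice in product(ACTIONS, repeat=len(P_RAIN))]
best = max(expected_utility(pi) for pi in policies)
optimal = [pi for pi in policies if abs(expected_utility(pi) - best) < 1e-12]

for pi in optimal:
    print(pi, expected_utility(pi))
```

Both optimal policies turn the sprinkler on when it is not raining and differ only in what they do when it rains, matching the observation that the decision in the rainy case does not matter.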
In models with agents, there are two kinds of interventions, depending on whether agents get to adapt their policy to the intervention or not. For example, only if we informed Mr Jones about an intervention to the grass before he made his sprinkler decision, could he pick a different sprinkler policy. Pre-policy and post-policy interventions can both be handled with the standard do-operator if we add so-called mechanism nodes to the model. More about these in the next post.
Multiple agents
Interaction between multiple agents can be modelled with causal games, in which each agent has a set of decision and utility variables.
To illustrate this, assume Mr Jones sometimes sows new grass. Birds like to eat the seeds, but cannot tell from afar whether there are any. They can only see whether Mr Jones is using the sprinkler, which is more likely when the grass is new. Mr Jones wants to water his lawn if it’s new, but does not want the birds to eat his seeds. This signalling game has the following structure:
Beyond modelling causality better, causal games also have some other advantages over standard extensive-form games (EFGs). For example, the causal game immediately shows that the birds are indifferent to whether Mr Jones waters the grass or not, because the only directed path from the sprinkler S to food F goes via the birds’ own decision B. In an EFG, this information would be hidden in the payoffs. By explicitly representing independencies, causal games can sometimes find more subgames and rule out more non-credible threats than EFGs. A causal game can always be converted to an EFG.
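The path argument can be checked mechanically. Since the diagram itself is not reproduced here, the edge list below is an assumed reconstruction of the signalling game: N (whether the grass is new), S (sprinkler), B (the birds' decision), F (the birds' food utility), and a hypothetical node U for Mr Jones' utility. Only the claim about paths from S to F comes from the text.

```python
# Assumed edges of the causal game; the exact graph is a reconstruction.
EDGES = {
    "N": ["S", "F", "U"],  # new grass affects watering, food, and Mr Jones' utility
    "S": ["B", "U"],       # birds observe the sprinkler; watering matters to Mr Jones
    "B": ["F", "U"],       # the birds' decision determines food and seed loss
    "F": [],
    "U": [],
}


def directed_paths(graph, start, goal, path=None):
    """Return all directed paths from start to goal in an acyclic graph."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    return [p for nxt in graph[start] for p in directed_paths(graph, nxt, goal, path)]


paths = directed_paths(EDGES, "S", "F")
all_via_b = all("B" in p for p in paths)
print(paths, all_via_b)
```

Under these assumed edges, every directed path from S to F passes through B, which is the graphical fact behind the indifference claim.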
Analogously to the distinction between joint probability distributions, CBNs, and SCMs, there are (multi-agent) influence diagrams that include agents in graphs that need not be causal, and structural causal influence models and structural causal games that combine agents with exogenous nodes and determinism to answer counterfactual questions.
Summary
This post introduced models that can answer correlational, interventional and counterfactual questions, and that can handle zero, one, or many agents. All in all, there are nine possible kinds of models. For more comprehensive introductions to causal models, see Section 2 of Reasoning about causality in games, and Pearl’s book Causal Inference in Statistics: A Primer.
Next post. CIDs and causal games are used to model agent(s). But, what is an agent? In the next post, we take a deeper look at what agents are by looking at some characteristics shared by all agentic systems.