An environment for studying counterfactuals
Summary: I introduce a decision theory framework where successful agents are those with good counterfactuals.
Motivation
The problem of logical counterfactuals is how to define probabilities $P(\cdot \mid \phi)$ when $\phi$ is known to be false. (I’ll ignore more general counterfactuals in this post.)
The theory of logical induction provides a joint distribution over sentences, so the problem becomes: How do you condition on $\phi$ when $\phi$ has negligible probability?
Exploration tries to solve this by making sure that $\phi$ never has negligible probability. But it doesn’t work in problems like Agent Simulates Predictor that contain predictors who can’t tell when the agent explores.
A better solution is early exploration, which uses an early stage of the logical inductor to do exploration. But then the later stages of the inductor know that $\phi$ is false, and we’re back where we started.
I’m going to describe an environment that captures these features of the problem — it’s got reflection, early exploration, counterfactuals, and a Bayesian update that stands in for the evolution of a logical inductor.
Informal definition
The agent outputs counterfactual distributions $C_a$ over utilities, one for each action $a$. This determines an expected utility $\mathbb{E}_{C_a}[U]$ for each action. Most of the time, the action chosen for the agent is one that maximizes this expected utility. But a small fraction of the time, an exploration action is chosen instead.
The agent receives an observation $O$ as input, from which it can infer whether exploration will occur. The agent also receives a prior $P$ as input, and this prior accurately reflects the behavior of the agent as a function of $O$ and its random bits $R$. (This uses a fixed-point theorem.)
If action $a$ is chosen, then the counterfactual $C_a$ is factual; the rest are counterfactual. We judge an agent by how accurate its factual counterfactual is, in addition to how much utility it gets.
Here’s an agent that does okay in this environment: It adopts $P$ as its epistemic state and ignores $O$. Because exploration gives every action positive probability, it can compute counterfactuals by conditioning: $C_a = P(U \mid A = a)$. This agent does okay but not great, since it ignores $O$.
You could try to make a better agent as follows: Adopt $P$ as a prior and then do a Bayesian update on $O$. But now you’ve inferred whether exploration occurs, so some actions have probability zero, and it’s not clear how to compute counterfactuals.
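To make the contrast concrete, here is a minimal Python sketch on a hypothetical two-action problem. The actions `a0`/`a1`, ε = 0.1, the win probabilities, the observation labels, and the helper names `toy_prior` and `utility_given` are all assumptions for illustration. The prior encodes an agent that plays `a1` unless exploring, together with an observation that reveals whether exploration occurs. Conditioning on $A$ alone gives a well-defined $C_a$ for both actions, while first updating on $O$ = `no_explore` leaves `a0` with probability zero.

```python
# Hypothetical toy problem; names and numbers are illustrative assumptions.
ACTIONS = ["a0", "a1"]
EPS = 0.1
BOTTOM = None                      # stands in for E = ⊥
P_WIN = {"a0": 0.3, "a1": 0.7}     # assumed P(U = 1 | A = a)

def toy_prior():
    """A prior over (E, O, A, U): the agent plays a1 unless exploring,
    and O reveals with certainty whether exploration occurs."""
    joint = {}
    for e in [BOTTOM] + ACTIONS:
        p_e = 1 - EPS * len(ACTIONS) if e is BOTTOM else EPS
        o = "no_explore" if e is BOTTOM else "explore"
        a = "a1" if e is BOTTOM else e
        for u in (0, 1):
            p_u = P_WIN[a] if u == 1 else 1 - P_WIN[a]
            joint[(e, o, a, u)] = p_e * p_u
    return joint

def utility_given(joint, event):
    """P(U | event) as {u: prob}, or None when the event has probability zero."""
    mass = sum(p for k, p in joint.items() if event(k))
    if mass == 0:
        return None
    return {u: sum(p for k, p in joint.items() if event(k) and k[3] == u) / mass
            for u in (0, 1)}

prior = toy_prior()
# Ignore O: exploration gives every action probability >= EPS, so each C_a is defined.
c_ignore = {a: utility_given(prior, lambda k, a=a: k[2] == a) for a in ACTIONS}
# Update on O = "no_explore" first: a0 now has probability zero, so C_a0 is undefined.
c_update = {a: utility_given(prior, lambda k, a=a: k[1] == "no_explore" and k[2] == a)
            for a in ACTIONS}
print(c_ignore)
print(c_update)
```

The `None` entry for `a0` in `c_update` is exactly the gap described above: after the update, there is nothing left to condition on.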
If you find a good agent for this environment, you’ll probably have learned something about making good counterfactuals.
Formal definition
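A decision problem consists of a tuple of random variables $(A, E, O, U, P, R, C)$:
- $A \in \mathcal{A}$, where $\mathcal{A}$ is a finite set of actions.
- $E \in \mathcal{A} \cup \{\bot\}$ determines whether the agent explores. If $E = \bot$, no exploration takes place.
- $O \in \mathcal{O}$, where $\mathcal{O}$ is a finite set of observations.
- $U \in \mathcal{U}$, where $\mathcal{U}$ is the space of utilities.
- $P \in \mathcal{P} = \Delta(\mathcal{A} \times (\mathcal{A} \cup \{\bot\}) \times \mathcal{O} \times \mathcal{U})$, the space of distributions over the above variables.
- $R \in \{0,1\}^{\mathbb{N}}$ represents an infinite source of random bits that the agent can use.
- $C \in (\Delta\mathcal{U})^{\mathcal{A}}$ is the agent’s output, representing a counterfactual distribution $C_a$ for each action $a$.
along with some likelihoods:
- $P(E = a) = \varepsilon$ for all $a \in \mathcal{A}$, and $P(E = \bot) = 1 - \varepsilon|\mathcal{A}|$, for some choice of $\varepsilon$.
- Likelihoods $P(O \mid E)$, depending on the problem.
- I.i.d. uniform distributions on each bit of $R$.
- $A$ is mostly determined by $C$ and $E$ as follows: If $E = \bot$, then $A \in \arg\max_{a \in \mathcal{A}} \mathbb{E}_{C_a}[U]$. (If there is a tie, $A$ is undetermined.) Otherwise, $A = E$.
- A distribution over $U$ conditional on each value of $A$ and $O$, depending on the problem.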
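The exploration likelihoods and the action rule above can be sketched as a sampler. This is a minimal illustration, not the post’s own code: it assumes finitely supported utilities (so each $C_a$ is a dict from utility to probability), uses `None` to stand in for $\bot$, and breaks ties uniformly, which is only one of the tiebreaks the definition allows. The function names `sample_exploration` and `choose_action` are hypothetical.

```python
import random

BOTTOM = None  # stands in for E = ⊥ (no exploration)

def sample_exploration(actions, eps, rng):
    """Sample E: each action with probability eps, ⊥ with probability 1 - eps*|A|."""
    assert eps * len(actions) <= 1
    r = rng.random()
    for i, a in enumerate(actions):
        if r < eps * (i + 1):
            return a
    return BOTTOM

def choose_action(counterfactuals, e, rng):
    """A = E when exploring; otherwise an argmax of E_{C_a}[U].
    Ties are broken uniformly (an assumed tiebreak, not the only admissible one)."""
    if e is not BOTTOM:
        return e
    eu = {a: sum(u * p for u, p in dist.items()) for a, dist in counterfactuals.items()}
    best = max(eu.values())
    return rng.choice([a for a, v in eu.items() if v == best])

rng = random.Random(0)
counterfactuals = {"a0": {0: 0.7, 1: 0.3}, "a1": {0: 0.3, 1: 0.7}}
actions = list(counterfactuals)
draws = [choose_action(counterfactuals, sample_exploration(actions, 0.1, rng), rng)
         for _ in range(100_000)]
print(draws.count("a0") / len(draws))
```

With `eps=0.1` and two actions, the lower-utility action `a0` is taken only when exploration picks it, so its empirical frequency should land near 0.1.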
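An agent is a function $\mathcal{P} \times \mathcal{O} \times \{0,1\}^{\mathbb{N}} \to (\Delta\mathcal{U})^{\mathcal{A}}$, mapping $(P, O, R)$ to $C$. A decision problem and an agent together almost determine a joint distribution over all the variables. What’s missing is $P$ and tiebreakers for $A$. These are determined by finding a fixed point satisfying:
- If $\pi$ is the resulting marginal distribution over $(A, E, O, U)$, then $P = \pi$.
- For each $o$ and $r$, there is a distribution over the set $\arg\max_{a \in \mathcal{A}} \mathbb{E}_{C_a}[U]$ such that, when $E = \bot$, $A$ is sampled from that distribution.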
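One way to see what the fixed-point condition asks for is to search for a fixed point numerically on a tiny hypothetical problem. The sketch below assumes two actions, ε = 0.1, an observation that reveals exploration with certainty, utilities in {0, 1} that depend only on the action, deterministic agents (so $R$ is dropped), and a uniform tiebreak; all function names are made up for illustration. It plugs in the conditioning agent described earlier and simply iterates $P \leftarrow \pi$. That naive iteration happens to converge here, but it is not guaranteed to in general and is not a proof that fixed points exist.

```python
from itertools import product

# Hypothetical toy problem; every name and number here is an illustrative assumption.
ACTIONS = ["a0", "a1"]
BOTTOM = None                      # E = ⊥ (no exploration)
E_VALUES = [BOTTOM] + ACTIONS
OBS = ["no_explore", "explore"]
UTILS = [0, 1]
EPS = 0.1
P_WIN = {"a0": 0.3, "a1": 0.7}     # P(U = 1 | A = a); in this toy, U ignores O

def p_e(e):
    return 1 - EPS * len(ACTIONS) if e is BOTTOM else EPS

def p_o_given_e(o, e):
    # O reveals with certainty whether exploration occurs.
    return 1.0 if o == ("no_explore" if e is BOTTOM else "explore") else 0.0

def p_u_given_a_o(u, a, o):
    return P_WIN[a] if u == 1 else 1 - P_WIN[a]

def p_a_given_c_e(a, c, e):
    """Action rule: A = E when exploring, else uniform over argmax_a E_{C_a}[U]."""
    if e is not BOTTOM:
        return 1.0 if a == e else 0.0
    eu = {b: sum(u * q for u, q in c[b].items()) for b in ACTIONS}
    best = max(eu.values())
    winners = [b for b in ACTIONS if eu[b] == best]
    return 1.0 / len(winners) if a in winners else 0.0

def conditioning_agent(prior, o):
    """Ignores o and returns C_a = P(U | A = a) for each action."""
    c = {}
    for a in ACTIONS:
        mass = sum(p for (_, _, a2, _), p in prior.items() if a2 == a)
        c[a] = {u: sum(p for (_, _, a2, u2), p in prior.items()
                       if a2 == a and u2 == u) / mass for u in UTILS}
    return c

def induced_joint(prior, agent):
    """The distribution pi over (E, O, A, U) induced by a candidate prior P."""
    pi = {}
    for e, o, a, u in product(E_VALUES, OBS, ACTIONS, UTILS):
        c = agent(prior, o)  # deterministic agent, so R is left out
        pi[(e, o, a, u)] = (p_e(e) * p_o_given_e(o, e)
                            * p_a_given_c_e(a, c, e) * p_u_given_a_o(u, a, o))
    return pi

def find_fixed_point(agent, max_iters=100):
    """Naive iteration P <- pi from a uniform start; may fail to converge in general."""
    keys = list(product(E_VALUES, OBS, ACTIONS, UTILS))
    prior = {k: 1 / len(keys) for k in keys}
    for _ in range(max_iters):
        pi = induced_joint(prior, agent)
        if pi == prior:
            return pi
        prior = pi
    raise RuntimeError("naive iteration did not reach a fixed point")

fixed = find_fixed_point(conditioning_agent)
print(sum(p for (_, _, a, _), p in fixed.items() if a == "a0"))
```

At the fixed point the agent plays `a1` whenever it is not exploring, so `a0` is only ever taken through exploration and $P(A = a_0)$ comes out as ε.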
(I might prove the existence of a fixed point in a comment.)
We’ll informally say that an agent does well on a decision problem if, for every fixed point, the following are true:
- $\mathbb{E}[U]$ is high.
- The factual counterfactual is accurate; say, it’s close to the marginal over $U$ conditional on the true action in total variation distance: $\mathbb{E}\left[d_{TV}\left(C_A, P(U \mid A)\right)\right]$ is small.
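Both criteria are straightforward to compute once a fixed point is in hand. Here is a minimal sketch. It assumes finitely supported utilities, a joint distribution stored as a dict keyed by $(e, o, a, u)$, and an agent whose counterfactuals depend only on $O$ (so $R$ is omitted). The example joint, the deliberately miscalibrated counterfactual, and the helper names are hypothetical.

```python
def expected_utility(joint):
    """E[U] for a joint distribution keyed by (e, o, a, u)."""
    return sum(p * u for (_, _, _, u), p in joint.items())

def total_variation(p, q):
    """d_TV between two finitely supported distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def utility_given_action(joint, a):
    """The marginal P(U | A = a) under the joint distribution."""
    mass = sum(p for (_, _, a2, _), p in joint.items() if a2 == a)
    dist = {}
    for (_, _, a2, u), p in joint.items():
        if a2 == a:
            dist[u] = dist.get(u, 0.0) + p / mass
    return dist

def factual_error(joint, counterfactuals):
    """E[d_TV(C_A, P(U | A))]; counterfactuals[o][a] is the agent's C_a after seeing o."""
    return sum(p * total_variation(counterfactuals[o][a], utility_given_action(joint, a))
               for (_, o, a, _), p in joint.items() if p > 0)

# Hypothetical joint: a1 is played unless exploring, and O reveals exploration.
joint = {
    (None, "no_explore", "a1", 1): 0.8 * 0.7, (None, "no_explore", "a1", 0): 0.8 * 0.3,
    ("a0", "explore", "a0", 1): 0.1 * 0.3, ("a0", "explore", "a0", 0): 0.1 * 0.7,
    ("a1", "explore", "a1", 1): 0.1 * 0.7, ("a1", "explore", "a1", 0): 0.1 * 0.3,
}
accurate = {"a0": {0: 0.7, 1: 0.3}, "a1": {0: 0.3, 1: 0.7}}
sloppy = {"a0": accurate["a0"], "a1": {0: 0.5, 1: 0.5}}
counterfactuals = {"explore": accurate, "no_explore": sloppy}

print(expected_utility(joint))
print(factual_error(joint, counterfactuals))
```

In this example the agent is accurate everywhere except after `no_explore`, where it reports a coin flip for `a1` instead of 0.7. That case has probability 0.8 and total variation distance 0.2, so the factual error is about 0.16.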
Future work
I have an idea for defining an optimal agent for every decision problem in this family; I’ll explore that in another post.
Once we find a general solution, we’d ideally transfer it to the setting of logical induction, and then we’d have logical counterfactuals.
I’m pretty sure exploration is a hack. Trembling hand shouldn’t be required for good decisions. The right decision theory should make do with the natural amount of uncertainty: “I’m not sure what I’ll do because I haven’t finished thinking and could still stumble on a good argument for any of the options.” That’s the kind of thing I’d want to see formalized.
If exploration is a hack, then why do pretty much all multi-armed bandit algorithms rely on exploration into suboptimal outcomes to prevent spurious underestimates of the value associated with a lever?
The multi-armed bandit problem is a many-round problem in which actions in early rounds provide information that is useful for later rounds, so it makes sense to explore to gain this information. That’s different from using exploration in one-shot problems to make the counterfactuals well-defined, which is a hack.
I agree exploration is a hack. I think exploration vs. other sources of non-dogmatism is orthogonal to the question of counterfactuals, so I’m happy to rely on exploration for now.
“The agent receives an observation O as input, from which it can infer whether exploration will occur.”—I’m confused here. What is this observation? Does it purely relate to whether it will explore or does it also provide data about the universe? And is it merely a correlation or a binary yes or no?
The observation can provide all sorts of information about the universe, including whether exploration occurs. The exact set of possible observations depends on the decision problem.
E and O can have any relationship, but the most interesting case is when one can infer E from O with certainty.