Discovering Agents provides a genuine causal, interventionist account of agency and an algorithm for detecting agents, motivated by the intentional stance. I find this paper very enlightening from a conceptual perspective!
I’ve tried to think of the problems, both conceptual and practical, that need to be solved before we can actually implement this on real systems, listed in approximate order of importance.
There are no ‘dynamics,’ no learning. As soon as a mechanism node is edited, agents are assumed to immediately change their ‘object decision variable’ (a conditional probability distribution given its object parent nodes) so as to play the subgame equilibria.
Assumption that variables factorize into ‘object’ and ‘mechanism’ variables, and the resulting subjectivity. The paper models the process by which an agent adapts its policy to changes in the environment’s mechanisms via a ‘mechanism decision variable’ (which depends on its mechanism parent nodes) that modulates the conditional probability distribution of its child ‘object decision variable,’ the actual policy.
For example, the paper says a learned RL policy isn’t an agent, because interventions in the environment won’t make it change its already-learned policy, whereas a human, or an RL policy together with its training process, is an agent, because it can adapt. Is this reasonable?
Say I have a gridworld RL policy that has learned to get cheese (a 3-cell world, cheese always on the left) by always going left. Clearly it can’t change its policy when I change the cheese distribution to favor the right, so it seems right to call this not an agent.
Now, say the policy has sensory access to the grid state and correctly generalizes (despite only being trained on left-cheese) to move in whichever direction it sees the cheese, so when I change the cheese distribution, it adapts accordingly. I think it is right to call this an agent?
Now, say the policy is an LLM agent (static weights) in an open-world simulation, reasoning in-context. I change a mechanism of the simulation by lowering the gravity constant; the agent observes this, reasons in-context, and adapts its sensorimotor policy accordingly. This is clearly an agent?
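To make the first two gridworld cases concrete, here is a minimal, self-contained sketch (my own toy model, not the paper's formalism) of a fixed policy and a sensing policy under an intervention on the cheese distribution:

```python
import random

# Toy version of the 3-cell gridworld: the cheese spawns on the left or right
# end, and we intervene on the spawn distribution (a mechanism of the
# environment), then look at how each policy's behaviour responds.

def sample_cheese(p_left):
    """Environment mechanism: probability that the cheese spawns on the left."""
    return "left" if random.random() < p_left else "right"

def fixed_policy(observation):
    """Case 1: policy frozen after training on left-cheese; ignores what it sees."""
    return "left"

def sensing_policy(observation):
    """Case 2: policy with sensory access that moves toward the observed cheese."""
    return observation

def behaviour(policy, p_left, n=10_000):
    """Fraction of episodes in which the policy goes left, and its success rate."""
    left = hits = 0
    for _ in range(n):
        cheese = sample_cheese(p_left)
        action = policy(cheese)
        left += action == "left"
        hits += action == cheese
    return left / n, hits / n

for name, policy in [("fixed", fixed_policy), ("sensing", sensing_policy)]:
    original = behaviour(policy, p_left=1.0)    # cheese always on the left
    intervened = behaviour(policy, p_left=0.0)  # do(): cheese always on the right
    print(name, "policy:", original, "->", intervened)

# The fixed policy keeps going left and fails after the intervention; the
# sensing policy's behaviour tracks the mechanism change, which is the
# intuition for calling only the latter an agent.
```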
I think this is because the paper considers, in the case of the RL policy alone, the ‘object policy’ to be the policy of the trained neural network (whose induced policy distribution is fixed by definition), and the ‘mechanism policy’ to be a trivial delta function assigning that already-trained object policy. In the case of the RL policy together with its training process, the ‘mechanism policy’ is instead the training process, which assigns the fully trained conditional probability distribution to the object policy.
But what if the ‘mechanism policy’ were the in-context learning process by which the system induces an ‘object policy’? Then changes in the environment’s mechanisms can propagate to the ‘mechanism policy’ and thus to the ‘object policy’ via in-context learning, as in the second and third examples, making them count as agents.
Ultimately, the setup in the paper forces us to factorize the means by which policies adapt into mechanism vs. object variables, and the results (like whether a system counts as an agent) depend on this factorization. It’s not always clear what the right factorization is, how to discover it from data, or whether this is the right frame for thinking about the problem at all.
Implicit choice of variables that are convenient for agent discovery. The paper does mention that the algorithm depends on the choice of variables: if the node corresponding to the ‘actual agent decision’ is missing but its children are present, the algorithm will label the children as decision nodes. But this is already a very convenient representation!
Prototypical example: a Minecraft world with interacting RL agents, represented as a coarse-grained lattice (a dynamic Bayes net?) in which each node corresponds to a physical location and its properties, like color. Clearly no single node here is an agent, because agents move! My naive guess is that, in principle, everything would end up labeled an agent.
So the variables of choice must be abstract variables over the underlying substrate, like functions of it. But then how do you discover the right representation automatically, in a way that interventions at the abstract-variable level faithfully translate into actually performable interventions on the underlying substrate?
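As a toy illustration of what such abstract variables might look like (my own construction, nothing from the paper), consider the agent's position defined as a function of the entire lattice state, and how an abstract-level intervention has to be compiled back into cell-level edits:

```python
# Substrate: a lattice of cells with contents. Abstract variable: the agent's
# position, a function of the whole lattice state. Names are illustrative.

lattice = {(x, y): "empty" for x in range(4) for y in range(4)}
lattice[(2, 1)] = "agent"

def agent_position(state):
    """Abstract variable: the cell currently containing the agent."""
    return next(cell for cell, content in state.items() if content == "agent")

def do_agent_position(state, new_cell):
    """Compile an abstract-level intervention into substrate-level edits:
    clear the old cell, write the agent into the new one."""
    new_state = dict(state)
    new_state[agent_position(state)] = "empty"
    new_state[new_cell] = "agent"
    return new_state

print(agent_position(lattice))                              # (2, 1)
print(agent_position(do_agent_position(lattice, (0, 3))))   # (0, 3)

# The hard part is discovering such functions automatically, and guaranteeing
# that every do() on an abstract variable corresponds to an intervention we
# could actually perform on the substrate.
```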
Given the causal graph, even the slightest satisfaction of the agency criterion labels nodes as decision / utility nodes. There is no “degree of agency”; maybe one could define it by summing over the extent to which the relevant independencies fail to hold?
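One way that suggestion might be cashed out (my speculation, not something in the paper) is a graded score that sums how far a node's policy moves across mechanism interventions, with zero recovering the binary criterion:

```python
from itertools import combinations

# Hypothetical interface: mechanism_settings is a list of intervention
# settings on the other mechanisms, and policy_under(node, m) returns the
# node's conditional policy under setting m as a dict action -> probability.

def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def agency_score(node, mechanism_settings, policy_under):
    """Sum, over pairs of mechanism interventions, of how much the node's
    conditional policy shifts. Zero recovers 'not a decision node';
    larger scores mean stronger adaptation."""
    score = 0.0
    for m1, m2 in combinations(mechanism_settings, 2):
        score += total_variation(policy_under(node, m1), policy_under(node, m2))
    return score
```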
Different agents are then defined as causally separated chunks (roughly, connected components) of [set-of-decision-nodes / set-of-utility-nodes]. How do we accommodate hierarchical agency (like subagents), systems with different degrees of agency, etc.?
The interventional distributions on the object/mechanism variables are converted into a causal graph using the obvious [perform-do()-while-fixing-everything-else] algorithm. My impression is that causal discovery doesn’t really work in practice, especially in a noisy reality with a large number of variables and a gazillion conditional independence tests.
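For concreteness, here is roughly what that brute-force procedure looks like as code: a sketch under the assumption of a perfect interventional oracle, with names of my own choosing rather than the paper's.

```python
from itertools import product

# To test for a direct edge X -> Y, hold every other variable fixed by
# intervention, force different values onto X, and check whether Y's
# interventional distribution changes. `sample_y_given_do` is a hypothetical
# oracle returning Y's distribution (as a dict) under a do() assignment.

def _tv(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def has_direct_edge(x, y, variables, values, sample_y_given_do, tol=1e-6):
    others = [v for v in variables if v not in (x, y)]
    for context in product(values, repeat=len(others)):
        fixed = dict(zip(others, context))
        dists = [sample_y_given_do(y, {**fixed, x: vx}) for vx in values]
        if any(_tv(dists[0], d) > tol for d in dists[1:]):
            return True   # Y responds to X in some fully fixed context
    return False

# Even in this idealized form the loop is exponential in the number of
# variables, and with finite noisy samples each comparison becomes a
# statistical test, which is where the practical worry comes in.
```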
The correctness proof requires a lot of unrealistic assumptions, e.g., that agents always play subgame equilibria, though I think some of these can be relaxed.