For instance, if I’m planning a party, then the actions I take now are far away in time (and probably also space) from the party they’re optimizing. The “intermediate layers” might be snapshots of the universe-state at each time between the actions and the party. (… or they might be something else; there are usually many different ways to draw intermediate layers between far-apart things.)
This applies surprisingly well even in situations like reinforcement learning, where we don’t typically think of the objective as “far away” from the agent. If I’m a reinforcement learner optimizing for some reward I’ll receive later, that later reward is still typically far away from my current actions. My actions impact the reward via some complicated causal path through the environment, acting through many intermediate layers.
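To make the "intermediate layers" picture concrete, here's a minimal sketch (my own toy dynamics, not anything from the post) in which the reward an RL agent receives at the end of an episode depends on its initial action only through a chain of intermediate environment states:

```python
import numpy as np

rng = np.random.default_rng(0)

def env_step(state):
    """One intermediate layer: the next environment state depends only on the current one."""
    return 0.9 * state + rng.normal(scale=0.1)

def episode_reward(action, horizon=10):
    """The action enters only at t=0; the reward at the end of the episode
    'sees' it only through the chain of intermediate states in between."""
    state = float(action)
    for _ in range(horizon):
        state = env_step(state)
    return -(state - 1.0) ** 2   # reward received far away in time from the action

print(episode_reward(action=1.2))
```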
So we’ve ruled out agents just “optimizing” their own actions. How does this solve the other two problems?
I feel like this is assuming away one of the crucial difficulties of ascribing agency and goal-directedness: lack of competence or non-optimality can make agentic behavior look non-agentic unless you already have a mechanistic interpretation. Separating a rock from a human is not really the problem; it's more like separating something that acts like a chimp, but for which you have very little data and understanding, from an agent optimizing to clip you.
(I'm not saying this can't be relevant to addressing the problem, just that you currently seem to assume it away.)
Because the agent only interacts with the far-away things it's optimizing via a relatively small summary, it's natural to define the "actions" and "observations" as the contents of the summary flowing in either direction, rather than all the low-level interactions flowing through the agent's supposed "Cartesian boundary". That solves the microscopic interactions problem: all the random bumping between my hair/skin and air molecules mostly doesn't impact things far away, except via a few summary variables like temperature and pressure.
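As a toy illustration of the summary idea (my own construction; the "temperature" analogy is just the one from the paragraph above), here's a sketch where a far-away outcome depends on a million microscopic boundary variables only through a single summary statistic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Microscopic boundary state: a million molecular speeds bumping against my skin.
micro_state = rng.normal(loc=300.0, scale=20.0, size=1_000_000)

def summary(micro):
    """Low-dimensional summary of all those interactions (a 'temperature' analogue)."""
    return micro.mean()

def far_away_outcome(micro):
    """Things far away depend on the microscopic state only through the summary."""
    return 2.0 * summary(micro) + 5.0

# Jostle individual molecules in a way that leaves the summary unchanged:
perturbed = micro_state.copy()
perturbed[0] += 5.0
perturbed[1] -= 5.0

print(far_away_outcome(micro_state))
print(far_away_outcome(perturbed))  # agrees up to floating-point error: only the summary matters
```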
Hmm. I like the idea of redefining the action as the consequences of one's action that are observable "far away"; it nicely rederives the observation-action loop through interaction with far-away variables. That said, I'm unsure whether defining the observations as the summary statistics themselves is problematic. One intuition tells me that this is all you can observe anyway, so it's fine; on the other hand, it looks like you're assuming the agent already has the right ontology. I guess that can be resolved by saying that the observations are about the content of the summary, but not necessarily all of it.
When Adam Shimi first suggested to me a couple years ago that “optimization far away” might be important somehow, one counterargument I raised was dynamic programming (DP): if the agent is optimizing an expected utility function over something far away, then we can use DP to propagate the expected utility function back through the intermediate layers to find an equivalent utility function over the agent’s actions:
u'(A) = E[u(X) | do(A)]
This isn’t actually a problem, though. It says that optimization far away is equivalent to some optimization nearby. But the reverse does not necessarily hold: optimization nearby is not necessarily equivalent to some optimization far away. This makes sense: optimization nearby is a trivial condition which matches basically any system, and therefore will match the interesting cases as well as the uninteresting cases.
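For concreteness, here's a small numerical sketch of that DP step (toy probabilities I made up, not anything from the post): a single intermediate layer Z between the action A and the far-away variable X, with u'(A) = E[u(X) | do(A)] obtained by propagating the expected utility back through Z:

```python
import numpy as np

# Toy causal chain A -> Z -> X, with made-up probabilities.
P_Z_given_doA = np.array([[0.8, 0.2],   # P(Z | do(A=0))
                          [0.3, 0.7]])  # P(Z | do(A=1))
P_X_given_Z   = np.array([[0.9, 0.1],   # P(X | Z=0)
                          [0.4, 0.6]])  # P(X | Z=1)
u_X = np.array([0.0, 1.0])              # utility over the far-away variable X

# Dynamic programming: propagate the expected utility back one layer at a time.
u_Z = P_X_given_Z @ u_X        # E[u(X) | Z]
u_A = P_Z_given_doA @ u_Z      # u'(A) = E[u(X) | do(A)], a utility over the agent's actions

print(u_A)             # [0.2, 0.45]
print(np.argmax(u_A))  # 1 -- the same action that best optimizes u(X) far away
```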
I think I actually remember the discussion we were having now, and I recall an intuition about counting. Like, there seem to be many more ways to optimize nearby than to optimize some specific far-away thing, which I guess is what you're pointing at.