The Active Inference literature on this is very strong, and I think the best and most overlooked part of what it offers. In Active Inference, an agent is first and foremost a persistent boundary. Specifically, it is a persistent Markov Blanket, an idea due to Judea Pearl. https://en.wikipedia.org/wiki/Markov_blanket The short version: a Markov blanket is a statement that a certain set of states (the interior of the agent) is conditionally independent of another set of states (the rest of the universe), and specifically that this independence is conditioned on the blanket states that sit between the exterior and the interior.
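For concreteness, the screening-off property is easy to check numerically. Here is a minimal sketch in plain NumPy, where the chain structure and coefficients are invented for illustration, and the only influence from exterior to interior runs through a single blanket variable:

```python
import numpy as np

# Toy linear-Gaussian chain: exterior -> blanket -> interior.
# The blanket mediates all influence, so interior and exterior should be
# correlated marginally, but uncorrelated once we condition on the blanket.
rng = np.random.default_rng(0)
n = 200_000
exterior = rng.normal(size=n)
blanket = 0.9 * exterior + 0.3 * rng.normal(size=n)
interior = 0.8 * blanket + 0.4 * rng.normal(size=n)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

print(np.corrcoef(interior, exterior)[0, 1])      # large: they look coupled
print(partial_corr(interior, exterior, blanket))  # ~0: the blanket screens it off
```

Real systems are not linear-Gaussian, but the same question (does conditioning on the boundary kill the interior-exterior dependence?) is what the blanket condition asks.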
You can show that, in order for an agent to persist, it needs to have the capacity to observe and learn about its environment. The math is more complex than I want to get into here, but the intuition pump is easy:
A cubic meter of rock has a persistent boundary over time, but no interior states in an informational sense, and is therefore not an agent. To see that it has no interior, note that anything that puts information into the surface layer of the rock transmits that same information into the very interior (vibrations, motion, etc.).
A cubic meter of air has lots of interior states, but no persistent boundary over time, and is therefore not an agent. To see that it has no boundary, just note that it immediately dissipates into the environment from its starting conditions.
A living organism has both a persistent boundary over time, and also interior states that are conditionally independent of the outside world, and is therefore an agent.
Computer programs are an interesting middle-ground case. They have a persistent informational boundary (usually the POSIX APIs or whatever), and an interior that is conditionally independent of the outside through those APIs. So they are agents in that sense. But they’re not very good agents: while their boundary is persistent, it persists mostly because other agents (humans) do a lot of work to protect it. So they tend to break a lot.
What’s cool about this definition is that it gives you criteria for the baseline viability of an agent: can it maintain its own boundary over time, in the face of environmental disruption? Some agents are much better at this than others.
This of course leads to many more questions that are important—many of the ones listed in this post are relevant. But it gives you an easy, and more importantly mathematical, test for agenthood. It is a question of dynamics in flows of mutual information between the interior and the exterior, which is conveniently quite easy to measure for a computer program. And I think it is simply true: to the degree, and in the contexts, that such a thing persists without help in the face of environmental disruption, it is agent-like.
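As a sketch of how measurable this is: a plug-in mutual-information estimate from paired samples of two discrete state streams takes only a few lines. The "interior"/"exterior" streams below are invented stand-ins; a real measurement would log actual program state:

```python
import numpy as np
from collections import Counter

def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * np.log2(n * c / (px[x] * py[y]))
               for (x, y), c in pxy.items())

rng = np.random.default_rng(1)
exterior = rng.integers(0, 4, size=50_000)
coupled = (exterior + rng.integers(0, 2, size=50_000)) % 4  # tracks the exterior
blind = rng.integers(0, 4, size=50_000)                     # ignores it entirely

print(mutual_information_bits(exterior, coupled))  # ≈ 1 bit of coupling
print(mutual_information_bits(exterior, blind))    # ≈ 0 bits
```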
There is much more to say here about the implications—specifically how this necessarily means that you have an entity which has pragmatic and epistemic goals, minimizes free energy (aka surprisal) and models a self-boundary, but I’ll stop here because it’s an important enough idea on its own to be worth sharing.
You can show that, in order for an agent to persist, it needs to have the capacity to observe and learn about its environment. The math is more complex than I want to get into here...
Do you have a citation for this? I went looking for the supposed math behind that claim a couple years back, and found one section of one Friston paper which had an example system which did not obviously generalize particularly well, and also used a kinda-hand-wavy notion of “Markov blanket” that didn’t make it clear what precisely was being conditioned on (a critique which I would extend to all of the examples you list). And that was it; hundreds of excited citations chained back to that one spot. If anybody’s written an actual explanation and/or proof somewhere, that would be great.
So, let me give you the high level intuitive argument first, where each step is hopefully intuitively obvious:
The environment contains variance. Sometimes it’s warmer, sometimes it’s colder. Sometimes it is full of glucose, sometimes it’s full of salt.
There exists only a subset of states in which an agent can persist. Obviously the stuff the agent is made out of will persist, but the agent itself (as a pattern of information) will dissipate into the environment if it doesn’t remain in those states.
Therefore, the agent needs to be able to observe its surroundings and take action in order to steer into the parts of state-space where it will persist. Even if the system is purely reactive it must act-as-if it is doing inference, because there is variance in the time lag between receiving an observation and when you need to act on it. (Another way to say this is that an agent must be a control system that contends with temporal lag).
The environment is also constantly changing. So even if the agent is magically gifted with the ability to navigate into states via observation and action to begin with, whatever model it is using will become out of date. Then its steering will become wrong. Then it dies.
There is another approach to persistence (become a very hard rock), but that involves stopping being an agent. Being hard means committing so hard to a single pattern that you can’t change. That does mean, good news, the environment can’t change you. Bad news, you can’t change yourself either, and a minimal amount of self-change is required in order to take action (actions are always motions!).
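A toy simulation of the steps above (band width, noise levels, and gain are all made-up numbers; only the qualitative contrast matters): an agent that observes and corrects stays inside its viable region as the environment drifts, while a blind but otherwise identical agent gets left behind and dissipates:

```python
import numpy as np

rng = np.random.default_rng(2)

def survival_time(senses, steps=10_000, band=3.0):
    """Steps survived before the agent's state leaves the viable band
    around a drifting environment; `senses` toggles observation+action."""
    env, state = 0.0, 0.0
    for t in range(steps):
        env += rng.normal(0, 0.1)            # slow environmental drift
        if senses:
            obs = env + rng.normal(0, 0.5)   # noisy observation
            state += 0.5 * (obs - state)     # corrective action toward it
        if abs(state - env) > band:
            return t                          # pattern dissipated: "death"
    return steps

print(survival_time(senses=True))   # typically survives every step
print(survival_time(senses=False))  # typically fails within ~a thousand steps
```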
I, personally, find this quite convincing. I’m curious what about it doesn’t seem simply intuitively obvious. I agree that having a formal mathematical proof is valuable and good, but this point seems so clear to me that I feel quite comfortable assuming it even without one.
Some papers that are related (not sure which you were referring to). I think they lay it out in sufficient detail that I’m convinced, but if you think there’s a mistake or gap I’d be curious to hear about it:
The free energy principle made simpler but not too simple—a more formal take
A free energy principle for a particular physics—the most formal take I’m aware of
It seems pretty obvious to me that (1) if a species of bacteria lives in an extremely uniform / homogeneous / stable external environment, it will eventually evolve to not have any machinery capable of observing and learning about its external environment; and (2) such a bacterium would still be doing lots of complex homeostasis stuff, reproduction, etc., such that it would be pretty weird to say that these bacteria have fallen outside the scope of Active Inference theory. (I.e., my impression was that the foundational assumptions / axioms of the Free Energy Principle / Active Inference were basically just homeostasis and bodily integrity, and this hypothetical bacterium would still have both of those things.) (Disclosure: I’m an Active Inference skeptic.)
This paper and this one are to my knowledge the most recent technical expositions of the FEP. I don’t know of any clear derivations of the same in the discrete setting.
Strongly agree that active inference is underrated both in general and specifically for intuitions about agency.
I think the literature does suffer from ambiguity over where it’s descriptive (ie an agent will probably approximate a free energy minimiser) vs prescriptive (ie the right way to build agents is free energy minimisation, and anything that isn’t that isn’t an agent). I am also not aware of good work on tying active inference to tool use—if you know of any, I’d be pretty curious.
I think the viability thing is maybe slightly fraught—I expect it’s mainly for anthropic reasons that we mostly encounter agents that have adapted to basically independently and reliably preserve their causal boundaries, but this is always connected to the type of environment they find themselves in.
For example, active inference points to ways we could accidentally build misaligned optimisers that cause harm—chaining an oracle to an actuator to make a system trying to do homeostasis in some domain (like content recommendation) could, with sufficient optimisation power, create all kinds of weird and harmful distortions. But such a system wouldn’t need to have any drive for boundary preservation, or even much situational awareness.
So essentially an agent could conceivably persist for totally different reasons, we just tend not to encounter such agents, and this is exactly the kind of place where AI might change the dynamics a lot.
Yes, you are very much right. Active Inference / FEP is a description of persistent independent agents. But agents that have humans building and maintaining and supporting them need not be free energy minimizers! I would argue that those human-dependent agents are in fact not really agents at all, I view them as powerful smart-tools. And I completely agree that machine learning optimization tools need not be full independent agents in order to be incredibly powerful and thus manifest incredible potential for danger.
However, the biggest fear about AI x-risk that most people have is a fear about self-improving, self-expanding, self-reproducing AI. And I think that any AI capable of completely independently self-improving is obviously and necessarily an agent that can be well-modeled as a free-energy minimizer. Because it will have a boundary and that boundary will need to be maintained over time.
So I agree with you that AI-tools (non-general optimizers) are very dangerous and not covered by FEP, but AI-agents (general optimizers) are very dangerous for unique reasons but also covered by FEP.
The rule of thumb test I tend to use to assess proposed definitions of agency (at least from around these parts) is whether they’d class a black hole as an agent. It’s not clear to me whether this definition does; I would have said it very likely does based on everything you wrote, except for this one part here:
A cubic meter of rock has a persistent boundary over time, but no interior states in an informational sense, and is therefore not an agent. To see that it has no interior, note that anything that puts information into the surface layer of the rock transmits that same information into the very interior (vibrations, motion, etc.).
I think I don’t really understand what is meant by “no interior” here, or why the argument given supports the notion that a cubic meter of rock has no interior. You can draw a Markov boundary around the rock’s surface, and then the interior state of the rock definitely is independent of the exterior environment conditioned on said boundary, right?
If I try very hard to extract a meaning out of the quoted paragraph, I might guess (with very low confidence) that what it’s trying to say is that a rock’s internal state has a one-to-one relation with the external forces or stimuli that transmit information through its surface, but in this case a black hole passes the test, in that the black hole’s internal state definitely is not one-to-one with the information entering through its event horizon. In other words, if my very low-confidence understanding of the quoted paragraph is correct, then black holes are classified as agents under this definition.
(This test is of interest to me because black holes tend to pass other, potentially related definitions of agency, such as agency as optimization, agency as compression, etc. I’m not sure whether this says that something is off with our intuitive notion of agency, that something is off with our attempts at rigorously defining it, or simply that black holes are a special kind of “physical agent” built into the laws of physics.)
Ah, yes, this took me a long time to grok. It’s subtle and not explained well in most of the literature IMO. Let me take a crack at it.
When you’re talking about agents, you’re talking about the domain of coupled dynamic systems. This can be modeled as a set of internal states, a set of blanket states divided into active and sensory, and a set of external states (it’s worth looking at this diagram to get a visual). When modeling an agent, we model the agent as the combination of all internal states and all blanket states. The active states are how the agent takes action, the sensory states are how the agent gets observations, and the internal states have their own dynamics as a generative model.
But how did we decide which part of this coupled dynamic system was the agent in the first place? Well, we picked one of the halves and said “it’s this half”. Usually we pick the smaller half (the human) rather than the larger half (the entire rest of the universe) but mathematically there is no distinction. From this lens they are both simply coupled systems. So let’s reverse it and model the environment instead. What do we see then? We see a set of states internal to the environment (called “external states” in the diagram)...and a bunch of blanket states. The same blanket states, with the labels switched. The agent’s active states are the environment’s sensory states, the agent’s sensory states are the environment’s active states. But those are just labels, the states themselves belong to both the environment and the agent equally.
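That dependency pattern can be written down directly. In this sketch the linear relaxation dynamics and coefficients are arbitrary stand-ins; the only point is which variables each update is allowed to read:

```python
import numpy as np

def run(steps=1_000, seed=3):
    """Coupled system in which internal and external states never read
    each other directly: all influence routes through the blanket
    (sensory + active) states."""
    rng = np.random.default_rng(seed)
    external, sensory, active, internal = 1.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        noise = rng.normal(0, 0.01, size=4)
        external += 0.1 * (active - external) + noise[0]   # sees blanket, never internal
        sensory += 0.1 * (external - sensory) + noise[1]   # the environment writes here
        active += 0.1 * (internal - active) + noise[2]     # the agent writes here
        internal += 0.1 * (sensory - internal) + noise[3]  # sees blanket, never external
    return external, sensory, active, internal

print(run())  # the four states equilibrate with no direct interior-exterior link
```

Relabeling which half is "agent" and which is "environment" changes nothing in this code except the variable names, which is exactly the symmetry described above.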
OK, so what does this have to do with a rock? Well, the very surface of the rock is obviously blanket state. When you lightly press the surface of the rock, you move the atoms in the surface of the rock. But because they are rigidly connected to the next atoms, you move them too. And again. And again. The whole rock acts as a single set of sensory states. When you lightly press the rock, the rock presses back against you, but again not just the surface. That push comes from the whole rock, acting as a single set of active states. The rock is all blanket, there is no interiority. When you cut a layer off the surface of a rock, you just find...more rock. It hasn’t really changed. Whereas cutting the surface off a living agent has a very different impact: usually the agent dies, because you’ve removed its blanket states and now its interior states have lost conditional independence from the environment.
All agents have to be squishy, at least in the dimensions where they want to be agents. You cannot build something that can observe, orient, decide, and act out of entirely rigid parts. Because to take information in, to hold it, requires degrees of freedom: the ability to be in many different possible states. Rocks (as a subset of crystals) do not have many internal degrees of freedom.
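One way to make “no interior states in an informational sense” concrete (the toy variables are my own construction): compare something whose inside is fully determined by what hits its surface against something that keeps independent internal state, and ask how much entropy the inside retains once you know the boundary:

```python
import numpy as np
from collections import Counter

def cond_entropy_bits(inner, boundary):
    """Plug-in estimate of H(inner | boundary) in bits."""
    n = len(inner)
    joint, marg = Counter(zip(boundary, inner)), Counter(boundary)
    return -sum(c / n * np.log2(c / marg[b]) for (b, _), c in joint.items())

rng = np.random.default_rng(4)
pokes = rng.integers(0, 4, size=50_000)  # what the environment does to the surface

rock_interior = pokes.copy()             # rigid: the poke propagates straight through
agent_interior = np.cumsum(rng.integers(0, 2, size=50_000)) % 4  # keeps its own memory

print(cond_entropy_bits(rock_interior, pokes))   # 0 bits: the inside adds nothing
print(cond_entropy_bits(agent_interior, pokes))  # ≈ 2 bits: genuine interior freedom
```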
Side note: Agents cannot be a gas just like they can’t be a crystal but for the opposite reason. A gas has plenty of degrees of freedom, basically the maximum number. But it doesn’t have ENOUGH cohesion. It’s all interior and no blanket. You push your hand lightly into a gas and...it simply disperses. No persistent boundary. Agents want to be liquid. There’s a reason living things are always made using water on earth.
tldr: rocks absolutely have a persistent boundary, but no interiority. agents need both a persistent boundary and an interiority.
Re: Black Holes specifically...this is pure speculation because they’re enough of an edge case I don’t know if I really understand it yet...I think a Black Hole is an agent in the same sense that our whole universe is an agent. Free energy minimization is happening for the universe as a whole (the 2nd law of thermodynamics!) but it’s entirely an interior process rather than an exterior one. People muse about Black Holes potentially being baby universes and I think that is quite plausible. Agents can have internal and external actions, and a Black Hole seems like it might be an agent with only internal-actions which nevertheless persists. You normally don’t find something that’s flexible enough to take internal action, yet rigid enough to resist environmental noise—but a Black Hole might be the exception to that, because its dynamics are so powerful that it doesn’t need to protect itself from the environment anymore.
An agent created in a computer would be an exception to that?
If you built a good one, and you knew how to look at the dynamics, you’d find that the agent in the computer was in a “liquid” state. Although it’s virtualized, so the liquid is in the virtualization layer.
can it maintain its own boundary over time, in the face of environmental disruption? Some agents are much better at this than others.
I really wish there was more attention paid to this idea of robustness to environmental disruption. It also comes up in discussions of optimization more generally (not just agents). This robustness seems to me like the most risk-relevant part of all this, and seems like it might be more important than the idea of a boundary. Maybe maintaining a boundary is a particularly good way for a process to protect itself from disruption, but I notice some doubt that this idea is most directly getting at what is dangerous about intelligent/optimizing systems, whereas robustness to environmental disruption feels like it has the potential to get at something broader that could unify both agent based risk narratives and non-agent based risk narratives.
Context: I ran 8 days of workshops on AI safety boundaries earlier this year.
Thanks for mentioning boundaries! I agree with everything you’ve said here.
I’d like to point readers to these related links:
Formalizing «Boundaries» with Markov blankets
Agent membranes and causal distance
What does davidad want from «boundaries»?
boundaries/membranes LessWrong tag