Formalizing «Boundaries» with Markov blankets
How could «boundaries» be formally specified? Markov blankets seem to be one fitting abstraction.
[The post is largely a conceptual distillation of Andrew Critch’s Part 3a: Defining boundaries as directed Markov blankets.]
Explaining Markov blankets
By the end of this section, I want you to understand the following diagram (Pearlian causal diagram):
Also: I will assume a basic familiarity with Markov chains in this post.
First, I want you to imagine a simple Markov chain that represents the fact that a human influences themselves over time:
Second, I want you to imagine a Markov chain that represents the fact that the environment[1] influences itself over time:
Okay. Now, notice that between the human and their environment there’s some kind of boundary. For example, their skin (a physical boundary) and their interpretation/cognition (an informational boundary). If this were not a human but instead a bacterium, then the boundary I mean would (mostly) be the bacterium’s cell membrane.
Third, imagine a Markov chain that represents that boundary influencing itself over time:
Okay, so we have these three Markov chains running in parallel:
But they also influence each other, so let’s build that into the model, too.
How should they be connected?
Well, how does the environment affect a human?
Ok, so I want you to notice that when an environment affects a human, it doesn’t influence them directly, but instead it influences their skin or their cognition (their boundary), and then their boundary influences them.
For example, I shine light in your eyes (part of your environment), it activates your eyes (part of your boundary), and your eyes send information to your brain (part of your insides).
Which is to say, this is what does not happen:
(This is called “infiltration”.) The environment does not directly influence the human.
Instead, the environment influences the boundary which influences them, which looks like this:
The environment influences your skin and your senses, and your skin and senses influence you.
Okay, now let’s do the other direction. How does a human influence their environment?
It’s not that a human controls the environment directly…
(This is called “exfiltration”; this does not happen.)
…but that the human takes actions (via their boundary), and then their actions affect the environment:
For example, it’s not that the environment “reads your mind” directly, but rather that you express yourself and then others read your words and actions.
Okay.
Now, putting together both directions, human-influences-environment and environment-influences-human, we get this:
Also, I want you to notice which arrows are conspicuously missing from the diagram above:
Please compare this diagram to the one before it.
So that’s how we can model the approximate causal separation between an agent and the environment.
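The wiring described above can also be written down as a small time-unrolled graph. The sketch below is only an illustration under assumptions that are not in the original diagrams: the node labels `H`, `B`, `E`, the convention that every arrow goes from time t to time t+1, and the helper names `build_edges` and `parents` are all hypothetical.

```python
# Allowed one-step influences (source -> destination), following the text:
# each chain influences itself, and H and E only interact via B.
# Assumption: every arrow goes from time t to time t + 1.
ALLOWED = {
    ("H", "H"), ("B", "B"), ("E", "E"),  # the three Markov chains
    ("E", "B"), ("B", "H"),              # environment -> boundary -> human
    ("H", "B"), ("B", "E"),              # human -> boundary -> environment
}


def build_edges(steps):
    """Unroll the allowed influences over `steps` time points."""
    return {
        ((src, t), (dst, t + 1))
        for t in range(steps - 1)
        for src, dst in ALLOWED
    }


def parents(edges, node):
    """Return the direct causes of `node` in the unrolled graph."""
    return {src for src, dst in edges if dst == node}


edges = build_edges(steps=3)
print(sorted(parents(edges, ("H", 1))))  # direct causes of the human at t=1
print(sorted(parents(edges, ("E", 1))))  # direct causes of the environment at t=1
```

In this sketch the human's direct causes are only the previous human and boundary states, and the environment's direct causes are only the previous environment and boundary states. The pairs `("E", "H")` and `("H", "E")` are deliberately absent from `ALLOWED`; those are the missing arrows.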
Defining boundary violations
Finally, we can define boundary violations as exactly this:
Boundary violations are infiltration across human Markov blankets.[2]
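Under this definition, checking a causal graph for boundary violations amounts to looking for direct environment-to-human arrows. Here is a minimal sketch, assuming the graph is given as a set of (source, destination) edges whose nodes are (name, time) pairs; the function name `infiltration_edges` and the node labels are hypothetical.

```python
def infiltration_edges(edges):
    """Return every direct E -> H arrow, i.e. every boundary violation."""
    return {
        (src, dst)
        for src, dst in edges
        if src[0] == "E" and dst[0] == "H"
    }


# A respected boundary: the environment reaches the human only via B.
respected = {(("E", 0), ("B", 1)), (("B", 1), ("H", 2))}

# A violated boundary: one extra arrow skips the boundary entirely.
violated = respected | {(("E", 0), ("H", 1))}

print(infiltration_edges(respected))
print(infiltration_edges(violated))
```

The first graph yields no violations, while the second yields exactly the one arrow that bypasses the boundary.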
Leakage and leakage minimization
Of course, in reality, there’s actually leakage, and the ‘real’ causal diagram between any human and their environment does include the arrows I said were missing.
For example, viruses in the air might influence me in ways I can’t control or prevent. Similarly, my brain waves are emanating out into the fields around me.
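One way to make “leakage” quantitative is to ask how much the environment tells you about the human’s next state once the boundary is already known. The sketch below uses a toy model that is assumed purely for illustration and is not from this post: B and E are independent fair bits, and the human’s next state copies B with probability `1 - leak` and copies E directly with probability `leak`. It computes the conditional mutual information I(H_next; E | B) exactly; the names `joint` and `cond_mutual_info` are hypothetical.

```python
from itertools import product
from math import log2


def joint(leak):
    """Joint distribution over (b, e, h_next) for one step of the toy model.

    Assumption: b and e are independent fair bits; h_next copies b with
    probability 1 - leak and copies e (bypassing the boundary) otherwise.
    """
    dist = {}
    for b, e, h_next in product((0, 1), repeat=3):
        dist[(b, e, h_next)] = 0.25 * (
            (1 - leak) * (h_next == b) + leak * (h_next == e)
        )
    return dist


def cond_mutual_info(dist):
    """Exact I(H_next; E | B) in bits for a distribution from `joint`."""

    def marginal(keep):
        out = {}
        for key, prob in dist.items():
            sub = tuple(key[i] for i in keep)
            out[sub] = out.get(sub, 0.0) + prob
        return out

    p_b = marginal((0,))
    p_be = marginal((0, 1))
    p_bh = marginal((0, 2))
    total = 0.0
    for (b, e, h), prob in dist.items():
        if prob > 0:
            total += prob * log2(
                prob * p_b[(b,)] / (p_be[(b, e)] * p_bh[(b, h)])
            )
    return total


for leak in (0.0, 0.1, 0.5, 1.0):
    print(leak, cond_mutual_info(joint(leak)))
```

With `leak = 0` the quantity is zero, because the boundary fully screens off the environment. For any `leak > 0` it is positive, and it reaches one full bit when the human’s next state is read straight from the environment.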
However, humans are agents that are actively minimizing that leakage. For example:
You don’t want to be directly controlled by your environment. (You don’t want infiltration.)
Instead, you want to take in information and then be able to decide what to do with it. You want to have a say about how things affect you.
A bacterium wants things to go through its gates and ion channels, and not just pierce its membrane.
If I could cheaply improve my boundary’s immunity to viruses, I would.
Humans are embedded agents (of course). However, humans are also actively seeking to de-embed themselves from the environment and become independent of it.
You don’t want the way that you’re influencing the world to be by people mind-reading you. (Exfiltration[3])
Instead, you want to be affecting the world intentionally, through your actions.
If you believed that someone might be able to predict you well, or get close to it, and you didn’t want that, you would probably take evasive maneuvers.
Even if this works, how would the AI system detect the Markov blankets?
Perhaps by doing causal discovery on the world to detect the Markov blankets of the moral patients that we want the AI system to respect. Also see: Discovering Agents.
A few months ago I asked a member of the Causal Incentives group (the authors of the links above) if causal discovery could be used empirically to discover agents in the real world and I remember a vibe of “yeah possibly”. (Though also this didn’t seem like their goal.)
Credits and math
This section was largely based on Andrew Critch’s «Boundaries», Part 3a: Defining boundaries as directed Markov blankets (on LessWrong). That post has more technical details, and defines infiltration more rigorously in terms of mutual information.
[Critch also splits the boundary into two components, “active” (~actions) and “passive” (~perceptions). A more thorough version of this post would have split the “B” in the diagrams above into these components, too, but I didn’t think it was necessary to do here.]
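For orientation, here is how such a definition might look in this post’s simplified three-variable notation. This is a sketch under assumptions of this post’s simplification, not Critch’s exact formula (which, as noted, splits the boundary into active and passive parts): H_t, B_t, and E_t denote the human, boundary, and environment at time t, and I(X; Y | Z) is conditional mutual information.

```latex
\mathrm{Infiltration} \;\approx\; I\bigl(H_{t+1};\, E_t \,\big|\, H_t, B_t\bigr),
\qquad
\mathrm{Exfiltration} \;\approx\; I\bigl(E_{t+1};\, H_t \,\big|\, E_t, B_t\bigr)
```

Each quantity is zero exactly when the next state of one side is conditionally independent of the other side given the conditioning variables, which is the idealized no-leakage case.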
Subscribe to the boundaries / membranes tag if you’d like to stay updated on this agenda.
[1] It’s not clear exactly how to specify the environment a priori, but it should end up roughly being the complement of the human with respect to the rest of the universe.
[2] It may also be preferable to avoid exfiltration across human Markov blankets (which would be direct arrows from H→E), but it’s not clear to me that that can be reasonably prevented by anyone except the human. It would be nice, though. Note that exfiltration is like privacy. Related: 1, 2.
[3] Exfiltration, i.e., privacy and the absence of mind-reading. Related section: “Maintaining Boundaries is about Maintaining Free Will and Privacy” by Scott Garrabrant.