EDIT: I no longer think this setup is viable, for reasons that connect to why I think Critch’s operationalization is incomplete and why boundaries should ultimately be grounded in Pearlian Causality and interventions. Check update.
I believe there isn’t much standing in the way of actually implementing an approximation of Critch’s boundaries[1] using deep learning.
Recall, Critch’s boundaries are:
Given a world (a Markovian stochastic process) Wt, map its values W (a vector) bijectively using f into ‘features’ that can be split into four vectors, each representing a boundary-possessing system’s Viscera, Active Boundary, Passive Boundary, and Environment.
Then, we characterize boundary-ness (i.e., minimal information flow across features unmediated by the boundary) using two mutual information criteria, one each for infiltration and exfiltration of information.
And a policy of the boundary-possessing system (under the ‘stance’ of viewing the world implied by f) can be viewed as a stochastic map (that has no infiltration/exfiltration by definition) that best approximates the true Wt dynamics.
The interpretation here (under low exfiltration and infiltration) is that f can be viewed as a policy taken by the system in order to perpetuate its boundary-ness into the future and continue being well-described as a boundary-possessing system.
All of this seems easily implementable using very basic techniques from deep learning!
The bijective feature map is implemented using two NN maps, one each way, tied together with an autoencoder loss.
Mutual information is approximated with standard variational bounds; optimize f to minimize it.
(the interpretation here being—we’re optimizing our ‘stance’ towards the world in a way that best views the world as a boundary-possessing system)
After you train your ‘stance’ using the above setup, learn the policy using an NN trained with standard SGD, keeping f fixed. (A rough sketch of this pipeline follows below.)
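Here’s a minimal sketch of what this pipeline could look like in PyTorch. To be clear, everything concrete in it is my assumption, not part of the setup above: the dimensions and V/A/P/E split sizes are arbitrary, the diagonal-Gaussian critic with a CLUB-style bound is just one possible stand-in for “standard variational approximations” (any conditional-MI estimator could be slotted in), and a real attempt would probably want an exactly invertible architecture like a normalizing flow rather than a plain autoencoder.

```python
# Minimal sketch (PyTorch), not a tested implementation. Dimensions, the
# Gaussian critic, and the CLUB-style bound are all illustrative assumptions.
import torch
import torch.nn as nn

D = 1024                          # flattened world-state dimension (assumption)
SPLIT = [256, 128, 128, 512]      # feature sizes for V, A, P, E (sums to D)

class FeatureMap(nn.Module):
    """Approximately bijective f: one NN each way, tied by an autoencoder loss.
    (A normalizing flow would give exact bijectivity instead.)"""
    def __init__(self, d):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.dec = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, w):
        z = self.enc(w)
        return z, self.dec(z)

def split_vape(z):
    """Split the feature vector into Viscera / Active / Passive / Environment."""
    return torch.split(z, SPLIT, dim=-1)

class MIBound(nn.Module):
    """CLUB-style variational stand-in for the conditional MI terms.
    q(target | context) is a diagonal Gaussian; the sampled bound is
    E[log q(y|x)] - E[log q(y'|x)] with y' shuffled within the batch."""
    def __init__(self, d_target, d_context):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(d_context, 256), nn.ReLU(),
                               nn.Linear(256, 2 * d_target))

    def log_q(self, target, context):
        mu, logvar = self.q(context).chunk(2, dim=-1)
        return (-0.5 * ((target - mu) ** 2 / logvar.exp() + logvar)).sum(-1)

    def bound(self, target, context):
        shuffled = target[torch.randperm(target.shape[0])]
        return (self.log_q(target, context) - self.log_q(shuffled, context)).mean()

f = FeatureMap(D)
infil = MIBound(d_target=SPLIT[0] + SPLIT[1], d_context=D)
# In practice one alternates: fit q by maximum likelihood, then update f.
opt = torch.optim.Adam(list(f.parameters()) + list(infil.parameters()), lr=1e-4)

def stance_loss(w_t, w_t1):
    """Autoencoder loss + infiltration bound for one (W_t, W_{t+1}) batch.
    The exfiltration term is analogous and omitted for brevity."""
    z_t, rec_t = f(w_t)
    z_t1, _ = f(w_t1)
    v1, a1, _, _ = split_vape(z_t1)
    recon = ((rec_t - w_t) ** 2).mean()
    # Infil ~ Mut((V_{t+1}, A_{t+1}); E_t | (V_t, A_t, P_t)); here the critic
    # simply conditions on all of z_t, a crude proxy for the conditional MI.
    return recon + infil.bound(torch.cat([v1, a1], -1), z_t)
```

Note the crudeness of the critic conditioning on all of z_t rather than properly handling the conditioning set (V_t, A_t, P_t); getting the conditional-MI estimation right is probably where most of the actual work lies.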
A very basic experiment would look something like:
Test the above setup on two cellular automaton systems (e.g., GoL or Lenia): one containing just random ash, and the other containing some boundary-like structure, such as the noise-resistant glider structures found via optimization (there are a lot of such examples in the Lenia literature).[2]
Then (1) check if the infiltration/exfiltration values are lower for the latter system, and (2) do some interp to see if the V/A/P/E features or the learned policy NN have any interesting structure. (A data-generation sketch for the GoL variant follows below.)
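For the GoL variant, generating the two datasets is straightforward. A rough sketch (the grid size, ash density, and glider pattern are standard choices, not from the setup above; a plain GoL glider is of course not noise-resistant, so the Lenia version is where the interesting comparison lives):

```python
# Rough data-generation sketch for the GoL variant of the experiment.
import numpy as np
from scipy.signal import convolve2d

KERNEL = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])

def gol_step(grid):
    """One Game of Life update on a toroidal grid."""
    n = convolve2d(grid, KERNEL, mode="same", boundary="wrap")
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

def rollout(grid, steps=200):
    traj = [grid]
    for _ in range(steps):
        traj.append(gol_step(traj[-1]))
    return np.stack(traj)      # (steps+1, H, W); flatten each frame for the NNs

rng = np.random.default_rng(0)
ash = (rng.random((64, 64)) < 0.5).astype(np.uint8)       # world 1: random ash
glider = np.zeros((64, 64), np.uint8)
glider[1:4, 1:4] = [[0, 1, 0], [0, 0, 1], [1, 1, 1]]      # world 2: a single glider

data_ash, data_glider = rollout(ash), rollout(glider)
# Train the stance pipeline on each and compare the converged infil/exfil bounds.
```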
I’m not sure if I’d be working on this any time soon, but posting the idea here just in case people have feedback.
I think research on boundaries—both conceptual work and developing practical algorithms for approximating them & schemes involving them—is quite important for alignment for reasons discussed earlier in my shortform.
Ultimately we want our setup to detect boundaries that aren’t just physically contiguous chunks of matter, like informational boundaries, so we want to make sure our algorithm isn’t just always exploiting basic locality heuristics.
I can’t think of a good toy testbed (ideas appreciated!), but one easy thing to try is to destroy all locality by passing the automaton lattice (which we were feeding as input) through a complicated, fixed bijective map, so that our system has to learn locality itself if it turns out to be a useful notion in viewing the system as boundary-possessing. (A minimal sketch follows.)
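Concretely, a single fixed random permutation of cells is probably the simplest such bijection (a sketch; any fixed bijective map would do):

```python
import numpy as np

rng = np.random.default_rng(42)
perm = rng.permutation(64 * 64)   # one fixed bijection over cells, reused every frame

def scramble(grid):
    """Destroy spatial locality while preserving all information in the frame."""
    return grid.reshape(-1)[perm].reshape(grid.shape)
```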
I don’t see much hope in capturing a technical definition that doesn’t fall out of some sort of game theory, and even the latter won’t directly work for boundaries as a representation of respect for autonomy that’s helpful for alignment (as it needs to apply to radically weaker parties).
Boundaries seem more like a landmark feature of human-like preferences that serves as a test case for whether toy models of preference are reasonable. If a moral theory insists on tiling the universe with something, it fails the test. An imperative to merge all agents fails the test, unless the agents end up essentially reconstructed. And with computronium, we’d need to look at the shape of the things it’s computing rather than at the computing substrate.
I think it’s plausible that the general concept of boundaries can be characterized somewhat independently of preferences, while at the same time having boundary-preservation be a quality that agents mostly satisfy (discussion here; very unsure about this). I see Critch’s definition as a first iteration of an operationalization of boundaries in this general, somewhat preference-independent sense.
But I do agree that ultimately all of this should tie back to game theory. I find Discovering Agents most promising in this regard, though there are still a lot of problems—some of which I suspect might be easier to solve if we treat systems-with-high-boundaryness as a sort of primitive for the kind of thing that we can associate agency and preferences with in the first place.
There are two different points here: boundaries as a formulation of agency, and boundaries as a major component of human values (which might be somewhat sufficient by itself for some alignment purposes). In the first role, boundaries are an acausal norm that many agents end up adopting, so that it’s natural to consider a notion of agency that implies boundaries (after the agent has had an opportunity for sufficient reflection). But this use of boundaries is probably open to arbitrary ruthlessness; it’s not respect for the autonomy of someone the powers that be wouldn’t sufficiently care about. Instead, boundaries would be a convenient primitive for describing interactions with other live players, a Schelling concept shared by agents in this sense.
The second role, as an aspect of values, expresses that the agent does care about the autonomy of others outside game-theoretic considerations, so it only ties back to game theory by similarity, or through the story of the formation of such values, which involved game theory. A general definition might be useful here, if pointing AIs at it could instill it into their values. But technical definitions don’t seem to work when you consider what happens if you try to protect humanity’s autonomy using a boundary according to such definitions. It’s like machine translation: the problem could well be well-defined, yet impossible to formally specify other than by gesturing at a learning process.
I no longer think the setup above is viable, for reasons that connect to why I think Critch’s operationalization is incomplete and why boundaries should ultimately be grounded in Pearlian Causality and interventions.
(Note: I am thinking as I’m writing, so this might be a bit rambly.)
The world-trajectory distribution is ambiguous.
Intuition: Why does a robust glider in Lenia intuitively feel like a system possessing a boundary? Well, I imagine various situations that happen in the world (like bullets) and this pattern mostly stays stable in the face of them.
Now, notice that the measure of infiltration/exfiltration depends on ϕ ∈ Δ(Wω), a distribution over world histories:

Infil(ϕ) := Agg_{t≥0} Mut_{Wω∼ϕ}((V_{t+1}, A_{t+1}); E_t | (V_t, A_t, P_t))
So, for the above measure to capture my intuition, the approximate Markov condition (operationalized by low infil & exfil) must consider world trajectories Wω that contain the Lenia pattern avoiding bullets.
Remember, W is the raw world state, with no coarse-graining. So ϕ is a distribution over raw world trajectories. It already captures all the “potentially occurring trajectories under which the system may take boundary-preserving action.” Since everything is observed, our distribution already encodes all of “Nature’s Interventions.” So in some sense Critch’s definition is already causal (in a very trivial sense), by virtue of requiring a distribution over raw world trajectories, despite mentioning no Pearlian Causality.
Issue: Choice of ϕ
Maybe there is some canonical true ϕ for our physical world that minds can intersubjectively arrive at, so there’s no ambiguity.
But when I imagine trying to implement this scheme on Lenia, there’s immediately an ambiguity as to which distribution (representing my epistemic state about which raw world trajectories will “actually happen”) we should choose:
Perhaps a very simple distribution: assign uniform probability over world trajectories where the world contains nothing but the glider, moving in a random direction from some initial offset.
I suspect many stances other than the one factorizing the world into gliders would have low infil/exfil, because the world is so simple. This is the case of “accidental boundary-ness.”
Perhaps something more complicated: various trajectories where, e.g., the Lenia pattern encounters bullets, evolves alongside various other patterns, etc.
This I think rules out “accidental boundary-ness.”
I think the latter works. But now there’s a subjective choice: which distribution, and which set of possible/realistic “Nature’s Interventions”—all the situations the system could ever encounter while exhibiting boundary-like behavior—do we want to implicitly encode into our observational distribution? I don’t think it’s natural for ϕ to assign much probability to a trajectory whose initial conditions are set so precisely that everything decays into noise. But this feels quite subjective.
Hints toward a solution: Causality
I think the discussion above hints at a very crucial insight:
ϕ must arise as a consequence of the stable mechanisms in the world.
Suppose the world of Lenia contains various stable mechanisms, like a gun that shoots bullets in random directions, scarce food sources, etc.
We want ϕ to describe the trajectories that the boundary system will “actually” experience, in some sense. I want the “Lenia pattern dodges bullet” world trajectory to be considered, because there is a plausible mechanism in the world that can cause such trajectories to exist. For similar reasons, I think the empty-world distribution is impoverished, and a distribution containing trajectories where the entire world decays into noise is bad because no mechanism can implement it.
Thus, unless you have a canonical choice of ϕ, a better starting point would be to consider the abstract causal model that encodes the stable mechanisms in the world, and to use Discovering Agents-style interventional algorithms that operationalize the notion that “boundaries causally separate environment and viscera.”
Why is this better? Because of everything mentioned above about how the causal model informs us of which trajectories are realistic, especially in the absence of a canonical ϕ. It’s also far more efficient: knowledge of the mechanisms tells the algorithm precisely which interventions to query the world with, instead of having to implicitly bake them into ϕ. (A toy sketch of what such an interventional separation test might look like is below.)
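This is not the actual Discovering Agents algorithm; step and split_vape are hypothetical stand-ins for a world simulator with controllable mechanisms and a learned V/A/P/E feature split:

```python
# Toy sketch of an interventional "boundaries causally separate environment
# and viscera" test. Not the Discovering Agents algorithm; `step` and
# `split_vape` are hypothetical stand-ins.
import numpy as np

def separation_violations(step, split_vape, init_world, env_interventions, T=50):
    """Count ticks where an intervention on an environment-side mechanism
    changes the viscera while the boundary features (A, P) stay fixed.
    A causally well-separated system should keep this near zero."""
    violations = 0
    for do_env in env_interventions:
        base, alt = init_world, init_world
        for _ in range(T):
            base = step(base, env_mechanism=None)     # observational run
            alt = step(alt, env_mechanism=do_env)     # do(env mechanism)
            v0, a0, p0, _ = split_vape(base)
            v1, a1, p1, _ = split_vape(alt)
            boundary_fixed = np.array_equal(a0, a1) and np.array_equal(p0, p1)
            # environment reached the viscera without going through the boundary:
            if boundary_fixed and not np.array_equal(v0, v1):
                violations += 1
    return violations
```

The point of the sketch is just the shape of the query: fix the boundary features, intervene on an environment-side mechanism, and check whether the viscera move anyway.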
There are still a lot more questions, but I think this is a pretty clarifying answer as to how Critch’s boundaries are limiting and why DA-style causal methods will be important.
I think the update makes sense in general, but isn’t there some way mutual information and causality are linked? Maybe the link isn’t strong enough for an easy extrapolation from one to the other.
Also, I just wanted to drop this to see if you find it interesting; it’s kind of on this topic. I’m not sure it’s fully defined in a causality-based way, but it is about structure preservation.
https://youtu.be/1tT0pFAE36c?si=yv6mbswVpMiywQx9
The Active Inference people also have the boundary problem as a core part of their work, so they have some interesting material on it.
Yeah, I’d like to know if there’s a unified way of thinking about information-theoretic quantities and causal quantities, though a quick literature search doesn’t turn up anything interesting. My guess is that we’d want separate boundary metrics for informational separation and causal separation.