By boundaries, I mean a sustaining/propagating system that informationally/causally insulates its ‘viscera’ from the ‘environment,’ allowing only relatively small amounts of deliberate information flow through certain channels in both directions. Living systems, from bacteria to humans, are examples. A boundary doesn’t even have to be a physically distinct chunk of spacetime; it can be drawn over more abstract variables like societal norms. Agents are another example.
I find them very relevant to alignment, especially from the direction of detecting such boundary-possessing/agent-like structures embedded in a large AI system and backing out a sparse relationship between these subsystems, which can then be used to e.g., control the overall dynamic. Check out these posts for more.
A prototypical deliverable would be an algorithm that, given access to some representation of a dynamical system, performs observations & experiments and returns a summary data structure of all the ‘boundaries’ embedded in the system: their desires/wants, how they game-theoretically relate to one another (a sparse causal relevance graph?), the consequences of interventions performed on them, etc. It should be versatile enough to detect e.g., gliders embedded in Game of Life / Particle Lenia, agents playing Minecraft while given only coarse-grained access to the physical state of the world, boundary-like things inside LLMs, etc. (I’m inspired by this)
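To make the deliverable concrete, here is one purely hypothetical shape such a summary data structure could take. Every class and field name below is an assumption of mine for illustration, not an existing API or a settled design:

```python
# A hypothetical sketch of the detector's output; all names are assumptions.
from dataclasses import dataclass, field


@dataclass
class Boundary:
    variables: list[str]        # coarse-grained variables the boundary spans
    viscera: list[str]          # the insulated 'inside' variables
    channels: list[str]         # allowed information channels (in/out)
    inferred_goals: list[str]   # estimated desires/wants of the subsystem


@dataclass
class BoundaryReport:
    boundaries: list[Boundary]
    # sparse causal relevance graph: (i, j) -> how much boundary i influences j
    relevance: dict[tuple[int, int], float] = field(default_factory=dict)
    # predicted consequences of candidate interventions, keyed by boundary index
    intervention_effects: dict[int, str] = field(default_factory=dict)


# e.g., a glider detected in Game of Life (illustrative values only)
glider = Boundary(
    variables=["cells[10:15, 10:15]"],
    viscera=["cells[11:14, 11:14]"],
    channels=["neighboring cells"],
    inferred_goals=["persist", "translate diagonally"],
)
report = BoundaryReport(boundaries=[glider])
```

The point of the sketch is just that the output should be queryable (which subsystems exist, what they want, how they relate), rather than a single scalar "agency score."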
Why do I find the aforementioned directions relevant to this goal?
Critch’s Boundaries operationalizes boundaries/viscera/environment as functions of the underlying variables, executing policies that continuously prevent information ‘flow’ [1] through disallowed channels, quantified via conditional transfer entropy.
Relatedly, Fernando Rosas’s paper on Causal Blankets operationalizes boundaries using a similar but subtly different[2] form of mutual information constraint on the boundary/viscera/environment variables than Critch’s. Importantly, they show that such blankets always exist between two coupled stochastic processes (using a similar style of future-morph equivalence-relation characterization as compmech), and they define a metric, the “synergistic coefficient,” that quantifies how boundary-like this thing is.[3]
More on compmech: epsilon transducers generalize epsilon machines to input-output processes. Could PALO (Perception-Action Loops) and Boundaries be modeled as two epsilon transducers coupled together?
These directions are interesting, but I still find them unsatisfactory because all of them are purely behavioral accounts of boundaries/agency. One of the hallmarks of agentic behavior (or of some boundary behaviors) is adapting one’s policy when an intervention changes the environment in a way that the system can observe and adapt to.[4][5]
(is there an interventionist extension of compmech?)
Discovering agents provides a genuinely causal, interventionist account of agency and an algorithm to detect agents, motivated by the intentional stance. I think the paper is very enlightening from a conceptual perspective, but many problems have yet to be solved before we can actually implement it. Here’s my take on it.
More fundamentally (this is more vibes; I’m really out of my depth here), I feel there is something intrinsically limiting about the use of Bayes Nets, especially the fact that choosing which variables to use in your Bayes Net already encodes a lot of information about the specific factorization structure of the world. I’ve heard good things about finite factored sets and I’m eager to learn more about them.
Not exactly a ‘flow’, because transfer entropy conflates intrinsic information flow with synergistic information: a ‘flow’ connotes only the intrinsic component, while transfer entropy measures the overall amount of information that a system couldn’t have obtained on its own. Still, transfer entropy seems like a conceptually correct metric to use.
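For concreteness, transfer entropy with history length 1 is the conditional mutual information $T_{X \to Y} = I(Y_{t+1}; X_t \mid Y_t)$. A minimal plug-in estimator for discrete time series (a sketch; not bias-corrected, and real use would condition on longer histories and on the rest of the system) looks like this:

```python
# Plug-in estimate of transfer entropy T_{X->Y} = I(Y_{t+1}; X_t | Y_t), in bits.
# Sketch only: history length 1, no bias correction.
import numpy as np
from collections import Counter


def transfer_entropy(x, y):
    """Transfer entropy from series x to series y (equal length), in bits."""
    n = len(y) - 1
    c_nxy = Counter(zip(y[1:], x[:-1], y[:-1]))   # counts of (y_next, x_past, y_past)
    c_xy = Counter(zip(x[:-1], y[:-1]))
    c_ny = Counter(zip(y[1:], y[:-1]))
    c_y = Counter(y[:-1])
    te = 0.0
    for (yn, xp, yp), c in c_nxy.items():
        # conditional mutual information, summed term by term over observed triples
        te += (c / n) * np.log2(c * c_y[yp] / (c_xy[xp, yp] * c_ny[yn, yp]))
    return float(te)


rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10_000)
y = np.concatenate(([0], x[:-1]))   # y copies x with a one-step lag
z = rng.integers(0, 2, 10_000)      # independent of y's dynamics

te_xy = transfer_entropy(x, y)      # close to H(X) = 1 bit: x fully drives y
te_zy = transfer_entropy(z, y)      # close to 0: z has no influence on y
```

The lagged-copy example is the textbook sanity check: the driver shows roughly one bit of transfer entropy into the driven series, while an unrelated series shows essentially none.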
Specifically, Fernando’s paper criticizes blankets of the following form ($V$ for viscera, $A$ and $P$ for the active/passive boundary, $E$ for environment):

$V_t \to (A_t, P_t) \to E_t$

The data processing inequality (DPI) then implies $I(V_t; A_t, P_t) \ge I(V_t; E_t)$.

This clearly forbids dependencies formed in the past that stay in ‘memory’.
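The DPI step can be sanity-checked numerically on a toy memoryless chain $V \to B \to E$, with a single variable $B$ standing in for the whole boundary $(A, P)$ (the distributions below are random placeholders of my own, just to exercise the inequality):

```python
# Numerical check of the data processing inequality on a toy chain V -> B -> E.
import numpy as np


def mutual_info(pxy):
    """I(X;Y) in bits, given a 2-D array holding the joint distribution p(x, y)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0                      # skip zero-probability cells
    return float((pxy[mask] * np.log2(pxy[mask] / (px * py)[mask])).sum())


rng = np.random.default_rng(1)
p_v = rng.dirichlet(np.ones(3))            # p(v)
p_b_v = rng.dirichlet(np.ones(4), size=3)  # p(b | v), one row per value of v
p_e_b = rng.dirichlet(np.ones(3), size=4)  # p(e | b), one row per value of b

p_vb = p_v[:, None] * p_b_v                # joint p(v, b)
p_ve = p_vb @ p_e_b                        # p(v, e) = sum_b p(v, b) p(e | b)

i_vb = mutual_info(p_vb)                   # I(V; B)
i_ve = mutual_info(p_ve)                   # I(V; E)
# DPI for the chain V -> B -> E: I(V; B) >= I(V; E)
```

Because $E$ only sees $V$ through $B$ here, the environment can never know more about the viscera than the boundary does, which is exactly the upper bound the memoryless blanket imposes.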
but Critch instead defines boundaries as satisfying the following two conditions:

$(V_{t+1}, A_{t+1}) \to (V_t, A_t, P_t) \to E_t$ (infiltration)

DPI implies $I(V_{t+1}, A_{t+1};\, V_t, A_t, P_t) \ge I(V_{t+1}, A_{t+1};\, E_t)$

$(E_{t+1}, P_{t+1}) \to (A_t, P_t, E_t) \to V_t$ (exfiltration)

DPI implies $I(E_{t+1}, P_{t+1};\, A_t, P_t, E_t) \ge I(E_{t+1}, P_{t+1};\, V_t)$

Now that the independencies are entangled across different time steps, there is no longer a clear upper bound on $I(V_t; E_t)$, so I don’t think the criticisms apply directly.
My immediate curiosities are about how these two formalisms relate to one another: Which independency requirements are more conceptually ‘correct’? Can we extend the future-morph construction to build Boundaries in Critch’s formalism? Etc.
For example, a rock is very goal-directed relative to ‘blocking-a-pipe-that-happens-to-exactly-match-its-size,’ until one intervenes on the pipe size and discovers that it can’t adapt at all.
Also, interventions are really cheap to run on digital systems (e.g., LLMs, cellular automata, simulated environments)! Limiting oneself to behavioral accounts of agency would miss out on a rich source of cheap information.
I find the intersection of computational mechanics, boundaries/frames/factored-sets, and some works from the causal incentives group (especially Discovering Agents and Robust Agents Learn Causal World Models; review) to be a very interesting theoretical direction.