If anyone is interested in joining a learning community around the ideas of active inference, the mission of https://www.activeinference.org/ is to educate people on exactly these topics. There’s a study group working through the 2022 active inference textbook by Parr, Friston, and Pezzulo. I’m in the 5th cohort and it’s been very useful for me.
winstonne
> In theory, if humans and AIs aligned on their generative models (i.e., if there is methodological, scientific, and fact alignment), then goal alignment, even if sensible to talk about, will take care of itself: indeed, starting from the same “factual” beliefs, and using the same principles of epistemology, rationality, ethics, and science, people and AIs should in principle arrive at the same predictions and plans.
What about zero-sum games? If you took an agent, cloned it, and then put both copies into a shared environment with only enough resources to support one agent, they would be forced to compete with one another. I guess they both have the same “goals” per se, but they are not aligned even though they are identical.
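As a toy illustration of the cloned-agent case (my own sketch, just to make the “same goals, not aligned” point concrete):

```python
# Two identical agents, each valuing only its own share of one survival resource.
def utility(own_share: float) -> float:
    return own_share

# Every feasible split of the single resource is zero-sum between the clones.
for a_share, b_share in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    print(f"A gets {a_share}, B gets {b_share} -> utilities ({utility(a_share)}, {utility(b_share)})")
# The total is fixed at 1.0, so any gain for one clone is exactly the other's loss.
```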
> Markov blankets, to the best of my knowledge, have never been derived, either precisely or approximately, for physical systems
This paper does just that. It introduces a ‘blanket index’ by which any state space can be analyzed to see whether a Markov blanket assumption is suitable or not. Quoting MJD Ramstead’s summary of the paper’s results with respect to the Markov blanket assumption:
> We now know that, in the limit of increasing dimensionality, essentially all systems (both linear and nonlinear) will have Markov blankets, in the appropriate sense. That is, as both linear and nonlinear systems become increasingly high-dimensional, the probability of finding a Markov blanket between subsets approaches 1.
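For intuition about what “having a Markov blanket in the appropriate sense” means in the linear case: for a Gaussian steady-state density, internal and external states are conditionally independent given the blanket exactly when their conditional cross-covariance vanishes, and that is easy to check numerically. A minimal sketch (my own toy check; the norm below is a crude stand-in, not necessarily the paper’s precise blanket-index definition):

```python
import numpy as np

def conditional_cross_cov(Sigma, mu_idx, b_idx, eta_idx):
    """Cov(internal, external | blanket) for a joint Gaussian with covariance Sigma.
    For Gaussians, this block being zero is equivalent to internal ⊥ external | blanket."""
    S_me = Sigma[np.ix_(mu_idx, eta_idx)]
    S_mb = Sigma[np.ix_(mu_idx, b_idx)]
    S_bb = Sigma[np.ix_(b_idx, b_idx)]
    S_be = Sigma[np.ix_(b_idx, eta_idx)]
    return S_me - S_mb @ np.linalg.solve(S_bb, S_be)

rng = np.random.default_rng(0)
A = rng.normal(size=(9, 9))
Sigma = A @ A.T + 9 * np.eye(9)                    # random positive-definite covariance

mu_idx, b_idx, eta_idx = [0, 1, 2], [3, 4, 5], [6, 7, 8]
index = np.linalg.norm(conditional_cross_cov(Sigma, mu_idx, b_idx, eta_idx))
print(f"crude 'blanket index': {index:.3f}")       # ~0 would indicate a Markov blanket for this partition
```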
> The assumption I find most problematic is that the environment is presumed to be at steady state
Note the assumption is that the environment is at a nonequilibrium steady state, not a heat-death-of-the-universe steady state. My reading of this is that it is an explicit assumption that probabilistic inference is possible.
Ah ok, I think I’m following you. To me, freedom describes a kind of bubble around a certain physical or abstract dimension, whose center is at another agent. Its main use is to limit computational complexity when sharing an environment with other agents. If I have a set of freedom values, I don’t have to infer another agent’s values so long as I don’t enter their freedom bubbles. In the traffic example, how the neighborhood is constructed should be irrelevant to McTraffic; all it needs to know is a) there are other agents already present in the neighborhood, and b) it wants to change the nature of the neighborhood, which will enter those agents’ freedom bubbles. Therefore it needs to negotiate with the inhabitants (so yes, at this step there’s an inference via dialogue going on).
I’m not following your final point. Regardless of determinism, the “state space” I can explore as an embedded agent is constrained by the properties of the local environment. If I value things like a walkable neighborhood but I’m stuck in a pile of rubble, that’s going to constrain my available state space, and accordingly it’s going to constrain my ability to have any rewarding outcome. McTraffic, by not allotting freedoms to me when executing its transportation redesign, infringed on my freedom (which was mostly afforded to me through my and my neighbors’ property rights).
Freedoms (properly encoded) are, I believe, the proper framing for creating utility functions/value systems for critters like our friendly neighborhood traffic agent. Sure, the traffic agent values transportation efficiency, but since it also values other agents’ freedoms (such as property rights), it will limit the execution of its traffic-efficiency preferences within a multi-agent shared environment to minimize the restriction of those rights. To me, this seems simpler and less error-prone than any approach that tries to infer my values (or human preferences more generally) and act according to that inference.
Freedoms assume awareness of external (embedded) agency: they are values you afford to other agents. They have a payoff because you are then afforded them back. This helps ensure agents do not unilaterally bulldoze (literally or figuratively) the “available state space” for other agents to explore and exploit.
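To sketch that “skip value inference unless you’d enter the bubble” rule in code (a toy illustration of my own; the spherical-bubble geometry and the names are assumptions, not anything from the post):

```python
import numpy as np

def needs_negotiation(action_center, action_radius, bubble_center, bubble_radius):
    """Two spherical regions overlap iff the distance between their centers
    is less than the sum of their radii."""
    return np.linalg.norm(action_center - bubble_center) < action_radius + bubble_radius

# A resident's freedom bubble and two of McTraffic's candidate plans (toy numbers).
resident_bubble = (np.array([0.0, 0.0]), 1.0)    # (center, radius)
road_widening   = (np.array([0.5, 0.0]), 1.0)    # plan that alters the resident's neighborhood
repave_highway  = (np.array([10.0, 0.0]), 1.0)   # plan far away from the resident

for name, plan in [("road_widening", road_widening), ("repave_highway", repave_highway)]:
    action = "negotiate with inhabitants" if needs_negotiation(*plan, *resident_bubble) else "proceed, no value inference needed"
    print(f"{name}: {action}")
```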
Hmm, looks like I should add an examples section and more background on what I mean by freedom. What you are describing sounds like a traffic system that values the ergodic efficiency of its managed network, and you are showing a way that a participant can have very non-ergodic results. It sounds like that is more of an engineering problem than what I’m imagining.
Examples off the top of my head of what I mean by loss of freedom resulting from a powerful agent’s value system include things like:
- A paperclip maximizer terraforming the earth prevents any value systems other than paperclip maximization from sharing the earth’s environment.
- Humans’ value for cheap foodstuffs results in monoculture crop fields, which cuts off the values of forest and grassland ecosystems (hiding places, alternating food sources that last through the seasons, etc.).
- A drug-dependent parent changes a child’s environment, preventing the freedom of a reliable schedule, security, etc.
- Or, riffing off your example: a superintelligent traffic controller starts city-planning, bulldozing blocks of car-free neighborhoods because they stood in the way of a 5% city-wide traffic flow improvement.
Essentially, what I’m trying to describe is that freedoms need to be values unto themselves, with characteristics that are functionally different from the common utility-function terminology that revolves around metric maximization (like gradient descent). Freedoms describe boundary conditions within which metric maximization is allowed, but impose steep penalties for surpassing their bounds. Their general mathematical form is a manifold surrounding some region of state space, whereas the general form of most utility-function talk seems to be finding minima/maxima over a state space.
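A minimal sketch of that “maximize the metric only inside the boundary, steep penalty outside” shape (my own toy formulation; the penalty weight, the spherical bubble, and the traffic numbers are assumptions for illustration):

```python
import numpy as np

def penalized_utility(x, objective, constraints, penalty=1e3):
    """Metric maximization is only unconstrained while every freedom
    constraint g_i(x) <= 0 holds; violations incur a steep penalty."""
    violation = sum(max(0.0, g(x)) for g in constraints)
    return objective(x) - penalty * violation

# Toy setup: a traffic agent maximizes flow, but must stay outside a
# resident's freedom bubble of radius r around `center`.
center, r = np.array([1.0, 0.0]), 0.5
flow = lambda x: -np.sum((x - np.array([1.2, 0.0])) ** 2)   # flow is best just inside the bubble
inside_bubble = lambda x: r - np.linalg.norm(x - center)    # > 0 iff x is inside the bubble

candidates = [np.array([1.2, 0.0]), np.array([1.6, 0.0]), np.array([2.0, 0.0])]
best = max(candidates, key=lambda x: penalized_utility(x, flow, [inside_bubble]))
print(best)   # picks the best flow that respects the freedom boundary: [1.6 0.]
```

The point of the shape is that the freedom term acts as a hard boundary rather than as just one more metric to trade off smoothly.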
Hi! Long time lurker, first time commenter. You have written a great piece here. This is a topic that has fascinated me for a while and I appreciate what you’ve laid out.
I’m wondering if there’s a base assumption in the whole intelligence vs. values/beliefs/goals question that needs to be examined.
This statement points to my question. There’s necessarily a positive correlation between internal complexity and intelligence, right? So, in order for intelligence to increase, internal complexity must also increase. My understanding is that complexity is a characteristic of dynamic and generative phenomena, not of purely mechanical phenomena. So, what do we have to assume in order to posit that a super-intelligent entity exists? It must have maintained its entity-ness over time in order to have increased its intelligence/complexity to its current level.
Has anyone explored what it takes for an agent to complexify? I would presume that for an agent to simultaneously continue existing and complexify, it must maintain some type of fixpoint/set of autopoietic (self-maintenance, self-propagation) values/beliefs/goals throughout its dynamic evolution. If this were the case, wouldn’t it be true that there must exist a set of values/beliefs/goals that are intrinsic to the agent’s ability to complexify? Therefore there must be another set of values/beliefs/goals that are incompatible with self-complexification. If so, can we not put boundary conditions on which values/beliefs/goals are necessary for, and which are incompatible with, sufficiently intelligent, self-complexifying agents? After all, if we observe a complex agent, the probability of it arising whole-cloth and path-independently is vanishingly small, so it is safe to say that the observed entity has evolved to reach the observed state.
I don’t think my observation is incompatible with your argument, but it might place further limits, beyond the ones you propose, on what relationships we can possibly see between entities of sufficient intelligence and their goals/values/beliefs.
I think situations like a paperclip maximizer may still occur but they are degenerate cases where an evolutionarily fit entity spawns something that inherits much of its intrinsic complexity but loses its autopoietic fixpoint. Such systems do occur in nature, but to get that system, you must also assume a more-complex (and hopefully more intelligent/adapted) entity exists as well. This other entity would likely place adversarial pressure on the degenerate paperclip maximizer as it threatens its continued existence.
Some relationships/overlaps with your arguments are as follows:
- Totally agree with the belief/value duality.
- Naive belief/value factorizations lead to optimization daemons: the optimization-daemon observation points to an agent’s inability to maintain autopoiesis over time, implying misalignment of its values/beliefs/goals with its desire to increase its intelligence.
- Intelligence changes the ontology values are expressed in: I presume that any ontology expressed by an autopoietic embedded agent must maintain concepts of self; otherwise the embedded entity cannot continue to complexify over time. Therefore there must be some fixpoint in ontological evolution that preserves the entity’s evolutionary drive in order for it to continue advancing its intelligence.
Anyways, thank you for the essay.