Why Subagents?
The justification for modelling real-world systems as “agents” (i.e. as choosing actions to maximize some utility function) usually rests on various coherence theorems. They say things like “either the system’s behavior maximizes some utility function, or it is throwing away resources” or “either the system’s behavior maximizes some utility function, or it can be exploited”. Different theorems use slightly different assumptions and prove slightly different things, e.g. deterministic vs probabilistic utility function, unique vs non-unique utility function, whether the agent can ignore a possible action, etc.
One theme in these theorems is how they handle “incomplete preferences”: situations where an agent does not prefer one world-state over another. For instance, imagine an agent which prefers pepperoni over mushroom pizza when it has pepperoni, but mushroom over pepperoni when it has mushroom; it’s simply never willing to trade in either direction. There’s nothing inherently “wrong” with this; the agent is not necessarily executing a dominated strategy, cannot necessarily be exploited, or any of the other bad things we associate with inconsistent preferences. But the preferences can’t be described by a utility function over pizza toppings.
In this post, we’ll see that these kinds of preferences are very naturally described using subagents. In particular, when preferences are allowed to be path-dependent, subagents are important for representing consistent preferences. This gives a theoretical grounding for multi-agent models of human cognition.
Preference Representation and Weak Utility
Let’s expand our pizza example. We’ll consider an agent who:
Prefers pepperoni, mushroom, or both over plain cheese pizza
Prefers both over pepperoni or mushroom alone
Does not have a stable preference between mushroom and pepperoni—they prefer whichever they currently have
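We can represent this using a directed graph with four nodes (cheese, pepperoni, mushroom, both), with arrows from cheese to pepperoni and to mushroom, and from each of those to both.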
The arrows show preference: our agent prefers B over A if (and only if) there is a directed path from A to B along the arrows. There is no path from pepperoni to mushroom or from mushroom to pepperoni, so the agent has no preference between them. In this case, we’re interpreting “no preference” as “agent prefers to keep whatever they have already”. Note that this is NOT the same as “the agent is indifferent”, in which case the agent is willing to switch back and forth between the two options as long as the switch doesn’t cost anything.
Key point: there is no cycle in this graph. If the agent’s preferences are cyclic, that’s when they provably throw away resources, paying to go in circles. As long as the preferences are acyclic, we call them “consistent”.
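To make the graph semantics concrete, here is a minimal Python sketch. The edge list and the function names (`prefers`, `is_consistent`) are illustrative assumptions rather than anything standard; an edge (a, b) is read as "the agent will trade a for b".

```python
# Hypothetical encoding of the pizza example: (worse, better) pairs.
EDGES = [
    ("cheese", "pepperoni"),
    ("cheese", "mushroom"),
    ("pepperoni", "both"),
    ("mushroom", "both"),
]


def successors(edges):
    """Map each node to the set of nodes its arrows point at."""
    graph = {}
    for worse, better in edges:
        graph.setdefault(worse, set()).add(better)
        graph.setdefault(better, set())
    return graph


def prefers(edges, worse, better):
    """True if a directed path (length >= 1) leads from `worse` to `better`."""
    graph = successors(edges)
    stack, seen = list(graph.get(worse, ())), set()
    while stack:
        node = stack.pop()
        if node == better:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


def is_consistent(edges):
    """Preferences are consistent when no node can reach itself (no cycle)."""
    return not any(prefers(edges, node, node) for node in successors(edges))
```

With this encoding, `prefers(EDGES, "pepperoni", "mushroom")` and its reverse are both false, which is the missing preference described above. Adding an edge from both back to cheese creates a cycle, and `is_consistent` then returns false.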
Now, at this point we can still define a “weak” utility function by ignoring the “missing” preference between pepperoni and mushroom. Here’s the idea: a normal utility function says “the agent always prefers the option with higher utility”. A weak utility function says: “if the agent has a preference, then they always prefer the option with higher utility”. The missing preference means we can’t build a normal utility function, but we can still build a weak utility function. Here’s how: since our graph has no cycles, we can always order the nodes so that the arrows only go forward along the sorted nodes—a technique called topological sorting. Each node’s position in the topological sort order is its utility. A small tweak to this method also handles indifference.
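The topological-sort construction can be sketched in a few lines of Python using the standard library's `graphlib` module (Python 3.9+). The edge list and the name `weak_utility` are assumptions made for illustration, and this sketch does not include the tweak for indifference.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical encoding of the pizza example: (worse, better) pairs.
EDGES = [
    ("cheese", "pepperoni"),
    ("cheese", "mushroom"),
    ("pepperoni", "both"),
    ("mushroom", "both"),
]


def weak_utility(edges):
    """Assign each node its position in one topological order of the graph."""
    # TopologicalSorter expects a mapping from node to its predecessors,
    # i.e. here the options that are less preferred than the node.
    predecessors = {}
    for worse, better in edges:
        predecessors.setdefault(better, set()).add(worse)
        predecessors.setdefault(worse, set())
    # static_order() raises graphlib.CycleError if the preferences are cyclic.
    order = TopologicalSorter(predecessors).static_order()
    return {node: rank for rank, node in enumerate(order)}


utility = weak_utility(EDGES)
# Weak-utility property: wherever a preference exists, utility agrees with it.
assert all(utility[worse] < utility[better] for worse, better in EDGES)
```

Which of pepperoni and mushroom ends up with the higher number depends on the order the sorter happens to pick; the weak-utility property only constrains pairs the agent actually has a preference between.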
(Note: I’m using the term “weak utility” here because it seems natural; I don’t know of any standard term for this in the literature. Most people don’t distinguish between these two interpretations of utility.)
When preferences are incomplete, there are multiple possible weak utility functions. For instance, in our example, one valid topological sort order gives pepperoni utility 1 and mushroom utility 2. But we could just as easily swap them!
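As a quick check of the non-uniqueness claim, the following sketch brute-forces every ordering of the four pizza options and keeps the ones that respect the arrows. The edge list and the name `all_weak_utilities` are illustrative assumptions, and brute force is only sensible for tiny graphs like this one.

```python
from itertools import permutations

# Hypothetical encoding of the pizza example: (worse, better) pairs.
EDGES = [
    ("cheese", "pepperoni"),
    ("cheese", "mushroom"),
    ("pepperoni", "both"),
    ("mushroom", "both"),
]


def all_weak_utilities(edges):
    """Enumerate every ranking of the nodes that respects all the arrows."""
    nodes = sorted({node for edge in edges for node in edge})
    valid = []
    for order in permutations(nodes):
        utility = {node: rank for rank, node in enumerate(order)}
        if all(utility[worse] < utility[better] for worse, better in edges):
            valid.append(utility)
    return valid


candidates = all_weak_utilities(EDGES)
# The two surviving rankings disagree about pepperoni vs mushroom.
disagreements = {u["pepperoni"] < u["mushroom"] for u in candidates}
```

For the pizza graph this yields exactly two rankings, one placing pepperoni above mushroom and one placing mushroom above pepperoni.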
Preference By Committee
The problem with the weak utility approach is that it treats the preference between pepperoni and mushroom as unknown—depending on which possible utility we pick, it could go either way. It’s pretending that there’s some hidden preference there which we simply don’t know. But there are real systems where the preference is not merely unknown, but a real preference to stay in the current state.
For example, maybe our pizza-agent is actually a committee which must unanimously agree to any proposed change. One member prefers pepperoni to no pepperoni, regardless of mushrooms; the other prefers mushrooms to no mushrooms, regardless of pepperoni. This committee is not exploitable and does not throw away resources, nor does it have any hidden preference between pepperoni and mushrooms. Viewed as a black box, its “true” preference between pepperoni and mushrooms is to keep whichever it currently has.
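The committee can be sketched directly. In the Python below, the 0/1 utilities and the names `pepperoni_fan`, `mushroom_fan` and `committee_accepts` are assumptions chosen for illustration; the only thing taken from the text is the rule that every member must agree to a change.

```python
# A state is the set of toppings currently on the pizza.
CHEESE = frozenset()
PEPPERONI = frozenset({"pepperoni"})
MUSHROOM = frozenset({"mushroom"})
BOTH = frozenset({"pepperoni", "mushroom"})


def pepperoni_fan(state):
    """Cares only about pepperoni, regardless of mushrooms."""
    return 1 if "pepperoni" in state else 0


def mushroom_fan(state):
    """Cares only about mushrooms, regardless of pepperoni."""
    return 1 if "mushroom" in state else 0


COMMITTEE = [pepperoni_fan, mushroom_fan]


def committee_accepts(current, proposed, members=COMMITTEE):
    """Unanimity: every member must be at least as well off after the change."""
    return all(member(proposed) >= member(current) for member in members)
```

Here `committee_accepts(PEPPERONI, MUSHROOM)` and `committee_accepts(MUSHROOM, PEPPERONI)` are both false, since one member vetoes each swap, while upgrades such as `committee_accepts(PEPPERONI, BOTH)` pass.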
In fact, it turns out that we can represent any consistent preferences by a committee requiring unanimous agreement.
The key idea here is called order dimension. We want to take our directed acyclic graph of preferences, and stick it into a multidimensional space so that there is a directed path from A to B if-and-only-if B is at least as high as A along every dimension (and the two points differ). Each dimension represents the utility of one subagent on the committee; that subagent approves a change only if the change does not decrease the subagent’s utility. In order for the whole committee to approve a change, the trade must increase (or leave unchanged) the utilities of all subagents. The minimum number of agents required to make this work, i.e. the minimum number of dimensions required, is the order dimension of the graph.
For instance, our pizza example has order dimension 2. We can draw it in a 2-dimensional space with one axis per committee member: a pepperoni axis and a mushroom axis.
Note that, if there are infinitely many possibilities, then the order dimension can be infinite—we may need infinitely many agents to represent some preferences. But as long as the possibilities are finite, the order dimension will be as well.
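For a finite example we can check a proposed embedding mechanically. The sketch below assumes a particular set of coordinates for the pizza options (one axis per committee member) and tests whether componentwise dominance matches reachability in the preference graph; `COORDS`, `dominates` and `represents` are hypothetical names introduced here.

```python
from itertools import product

# Hypothetical encoding of the pizza example: (worse, better) pairs.
EDGES = [
    ("cheese", "pepperoni"),
    ("cheese", "mushroom"),
    ("pepperoni", "both"),
    ("mushroom", "both"),
]

# Assumed coordinates: (pepperoni member's utility, mushroom member's utility).
COORDS = {"cheese": (0, 0), "pepperoni": (1, 0), "mushroom": (0, 1), "both": (1, 1)}


def reachable(edges, start):
    """All nodes reachable from `start` by following at least one arrow."""
    graph = {}
    for worse, better in edges:
        graph.setdefault(worse, set()).add(better)
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def dominates(coords, a, b):
    """True if b is at least as high as a on every axis and the points differ."""
    return coords[a] != coords[b] and all(x <= y for x, y in zip(coords[a], coords[b]))


def represents(edges, coords):
    """Does dominance in the embedding agree with preference on every pair?"""
    return all(
        (b in reachable(edges, a)) == dominates(coords, a, b)
        for a, b in product(coords, repeat=2)
    )
```

`represents(EDGES, COORDS)` holds for the two-axis layout. Any one-axis layout has to rank pepperoni against mushroom, which invents a preference the graph does not contain, so the check fails there.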
Path-Dependence
So far, we’ve interpreted “missing” preferences as “agent prefers to stay in current state”. One important reason for that interpretation is that it’s exactly what we need in order to handle path-dependent preferences.
In practice, path-dependent preferences mostly matter for systems with “hidden state”: internal variables which can change in response to the system’s choices. A great example of this is financial markets: they’re the ur-example of efficiency and inexploitability, yet it turns out that a market does not have a utility function in general (economists call this “nonexistence of a representative agent”). The reason is that the distribution of wealth across the market’s agents functions as an internal hidden variable. Depending on what path the market follows, different internal agents end up with different amounts of wealth, and the market as a whole will hold different portfolios as a result—even if the externally-visible variables, i.e. prices, end up the same.
Most path-dependence results from some hidden state directly, but even if we don’t know the hidden state, we can always add hidden state in order to model path-dependence. Whenever future preferences differ based on how the system reached the current state, we just split the state into two states—one for each possibility. Then we repeat, until we have a full set of states with path-independent preferences between them. These new states are “full” states of the system; from outside, some of them look the same.
An example: suppose I prefer New York to Boston if I just came from DC, but Boston to New York if I just came from Philadelphia.
We can represent that with hidden state:
We now have two separate hidden internal nodes, which both correspond to the same externally-visible state “New York”.
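One way to picture the split is to carry the hidden variable around explicitly. In this sketch the full state is the visible city plus the city we arrived from; the `FullState` type and the helper names are assumptions for illustration, and the preference rule encodes only the New York case from the example.

```python
from typing import NamedTuple, Optional


class FullState(NamedTuple):
    city: str                  # externally visible state
    came_from: Optional[str]   # hidden state: the previous city, if any


def move(state, destination):
    """Travel to `destination`, remembering where we came from."""
    return FullState(city=destination, came_from=state.city)


def prefers_boston(state):
    """From New York, Boston is preferred only after arriving from Philadelphia."""
    return state.city == "New York" and state.came_from == "Philadelphia"


ny_from_dc = move(FullState("DC", None), "New York")
ny_from_philly = move(FullState("Philadelphia", None), "New York")
```

The two values have the same `city` but are different full states, and `prefers_boston` gives different answers for them, so preferences over full states no longer depend on the path.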
Now the key piece: there is no way to get to the “New York (from Philly)” node directly from the “New York (from DC)” node. The agent does not, and cannot, have a preference between these two nodes. Analogously, a market cannot have a preference between two different wealth distributions—the subagents who comprise a market will never spontaneously decide to redistribute their wealth amongst themselves. They always “prefer” (or “decide”) to stay in whatever state they’re currently in.
This is why we need to understand incomplete preferences in order to handle path-dependent preferences: hidden state creates situations where the agent “prefers” to stay in whatever state they’re in.
Now we can easily model the system using subagents exactly as we did for incomplete preferences. We have a directed preference graph between full states (including hidden state); it needs to be acyclic to avoid throwing away resources, so we can find a set of subagents to represent the preferences. In the case of a market, this is just the subagents which comprise the market: they’ll take a trade if it does not decrease the utility of any subagent. (Note, however, that the same externally-visible trade can correspond to multiple possible internal state changes; the subagents will take the trade if any of the possible internal state changes are non-utility-decreasing for all of them. For a market, this means they can trade amongst themselves in response to the external trade in order to make everyone happy.)
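The parenthetical rule, "accept if some internal rearrangement leaves nobody worse off", can be sketched as follows. The two-agent market, the wealth numbers, and the name `market_accepts` are all made up for illustration; each internal state is a tuple of the two agents' wealth, and each agent's utility is simply its own wealth.

```python
def wealth_of_a(state):
    return state[0]


def wealth_of_b(state):
    return state[1]


UTILITIES = [wealth_of_a, wealth_of_b]


def market_accepts(current, candidates, utilities=UTILITIES):
    """Accept an external trade if at least one resulting internal state
    leaves every subagent at least as well off as in `current`."""
    return any(
        all(u(new) >= u(current) for u in utilities)
        for new in candidates
    )


current = (5, 5)
# Assumed outcomes of one external trade worth +1 to the market as a whole:
no_side_payment = (3, 8)    # agent A bears the cost, agent B takes the gain
with_side_payment = (5, 6)  # agent B compensates agent A internally
```

With only `no_side_payment` available, agent A vetoes the trade; once the internal transfer is possible, `market_accepts(current, [no_side_payment, with_side_payment])` is true.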
Applications & Speculations
We’ve just argued that a system with consistent preferences can be modelled as a committee of utility-maximizing agents. How does this change our interpretation and predictions of the world?
First and foremost: the subagents argument is a generalization of the standard acyclic preferences argument. Anytime we might want to use the acyclic preferences argument, but there’s no reason for the system to be path-independent, we can apply the subagents argument instead. In practice, we usually expect systems to be efficient/inexploitable because of some selection pressure (evolution, market competition, etc.), and that selection pressure usually doesn’t care about path dependence in and of itself.
Main takeaway: pretty much anywhere we’d use an agent with a utility function to model something, we can apply the subagents argument and use a committee of agents with utility functions instead. In particular, this is a good replacement for “weak” utility functions.
Humans are a particularly interesting example. We’d normally use the acyclic preferences argument (among other arguments) to argue that humans approximate utility-maximizers in most situations. But there’s no particular reason to assume path-independence; indeed, human behavior looks highly path-dependent. So, apply the subagents argument. Hypothesis: human behavior approximates the choices of a committee of utility-maximizing agents in most situations.
Sound familiar? The subagents argument offers a theoretical basis for the idea that humans have lots of internal subagents, with competing wants and needs, all constantly negotiating with each other to decide on externally-visible behavior.
In principle, we could test this hypothesis more rigorously. Lots of people think of AI “learning what humans want” by asking questions or offering choices or running simulations. Personally, I picture an AI taking in a scan of a full human connectome, then directly calculating the embedded preferences. Someday, this will be possible. When the AI solves those equations, do we expect it to find a single generic optimizer embedded in the system, approximately optimizing some “utility”? Or do we expect to find a bunch of separate generic optimizers, approximately optimizing several different “utilities”, and negotiating with each other? Probably neither picture is complete yet, but I’d bet the second is much closer to reality.
Conclusion
Let’s recap:
The acyclic preferences argument is the easiest entry point for efficiency/inexploitability-implies-utility-maximization theorems, but it doesn’t handle lots of important things, including path dependence.
Markets, for example, are efficient/inexploitable but can’t be represented by a utility function. They have hidden internal state—the distribution of wealth over agents—which makes their preferences path-dependent.
The subagents argument says that any system with deterministic, efficient/inexploitable preferences can be represented by a committee of utility-maximizing agents—even if the system has path-dependent or incomplete preferences.
That means we can substitute committees in many places where we currently use utilities. For instance, it offers a theoretical foundation for the idea that human behavior is described by many negotiating subagents.
One big piece which we haven’t touched at all is uncertainty. An obvious generalization of the subagents argument is that, once we add uncertainty (and a notion of efficiency/inexploitability which accounts for it), an efficient/inexploitable path-dependent system can be represented by a committee of Bayesian utility maximizers. I haven’t even started to tackle that conjecture yet; it’s a wide-open problem.
This post felt like it took a problem that I was thinking about from 3 different perspectives and combined them in a way that felt pretty coherent, though I am not fully sure how right it gets it. Concretely, the 3 domains I felt it touched on were:
How much can you model human minds as consisting of subagents?
How much can problems with coherence theorems be addressed by modeling things as subagents?
How much will AI systems behave as though they consist of multiple subagents?
All three of these feel pretty important to me.
The “many decisions can be thought of as a committee requiring unanimous agreement” model felt intuitively right to me, and afterwards I’ve observed myself behaving in ways which seem compatible with it, and thought of this post.
Wouldn’t decisions about e.g. which objects get selected and broadcast to the global workspace be made by a majority or plurality of subagents? “Committee requiring unanimous agreement” feels more like what would be the case in practice for a unified mind, to use a TMI term. I guess the unanimous agreement is only required because we’re looking for strict/formal coherence in the overall system, whereas e.g. suboptimally-unified/coherent humans with lots of akrasia can have tug-of-wars between groups of subagents for control.
The way I’d think of it, it’s not that you literally need unanimous agreement, but that in some situations there may be subagents that are strong enough to block a given decision. And then if you only look at the subagents that are strong enough to exert a major influence on that particular decision (and ignore the ones who either don’t care about it or aren’t strong enough to make a difference), it kind of looks like a committee requiring unanimous agreement.
It gets a little handwavy and metaphorical but so does the concept of a subagent. :)
Ah, I think that makes sense. Is this somehow related to the idea that consciousness is more of a “last stop for a veto from the collective mind system” for already-subconsciously-initiated thoughts and actions? I’m struggling to remember where I read this, though.
Yeah, considering the fact that subagents are only “agents” insofar as it makes sense to apply the intentional stance (the thing we’d like to avoid having to apply to the whole system because it seems fundamentally limited) to the individual parts, I’m not surprised. It seems like it’s either “agents all the way down” or abandon the concept of agency altogether (although posing that dichotomy feels like a suspicious presumption of agency, itself!).