When I initially read this post, I got the impression that “subagents = path-dependent/incomplete DAG”. After working through more examples, it seems like all the work is being done by “committee requiring unanimous agreement” rather than by the “subagents” part.
Here are the examples I thought about:
Same as the mushroom/pepperoni situation, with the same two agents, but now each side can retaliate/hijack the rest of the mind if it doesn't get what it wants. For example, if the mind starts at pepperoni, the mushroom-preferring agent will hijack the rest of the mind to remove the pepperoni, ending up at cheese. But if the mind starts at the "both" node, it will stay there (because both agents are satisfied). The preference relation can be represented as pepperoni→cheese←mushroom with an extra arrow cheese→both. This is still a DAG, and it's still incomplete (in the sense that we can't compare pepperoni vs mushroom), but it's no longer path-dependent, because no matter where we start, we end up at cheese or "both" (I'm assuming that toppings-removal can always be done, whereas acquiring new toppings can't).
Same as the previous example, except now only the mushroom-preferring agent can retaliate/hijack (because the pepperoni-preferring agent is weak or nice). Now the preferences are pepperoni→cheese→mushroom→both. This is still a DAG, but now the preferences are total, so we can also view it as a (somewhat weird) single agent. A realistic example of this is given by Andrew Critch, where pepperoni=work, cheese=burnout (i.e. neither work nor friendship), mushroom=friendship, and both=friendship-and-work.
A modified version of the Zyzzx Prime planet by Scott Alexander. Now whenever we start out at pepperoni, the pepperoni-preferring agent becomes stupid/weak and loses dominance, so there are edges from pepperoni to mushroom and "both". (And similarly, mushroom points to both pepperoni and "both".) We no longer have a DAG, because of the cycle between pepperoni and mushroom. (All three example graphs are encoded in the sketch below.)
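To make the three examples concrete, here is a minimal sketch in Python (names like `retaliation`, `one_sided`, and `zyzzx` are just my illustrative labels, and only the edges explicitly named above are encoded, reading each arrow as pointing from a less-preferred option to a more-preferred one). It checks whether each preference graph is a DAG and whether the induced preference relation is total:

```python
from itertools import combinations

# Example 1: both agents can retaliate. Arrows point from less-preferred
# to more-preferred options; only the edges named above are encoded.
retaliation = {
    "pepperoni": ["cheese"],
    "mushroom": ["cheese"],
    "cheese": ["both"],
    "both": [],
}

# Example 2: only the mushroom-preferring agent can retaliate.
one_sided = {
    "pepperoni": ["cheese"],
    "cheese": ["mushroom"],
    "mushroom": ["both"],
    "both": [],
}

# Example 3 (modified Zyzzx Prime): the agent that would dominate weakens
# at its own node, creating a cycle between pepperoni and mushroom.
zyzzx = {
    "pepperoni": ["mushroom", "both"],
    "mushroom": ["pepperoni", "both"],
    "cheese": [],
    "both": [],
}

def reachable(graph, start):
    """All nodes reachable from `start` by following preference arrows."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_dag(graph):
    """Acyclic iff no node can reach itself along the arrows."""
    return all(node not in reachable(graph, node) for node in graph)

def is_total(graph):
    """Total iff every pair of options is comparable in one direction or the other."""
    return all(
        b in reachable(graph, a) or a in reachable(graph, b)
        for a, b in combinations(graph, 2)
    )

for name, g in [("retaliation", retaliation), ("one_sided", one_sided), ("zyzzx", zyzzx)]:
    print(f"{name}: DAG={is_dag(g)}, total={is_total(g)}")
# retaliation: DAG=True, total=False   (pepperoni vs mushroom incomparable)
# one_sided: DAG=True, total=True
# zyzzx: DAG=False, total=False        (pepperoni/mushroom cycle)
```

One caveat: the sketch treats the transitive closure of the drawn arrows as the preference relation, so it only checks the shape of the preferences, not which transitions (e.g. toppings-removal only) are physically available.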
It seems like when people talk about the human mind being composed of subagents, the deliberation process is not necessarily “committee requiring unanimous agreement”, so the resulting preference relations cannot necessarily be represented using path-dependent DAGs.
It also seems like the general framework of viewing systems as subagents (i.e. not restricting to “committee requiring unanimous agreement”) is broad enough that it can basically represent any kind of directed graph. On one hand, this is suspicious (if everything can be viewed as a bunch of subagents, then maybe the subagents framework isn’t adding anything after all). On the other hand, this suggests that claims of subagents are not really about the resulting behavior/preference ordering of the system, but rather about the internal dynamics of the system.
I definitely agree that most of the work is being done by the structure in which the subagents interact (i.e. committee requiring unanimous agreement) rather than the subagents themselves. That said, I wouldn't get too hung up on "committee requiring unanimous agreement" specifically—there are structures which behave like unanimous committees but don't look like a unanimous committee on the surface, e.g. markets. In a market, everyone has a veto, but each agent only cares about their own basket of goods—they don't care if somebody else's basket changes.
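For concreteness, here is a minimal sketch of the "committee requiring unanimous agreement" rule itself, with made-up utility numbers for a pepperoni-loving and a mushroom-loving subagent (none of this code is from the original post): a move is accepted only if every subagent weakly prefers it and at least one strictly gains.

```python
# Illustrative utilities: each subagent cares only about its own topping.
pepperoni_lover = {"cheese": 0, "pepperoni": 1, "mushroom": 0, "both": 1}
mushroom_lover  = {"cheese": 0, "pepperoni": 0, "mushroom": 1, "both": 1}
committee = [pepperoni_lover, mushroom_lover]

def committee_prefers(subagents, current, proposed):
    """Unanimous-committee rule: accept a move only if every subagent weakly
    prefers the proposal and at least one subagent strictly prefers it."""
    weakly_ok = all(u[proposed] >= u[current] for u in subagents)
    strictly_better = any(u[proposed] > u[current] for u in subagents)
    return weakly_ok and strictly_better

print(committee_prefers(committee, "cheese", "both"))         # True: both subagents gain
print(committee_prefers(committee, "pepperoni", "both"))      # True: nobody loses anything
print(committee_prefers(committee, "pepperoni", "mushroom"))  # False: pepperoni-lover vetoes
print(committee_prefers(committee, "mushroom", "pepperoni"))  # False: mushroom-lover vetoes
```

The two False lines are exactly the incompleteness between pepperoni and mushroom, and the veto structure is also what makes the market analogy work: each participant only objects to the part of a proposal that touches their own basket.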
In the context of humans, one way to interpret this post is that it predicts that subagents in a human usually have veto power over decisions directly touching on the thing they care about. This sounds like a pretty good model of, for example, humans asked about trade-offs between sacred values.