Brain-storming fixes:
- Each subagent’s bargaining power is how much compute it can use. This makes everything more chaotic, and is clearly not what you had in mind with this kind of idealized-agents solution.
- Probabilistic vetoes, such that the vetoes of some subagents are less likely to take effect (a minimal sketch of what this could look like appears after this list). I think this breaks things in your proposal and still has the game-theoretic problems.
- We ensure the priors of each subagent (about how the others respond) are such that going for risky game-theoretic moves is not individually rational. Maybe some agents have more optimistic priors and others less optimistic, and this results in the former controlling more, while the latter only try to use their veto in extreme cases (like ensuring the wrong successor is not built). But it’d be fiddly to think through the effect of these different priors on behavior, and how “extreme” the cases are in which the veto is useful. And this might also mess up the agent’s interactions with the world in other ways: for example, it would dogmatically believe that algorithms which look like subagents have “exactly this behavior”, which is sometimes false. Although of course this kind of problem was already present in your proposal.
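
To make the second and third ideas slightly more concrete, here is a minimal, purely illustrative sketch (not something from your proposal, and all class names, thresholds, and probabilities are assumptions I'm making up for illustration): each subagent only attempts a veto when a proposal looks sufficiently bad by its own lights, and an attempted veto only takes effect with some per-subagent probability.

```python
import random

# Toy sketch of probabilistic vetoes with per-subagent "only veto in extreme
# cases" thresholds. Everything here is illustrative, not a claim about how
# the actual proposal works.

class Subagent:
    def __init__(self, name, veto_success_prob, extremeness_threshold):
        self.name = name
        # Probability that this subagent's veto actually blocks the action.
        self.veto_success_prob = veto_success_prob
        # How bad (by this subagent's lights) a proposal must look before it
        # bothers to spend its veto; a stand-in for "only veto in extreme cases".
        self.extremeness_threshold = extremeness_threshold

    def wants_to_veto(self, disvalue):
        # `disvalue` is this subagent's own assessment of how bad the proposal is.
        return disvalue >= self.extremeness_threshold


def proposal_passes(subagents, disvalues, rng=random):
    """Return True if no subagent's veto succeeds on this proposal."""
    for agent in subagents:
        if agent.wants_to_veto(disvalues[agent.name]):
            # The veto is only probabilistic: it blocks the action only
            # with probability veto_success_prob.
            if rng.random() < agent.veto_success_prob:
                return False
    return True


if __name__ == "__main__":
    subagents = [
        # An "optimistic" subagent: strong veto, but rarely feels the need to use it.
        Subagent("optimist", veto_success_prob=0.9, extremeness_threshold=0.95),
        # A "pessimistic" subagent: vetoes more readily, but its veto is weaker.
        Subagent("pessimist", veto_success_prob=0.3, extremeness_threshold=0.5),
    ]
    # Each subagent's assessment of how bad the proposal is (e.g. "build the
    # wrong successor" would presumably score near 1.0 for both).
    disvalues = {"optimist": 0.2, "pessimist": 0.7}
    print("proposal passes:", proposal_passes(subagents, disvalues))
```

Even in this toy version you can see the fiddliness: the pessimist's behavior depends on both its threshold and how much it trusts its veto to work, and nothing here yet models its prior over how the other subagent will respond.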