I haven’t read Critch in-depth, so I can’t guarantee I’m pointing towards the same concept he is. Consider this a bit of an impromptu intuition dump; it might be trivial. No claims on the originality of any of these thoughts, and epistemic status “¯\_(ツ)_/¯”.
The way I currently think about it is that multi-multi is the “full hard problem”, and single-single is a particularly “easy” (still not easy) special case.
In a way, we’re making some simplifying assumptions in the single-single case: that we have one (pseudo-Cartesian) “agent” with some kind of definite (or at least boundedly complicated) values that can be expressed. This means we kind of have “just” the usual problems of a) expressing/extracting/understanding the values, insofar as that is possible (outer alignment), and b) making sure the agent actually fulfills those values (inner alignment).
Multiple principals then relaxes this assumption: we no longer have a “single” value function but several, which introduces another “necessary ingredient”: some kind of social choice theory “synthesis function” that can take in all the individual functions and spit out a “super utility function” representing some morally acceptable amalgamation of them (whatever that means). The single-principal case is a simpler special case in which the synthesis function is just the identity function, but that no longer works once you have multiple inputs.
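To make that concrete, here’s a minimal toy sketch (my own construction, not Critch’s formalism): `synthesize` plays the role of the synthesis function, the individual utility functions are its inputs, and I’ve used a plain unweighted average purely as a placeholder for whatever aggregation rule social choice theory would actually recommend.

```python
# Toy sketch of a "synthesis function" aggregating principals' utilities.
from typing import Callable, List

UtilityFn = Callable[[str], float]  # maps an outcome to a utility


def synthesize(principal_utils: List[UtilityFn]) -> UtilityFn:
    """Aggregate many principals' utility functions into one 'super utility function'."""
    if len(principal_utils) == 1:
        # Single-principal special case: synthesis is just the identity.
        return principal_utils[0]

    # Multi-principal case: some aggregation rule is needed; a plain average
    # is only a stand-in for a real (and morally loaded) social choice.
    def aggregated(outcome: str) -> float:
        return sum(u(outcome) for u in principal_utils) / len(principal_utils)

    return aggregated


# Example: two principals who disagree about an outcome.
alice = lambda outcome: 1.0 if outcome == "park" else 0.0
bob = lambda outcome: 1.0 if outcome == "mall" else 0.0
super_utility = synthesize([alice, bob])
print(super_utility("park"))  # 0.5 under the averaging placeholder
```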
In a very simplistic sense, multi is “harder” because we are introducing an additional “degree of freedom”. So you might argue we have outer alignment, inner alignment and “even-more-outerer alignment” or “multi-outer alignment” (which would be the synthesis problem), and you probably have to make hard (potentially irreconcilable) moral choices for at least the latter (probably for all).
In multi-multi, if the agents serve (or have different levels of alignment towards) different subsets of principals, this then adds the additional difficulty of game theory between the different agents and how they should coordinate. We can call that the “multi-inner alignment problem” or something: the question of how to get the amalgamation of competing agents to be “inner aligned” and not blow everything up or get stuck in defect-defect spirals or whatever. (This reminds me a lot of what CLR works on.)
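As a toy illustration of the defect-defect failure mode (my example, nothing to do with CLR’s actual work): two agents, each faithfully serving its own principal, can still best-respond their way into the bad equilibrium of a prisoner’s dilemma.

```python
# Prisoner's dilemma payoffs: (row agent's payoff, column agent's payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}


def best_response(opponent_action: str, player: int) -> str:
    """Pick the action maximizing this player's payoff, holding the opponent fixed."""
    def payoff(action: str) -> float:
        pair = (action, opponent_action) if player == 0 else (opponent_action, action)
        return PAYOFFS[pair][player]
    return max(["cooperate", "defect"], key=payoff)


# Both agents best-responding to each other lands on defect-defect,
# even though cooperate-cooperate is better for both principals.
print(best_response("cooperate", 0))  # "defect"
print(best_response("defect", 0))     # "defect"
```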
Tbh I’m not sure whether single-multi would be harder than or different from single-single just “applied multiple times”. Maybe if the agents have different ideas of what the principal wants they could compete, but that seems like a failure of outer alignment; then again, maybe it would be better cast as a kind of failure of “multi-inner alignment”.
So in summary, I think solutions (insofar as such a thing even exists in an objective fashion, which it may or may not) to the multi-multi problem are a superset of solutions to multi-single, single-multi and single-single. Vaguely: outer alignment = normativity/value learning, inner alignment = principal-agent problem, multi-outer alignment = social choice, multi-inner alignment = game theory, and you need to solve all four to solve multi-multi. If you make certain simplifying assumptions which correspond to introducing “singles”, you can ignore one or more of these (e.g. a single agent doesn’t need game theory, a single principal doesn’t need social choice).
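Spelled out as the 2x2 grid (this is my reading of the mapping above, so treat the cell contents as an assumption rather than Critch’s own table):

```python
# Which sub-problems each principal/agent configuration seems to require.
SUBPROBLEMS = {
    # (principals, agents): required ingredients
    ("single", "single"): {"outer alignment (value learning)",
                           "inner alignment (principal-agent)"},
    ("multi",  "single"): {"outer alignment (value learning)",
                           "inner alignment (principal-agent)",
                           "multi-outer alignment (social choice)"},
    ("single", "multi"):  {"outer alignment (value learning)",
                           "inner alignment (principal-agent)",
                           "multi-inner alignment (game theory)"},
    ("multi",  "multi"):  {"outer alignment (value learning)",
                           "inner alignment (principal-agent)",
                           "multi-outer alignment (social choice)",
                           "multi-inner alignment (game theory)"},
}

# The multi-multi cell needs everything, so a solution to it contains
# solutions to the other three cells as special cases.
assert all(cell <= SUBPROBLEMS[("multi", "multi")] for cell in SUBPROBLEMS.values())
```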
Or something. Maybe the metaphor is too much of a stretch and I’m seeing spurious patterns.
I wrote out the 2x2 grid you suggested in MS Paint.
I’m not sure I’m catching how multi-inner is game theory. Except that I think “GT is the mesa- of SCT” is an interesting, reasonable (to me) claim that is sort of blowing my mind as I contemplate it, so far.