Thanks for this! I agree that inter-agent safety problems are highly neglected, and that it’s not clear that intent alignment or the kinds of capability robustness incentivized by default will solve (or are the best ways to solve) these problems. I’d recommend looking into Cooperative AI, and the “multi/multi” axis of ARCHES.
This sequence discusses similar concerns—we operationalize what you call inter-agent alignment problems as either:
1. Subsets of capability robustness, because if an AGI wants to achieve X in some multi-agent environment, then accounting for the dependencies of its strategy on other agents’ strategies is instrumental to achieving X (though accounting for these dependencies might be qualitatively harder than the capabilities incentivized by default); or
2. Subsets of intent alignment, because the AGI’s preferences partly shape how likely it is to cooperate with others, and we might be able to intervene on cooperation-relevant preferences even if full intent alignment fails.