It’s great to see work on the multiagent setting! This setting does seem quite a bit more complex, and hasn’t been explored very much to my knowledge.
One major question I have after reading this post and the associated paper is: how does this relate to the work already done in academia? Sure, game theory makes unrealistic assumptions, but those assumptions don’t seem horribly wrong when applied to simplified settings where we do have a good model. For example, with overfishing, even if everyone knew exactly what would happen if they overfished, the problem would still arise (and it’s not unreasonable to think that everyone could know exactly what would happen if they overfished). I know that in your setting the agents don’t know what other agents are doing, which makes it different from a classic tragedy of the commons, but I’d be surprised if this hadn’t been studied before.
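To make the full-information point concrete, here is a minimal sketch of that overfishing argument as a one-shot common-pool game. The payoff numbers and the `payoff` helper are my own illustrative assumptions, not anything from the post or the paper; the point is just that even when every agent knows the complete payoff structure, overfishing remains each agent’s dominant strategy.

```python
# Minimal sketch of a full-information tragedy of the commons.
# All parameter values here are illustrative assumptions.

N = 10                # number of fishers, each with complete information
SUSTAINABLE = 1.0     # baseline payoff per fisher under universal restraint
OVERFISH_GAIN = 0.5   # private gain to a fisher who overfishes
DAMAGE = 0.2          # shared damage each overfisher imposes on every fisher

def payoff(i_overfish: bool, others_overfishing: int) -> float:
    """Payoff to one fisher, given their own choice and the others' choices."""
    total_overfishers = others_overfishing + int(i_overfish)
    base = SUSTAINABLE + (OVERFISH_GAIN if i_overfish else 0.0)
    return base - DAMAGE * total_overfishers

# Overfishing pays strictly more no matter what the other N-1 fishers do...
for k in range(N):  # k = number of *other* fishers who overfish
    assert payoff(True, k) > payoff(False, k)

# ...even though universal restraint beats universal overfishing:
assert payoff(False, 0) > payoff(True, N - 1)
print("Overfishing is dominant despite full information.")
```

The asserts show that the dominance holds regardless of what the other agents do, so perfect knowledge of the consequences doesn’t dissolve the dilemma by itself.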
Even if you dislike the assumptions of game theory, I’m sure political science, law, etc. have tackled these sorts of situations, and they aren’t going to make the Bayesian/rational assumption unless it’s a reasonable model. (Or at least, if it isn’t a reasonable model, people will call them out on it and probably develop something better.)
To give my quick takes on how each of these failure modes relates to existing academic work: accidental steering is novel to me (though I wouldn’t be surprised if there has been work on it); coordination failures seem like a particular kind of (large-scale) prisoner’s dilemma; adversarial misalignment is a special case of the principal-agent problem; and input spoofing and filtering and goal co-option seem like special cases of adversarial misalignment (and are related to ML security, as you pointed out).
Yes, there is a ton of work on some of these in certain settings, and I’m familiar with some of it.
In fact, the connections are so manifold that I suspect it would be useful to lay out, in another paper, which of these connections seem useful, if only to save other people the time and energy of trying to do the same and finding dead ends. On reflection, however, I’m concerned about how big a project this would become, and I’m unsure how useful it would be to applied work in AI coordination.
Just as one rabbit hole to go down, there is a tremendous amount of work on cooperation, spanning several very different literatures. The most relevant work, to display my own obvious academic bias, seems to be from public policy and economics, and includes work on participatory decision making and cooperative models for managing resources. Next, you mentioned law: I know there is work on interest-based negotiation, where defining the goals clearly allows better solutions, as well as work on mediation. In business, there is work on team-building that touches on these points, as well as on inter-group and inter-firm competition and cooperation, which touch on related work in economics. There is also the work on principal-agent problems, as well as game theory applied to more realistic scenarios. (Game theorists I’ve spoken with have noted the fragility of solutions to very minor changes in the problem, which is why game theory is rarely applied in practice.) There’s work in evolutionary theory, as well as systems biology, that touches on some of these points. Social psychology, anthropology, and sociology all presumably have literatures on the topic as well, but I’m not at all familiar with them.
Agreed that finding all the connections would be a big project, but I think anyone who tries to build on your work will either have to do the literature search anyway (at least for the parts they want to work on) or will end up reinventing the wheel. Perhaps you could find one or two literature-review papers for each topic and cite those? I would imagine you could get most of the value by finding ~20 such papers, which, while not an easy task for fields you aren’t familiar with, should still be doable with tens of hours of effort, and hopefully those papers would be useful for your own thinking.