Which decision theory LLMs follow, and how consistently they follow it, affects their ability to cooperate. This could mean the difference between peace and conflict in AI-assisted political bargaining, or it could enable AIs to collude when one is meant to monitor the other, undermining human control.
Do you have any thoughts on “red lines” for AI collusion? That is, “if an AI could do X, then we should acknowledge that AIs can likely collude with each other in monitoring setups.”