PrudentBot is modelling its counterparty, and the setup in which it runs is what makes the modelling and legibility possible. To make PrudentBot work, comprehension of decision theory, counterparty modelling, and legibility are all required. It’s just that these elements are spread out, in various ways, between (a) the minds of the researchers who created the bots, (b) the source code of the bots themselves, and (c) the setup / testbed that makes it possible for the bots to faithfully exchange source code with each other.
Also, arenas where you can submit a simple program are kind of toy examples. If you’re facing a real, high-stakes prisoner’s dilemma and you can set things up so that programs make the decisions for you, you’re probably already capable of coordinating and cooperating with your counterparty well enough to avoid the prisoner’s dilemma entirely, if it were happening in real life and not a simulated game.
PrudentBot’s counterparty is another program intended to be legible, not a human. The point is that in practice it’s not necessary to model any humans; humans can delegate legibility to programs they submit as their representatives. It’s a popular meme that humans are incapable of performing Löbian cooperation because they can’t model each other’s messy minds, and that only AIs could make their own thinking legible to each other, granting them unique powers of coordination. This is not the case.
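To make “delegating legibility to programs” concrete, here is a minimal sketch in Python, under my own simplifying assumptions: each bot is a function that receives its own source and its counterparty’s source as strings, and the arena hands those strings around. The names (`run_arena`, `clique_bot`, and so on) are illustrative only, and the source-matching CliqueBot below is a much cruder strategy than the paper’s FairBot or PrudentBot, which search for proofs about the counterparty rather than demanding an exact textual match.

```python
import inspect

# Toy open-source prisoner's dilemma arena (an illustrative sketch only; the
# paper's construction uses proof search in provability logic, not this).

def cooperate_bot(my_source: str, their_source: str) -> str:
    """Unconditionally cooperates, ignoring the counterparty's code."""
    return "C"

def defect_bot(my_source: str, their_source: str) -> str:
    """Unconditionally defects."""
    return "D"

def clique_bot(my_source: str, their_source: str) -> str:
    """Cooperates iff the counterparty is an exact textual copy of itself."""
    return "C" if their_source == my_source else "D"

def run_arena(bot_a, bot_b, source_a: str, source_b: str):
    """Hand each bot both source strings and return the joint move."""
    return bot_a(source_a, source_b), bot_b(source_b, source_a)

# The arena, not the humans, is what makes the bots legible to each other.
src_clique = inspect.getsource(clique_bot)
src_coop = inspect.getsource(cooperate_bot)

print(run_arena(clique_bot, clique_bot, src_clique, src_clique))   # ('C', 'C')
print(run_arena(clique_bot, cooperate_bot, src_clique, src_coop))  # ('D', 'C')
```

CliqueBot only cooperates with exact copies of itself, which is precisely the brittleness FairBot and PrudentBot are designed to get past; the point of the sketch is just that the humans who submit these functions never need to model each other.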
> if it were happening in real life and not a simulated game
Programs and protocols become real life when they are given authority to enact their computations. To the extent that Pareto-inefficient outcomes actually happen in real life, it’s worth replacing negotiations with things like this, falling back to the BATNA (best alternative to a negotiated agreement) when the arena says (D,D).
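A minimal sketch of what “given authority to enact their computations” might look like, with made-up deal and BATNA numbers: the negotiated deal is enacted only if the arena returns mutual cooperation, and otherwise each party takes its outside option.

```python
def settle(joint_move, deal_payoffs, batna_payoffs):
    """Enact the arena's verdict: the negotiated deal on (C, C), otherwise
    each party falls back to its BATNA. Payoff tuples are (party_a, party_b);
    the numbers below are placeholders, not from any real negotiation."""
    return deal_payoffs if joint_move == ("C", "C") else batna_payoffs

print(settle(("C", "C"), deal_payoffs=(3, 3), batna_payoffs=(1, 1)))  # (3, 3)
print(settle(("D", "D"), deal_payoffs=(3, 3), batna_payoffs=(1, 1)))  # (1, 1)
```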
> The point is that in practice it’s not necessary to model any humans,
Right, but my point is that it’s still necessary for something to model something. The bot-arena setup in the paper has been carefully arranged so that the modelling is in the bots, the legibility is in the setup, and the decision-theory comprehension is in the authors’ brains.
I claim that all three of these components are necessary for robust cooperation, along with some clever system-design work to make each component separable and realizable (e.g. it would be much harder to have the modelling happen in the researchers’ brains and the decision-theory comprehension happen in the bots).
Two humans, locked in a room together, facing a true PD, without access to computers or an arena or an adjudicator, cannot necessarily robustly cooperate with each other, for decision-theoretic reasons, even if they both understand decision theory.
Since you don’t model your human counterparty’s mind anyway, it doesn’t matter whether they comprehend decision theory. The whole point of delegating to bots is that only understanding of bots by bots remains necessary after that. If your human counterparty doesn’t understand decision theory, they might submit a foolish bot, while your understanding of decision theory earns you a pile of utility.
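To make the “pile of utility” concrete, here is a continuation of the toy sketch from earlier (reusing `run_arena`, `clique_bot`, `cooperate_bot`, `src_clique`, and `src_coop`, and assuming the standard PD payoff ordering T > R > P > S with placeholder values 5 > 3 > 1 > 0): a carelessly submitted unconditional cooperator gets exploited, while two sensibly conditional submissions reach mutual cooperation.

```python
# Assumed PD payoffs: temptation 5 > reward 3 > punishment 1 > sucker 0.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

vs_copy = run_arena(clique_bot, clique_bot, src_clique, src_clique)
vs_naive = run_arena(clique_bot, cooperate_bot, src_clique, src_coop)

print(PAYOFF[vs_copy])   # (3, 3): two sensible submissions cooperate
print(PAYOFF[vs_naive])  # (5, 0): the foolish CooperateBot gets exploited
```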
So while the motivation for designing and setting up an arena in a particular way might come from decision theory, using the arena doesn’t require that understanding from its human users, and yet it can shape incentives in a way that defeats the bad equilibria of classical game theory.