How does one correctly handle multi-agent dilemmas, in which you know the other agents follow the same decision theory? My implementation of “UDT” defects in a prisoner’s dilemma against an agent that it knows is following the same decision procedure. More precisely: Alice and Bob follow the same decision procedure, and they both know it. Alice will choose between cooperate/defect, then Bob will choose between cooperate/defect without knowing what Alice picked, then the utility will be delivered. My “UDT” decision procedure reasons as follows for Alice: “if I had pre-committed to cooperate, then Bob would know that, so he would defect, therefore I defect”. Is there a known way out of this, besides special-casing symmetric dilemmas, which is brittle?
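For concreteness, here is a minimal sketch of that reasoning loop (Python, with standard PD payoff numbers picked for illustration; the function names are mine, not from my actual implementation). Bob is modeled as best-responding to whatever Alice commits to, which is exactly what produces the defect–defect outcome:

```python
# Standard prisoner's dilemma payoffs (Alice, Bob); any T > R > P > S works the same way.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def bob_best_reply(alice_commitment):
    """Bob is modeled as knowing Alice's commitment and best-responding to it."""
    return max("CD", key=lambda b: PAYOFF[(alice_commitment, b)][1])

def alice_choice():
    """Evaluate each pre-commitment by simulating Bob's reply to it,
    then pick the commitment that scores best for Alice."""
    def value(a):
        return PAYOFF[(a, bob_best_reply(a))][0]
    return max("CD", key=value)

print(alice_choice())  # "D": committing to cooperate just invites Bob to defect
```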
My solution, which assumes computation is expensive, is to reason about other agents based on their behavior towards a simplified-model third agent; the simplest possible version of this is “Defect against bots who defect against cooperate-bot, otherwise cooperate” (and this seems relatively close to how humans operate—we don’t like people who defect against the innocent).
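A minimal sketch of that simplest version, assuming each bot is just a function from an opponent (itself callable) to a move; the names are illustrative:

```python
def cooperate_bot(opponent):
    """The simplified-model third agent: cooperates unconditionally."""
    return "C"

def defect_bot(opponent):
    return "D"

def selfish_altruist(opponent):
    """Defect against bots who defect against cooperate-bot, otherwise cooperate.
    Probing the opponent against cooperate_bot, rather than against ourselves,
    is what keeps the mutual simulation from recursing forever."""
    return "D" if opponent(cooperate_bot) == "D" else "C"

print(selfish_altruist(defect_bot))        # D: punishes those who exploit the innocent
print(selfish_altruist(cooperate_bot))     # C
print(selfish_altruist(selfish_altruist))  # C: cooperates with its own kind, and terminates
```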
My solution, which assumes computation is expensive
Ah, so I’m interested in normative decision theory: how one should ideally behave to maximize their own utility. This is what e.g. UDT&FDT are aiming for. (Keep in mind that “your own utility” can, and should, often include other people’s utility too.)
Minimizing runtime is not at all a goal. I think the runtime of the decision theories I implemented is something like doubly exponential in the number of steps of the simulation (the number of events in the simulation is exponential in its duration; each decision typically involves running the simulation using a trivial decision theory).
reason about other agents based on their behavior towards a simplified-model third agent
That’s an interesting approach I hadn’t considered. While I don’t care about efficiency in the “how fast does it run” sense, I do care about efficiency in the “does it terminate” sense, and that approach has the advantage of terminating.
Defect against bots who defect against cooperate-bot, otherwise cooperate
You’re going to defect against UDT/FDT then. They defect against cooperate-bot. You’re thinking it’s bad to defect against cooperate-bot, because you have empathy for the other person. But I suspect you didn’t account for that empathy in your utility function in the payoff matrix, and that if you do, you’ll find that you’re not actually in a prisoner’s dilemma in the game-theory sense. There was a good SlateStarCodex post about this that I can’t find.
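To make the “account for the empathy in the payoff matrix” point concrete, here is a sketch with made-up payoff numbers and an illustrative empathy weight; it just checks whether the adjusted game still has the prisoner’s-dilemma ordering:

```python
# Raw payoffs to me/them; T > R > P > S is what makes the game a prisoner's dilemma.
T, R, P, S = 5, 3, 1, 0

def adjusted(mine, theirs, w):
    """My effective utility if I also value the other player's payoff at weight w."""
    return mine + w * theirs

def still_a_prisoners_dilemma(w):
    """Does the empathy-adjusted game keep the PD ordering
    temptation > reward > punishment > sucker (from my point of view)?"""
    t = adjusted(T, S, w)  # I defect, they cooperate
    r = adjusted(R, R, w)  # we both cooperate
    p = adjusted(P, P, w)  # we both defect
    s = adjusted(S, T, w)  # I cooperate, they defect
    return t > r > p > s

print(still_a_prisoners_dilemma(0.0))  # True: no empathy, the classic PD
print(still_a_prisoners_dilemma(0.2))  # True: a little empathy, still a PD
print(still_a_prisoners_dilemma(0.5))  # False: ordering becomes T > R > S > P (snowdrift-like)
print(still_a_prisoners_dilemma(1.0))  # False: mutual cooperation is now my best outcome
```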
Evolution gave us “empathy for the other person”, and evolution is a reasonable proxy for a perfectly selfish utility machine, which is probably good evidence that this might be an optimal solution to the game theory problem. (Note: Not -the- optimal solution, but -an- optimal solution, in an ecosystem of optimal solutions.)
If you think evolution has a utility function, and that it’s the SAME function that an agent formed by an evolutionary process has, you’re not likely to get me to follow you down any experimental or reasoning path. And if you think this utility function is “perfectly selfish”, you’ve got EVEN MORE work cut out in defining terms, because those just don’t mean what I think you want them to.
Empathy as a heuristic to enable cooperation is easy to understand, but when normatively modeling things, you have to deconstruct the heuristics to actual goals and strategies.
Take a step back and try rereading what I wrote in a charitable light, because it appears you have completely misconstrued what I was saying.
A major part of the “cooperation” involved here is in being able to cooperate with yourself. In an environment with a well-mixed group of bots each employing differing strategies, and some kind of reproductive rule (if you have 100 utility, say, spawn a copy of yourself), Cooperate-bots are unlikely to be terribly prolific; they lose out against many other bots.
In such an environment, a stratagem of defecting against bots that defect against cooperate-bot is a -cheap- mechanism of coordination; you can coordinate with other “Selfish Altruist” bots, and cooperate with them, but you don’t take a whole lot of hits from failing to defect against cooperate-bot. Additionally, you’re unlikely to run up against very many bots that cooperate with cooperate-bot, but defect against you. As a coordination strategy, it is therefore inexpensive.
And if “computation time” is considered as an expense against utility, which I think reasonably should be the case, you’re doing a relatively good job minimizing this; you have to perform exactly one prediction of what another bot will do. I did mention this was a factor.
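Here is a toy round-robin sketch of that claim (the population mix and payoffs are made up, and I’ve left out the spawn-a-copy reproduction rule to keep it short): once “Selfish Altruist” bots are reasonably common they outscore both unconditional cooperators and unconditional defectors, while paying for exactly one prediction per decision.

```python
from collections import Counter

# Illustrative payoffs; any standard PD numbers tell the same story.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def cooperate_bot(opponent):
    return "C"

def defect_bot(opponent):
    return "D"

def selfish_altruist(opponent):
    # Exactly one prediction per decision: how does the opponent treat cooperate_bot?
    return "D" if opponent(cooperate_bot) == "D" else "C"

def round_robin(population):
    """Everyone plays everyone else once; return the average score per strategy."""
    totals = Counter()
    counts = Counter(bot.__name__ for bot in population)
    for i, a in enumerate(population):
        for b in population[i + 1:]:
            pay_a, pay_b = PAYOFF[(a(b), b(a))]
            totals[a.__name__] += pay_a
            totals[b.__name__] += pay_b
    return {name: totals[name] / counts[name] for name in counts}

# With a mix where "Selfish Altruist" bots are common, they outscore the
# unconditional defectors (who only ever get the defect-defect payoff from them)
# and the unconditional cooperators (whom the defectors exploit).
mix = [cooperate_bot] * 5 + [defect_bot] * 10 + [selfish_altruist] * 15
print(round_robin(mix))  # selfish_altruist highest, defect_bot lowest in this mix
```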