Playing the ultimatum game against an agent that gives in to $9 from rocks but not from us is not in the fair problem class, as the payoffs depend directly on our algorithm and not just on our choices and policies.
https://arbital.com/p/fair_problem_class/
A simpler game is “if you implement or have ever implemented LDT, you get $0; otherwise, you get $100”.
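To make the divide concrete, here's a minimal sketch (the `Agent` type and its fields are made up for illustration, not taken from the post or from Arbital) of a payoff that reads the agent's algorithm versus one that reads only its action:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    implements_ldt: bool  # stands in for "inspecting the agent's algorithm"

    def act(self, observation: str) -> str:
        # Both agents below behave identically; only their internals differ.
        return "accept"

def unfair_game(agent: Agent) -> int:
    # Payoff depends on the algorithm itself, not on any action taken.
    return 0 if agent.implements_ldt else 100

def fair_game(agent: Agent) -> int:
    # Payoff depends only on the action the agent outputs.
    return 100 if agent.act("offer") == "accept" else 0

ldt = Agent("LDT", implements_ldt=True)
rock = Agent("rock", implements_ldt=False)

print(unfair_game(ldt), unfair_game(rock))  # 0 100  -- discriminates by algorithm
print(fair_game(ldt), fair_game(rock))      # 100 100 -- same action, same payoff
```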
LDT decision theories are probably the best decision theories for problems in the fair problem class.
(Very cool that you’ve arrived at the idea of this post independently!)
The post demonstrates why this statement is misleading.
If “play the ultimatum game against an LDT agent” is not in the fair problem class, I’d say that LDT shouldn’t be in the “fair agent class”. It is like saying that in a tortoise-only race, the best racer is a hare because a hare can beat all the tortoises.
So, based on the definitions you gave, I’d classify “LDT is the best decision theory for problems in the fair problem class” as not even wrong.
In particular, consider a class of allowable problems S, but then also say that an agent X is allowable only if “play a given game with X” is in S. Then the argument in the “No agent is rational in every problem” section of my post goes through for allowable agents. (Note that the argument in that section is general enough to apply to agents that don’t give in to the $9 rock.)
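A sketch of that diagonalization, with hypothetical names (`make_punishing_game` is purely illustrative): for any allowable agent X, “play a given game with X” is allowable by assumption, and the game built from X is one that X loses:

```python
def make_punishing_game(x):
    """Build a problem parameterized by the agent x itself."""
    def game(agent):
        # By the assumption above, this game is allowable whenever x is,
        # yet x gets $0 in it while any other agent gets $100.
        return 0 if agent is x else 100
    return game

class SomeAgent:
    pass

x, y = SomeAgent(), SomeAgent()
game = make_punishing_game(x)
print(game(x), game(y))  # 0 100 -- no agent wins every allowable problem
```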
Practically speaking: if you’re trying to follow decision theory X, then playing against other X agents is a reasonable problem.
It’s reasonable to consider two agents playing against each other. “Playing against your copy” is a reasonable problem. ($9 rocks get $0 in this problem; LDTs probably get $5.)
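A rough simulation of this, assuming a $10 pot and stand-in models of the two agents (the LDT agent's even split is the plausible outcome, not a derived one):

```python
def rock_propose():
    return 1           # offer the responder $1, keep $9

def rock_accept(offer):
    return offer >= 9  # the $9 rock rejects anything below $9

def ldt_propose():
    return 5           # against an exact copy, plausibly split evenly

def ldt_accept(offer):
    return offer >= 5

def play_against_copy(propose, accept, pot=10):
    offer = propose()
    if accept(offer):
        return pot - offer, offer  # (proposer's share, responder's share)
    return 0, 0                    # rejection: nobody gets anything

print(play_against_copy(rock_propose, rock_accept))  # (0, 0)
print(play_against_copy(ldt_propose, ldt_accept))    # (5, 5)
```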
Newcomb’s problem, Parfit’s hitchhiker, the smoking lesion, etc. are all very reasonable problems whose outcomes essentially depend on the buttons you press when you play the game. It is important to get these problems right.
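For contrast, a sketch of why Newcomb's problem stays fair, using the standard $1M/$1k numbers: the payoff is a function of your policy alone, so any two agents with the same policy score the same, whatever their internals:

```python
def newcomb_payoff(policy: str) -> int:
    # The predictor fills the opaque box based only on what you would do,
    # not on how your algorithm decides it.
    opaque = 1_000_000 if policy == "one-box" else 0
    transparent = 1_000
    return opaque if policy == "one-box" else opaque + transparent

print(newcomb_payoff("one-box"))  # 1000000
print(newcomb_payoff("two-box"))  # 1000
```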
But playing against LDT is not necessarily in the “fair problem class”, because the game might behave differently depending on your algorithm (on how you arrive at taking actions) and not just on your actions.
Your version of it (playing against an LDT agent) is indeed different from playing against a game that checks whether we’re an alphabetizing agent (one that picks X instead of Y because X < Y, not because we computed the expected utility): we would want LDT to perform optimally in this game. But the reason an LDT-created rock loses to a natural rock here isn’t fundamentally different from the reason LDT loses to an alphabetizing agent in the other game, and it is known that you can construct a game like that where LDT will lose to something else. You can make the game description sound more natural, but I feel like there’s a sharp divide between the “fair problem class” problems and the others.
(I also think that in real life, where this game might play out, there isn’t really a choice we can make to build our AI as a $9 rock instead of an LDT agent; if we do that because of the rock’s better performance in this game, our rock ends up with slightly less than $5 in EV instead of $9, so LDT doesn’t perform worse than the other agents we could’ve chosen for this game.)