The hosts aren’t competing with the human, only with each other, so even if the hosts move first logically they have no reason or opportunity to try to dissuade the player from whatever they’d do otherwise. FDT is underdefined in zero-sum symmetric strategy games against psychological twins, since it foresees a draw no matter what it does, but choosing the optimal strategy on the way to that draw still seems better than deliberately playing bad strategies and then drawing anyway.
Why do you think they should be $100 and $200? Maybe you could try simulating it?
What happens if FDT tries to force all the incentives into one box? If the hosts know exactly what every other host will predict, what happens to their zero-sum competition and their incentive to coordinate with FDT?
If the hosts move first logically, then TDT will lead to the same outcomes as CDT, since it’s in each host’s interest to precommit to incentivizing the human to pick their own box; once the host has precommitted to doing this, the incentive works regardless of what decision theory the human uses. In math terms, if x is the choice of which box to incentivize (with “incentivize your own box” interpreted as “don’t place any money in any of the other boxes”), the human gets to choose a box f(x) on the basis of x, and the host gets to choose x = g(f) on the basis of the function f, which is known to the host since it is assumed to be superintelligent enough to simulate the human’s choices. By definition, the host moving first in logical time means that g is chosen before f, and f is then chosen on the basis of what’s in the human’s best interest given that the host will incentivize box g(f). But then the optimal strategy for each host is to make g a constant function, i.e. to incentivize its own box regardless of what f is.
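To see the first claim concretely, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration rather than the problem’s actual payoffs: a single host, two boxes, and “incentivize box b” read as “put $100 in b and nothing elsewhere,” with the host already locked into a constant g.

```python
from itertools import product

BOXES = ("own", "other")   # the host's own box vs. the other box (assumed toy setup)

def contents(incentivized):
    # Assumed payoffs: the incentivized box holds $100, the other holds $0.
    return {b: (100 if b == incentivized else 0) for b in BOXES}

# A human policy f maps the host's choice x (which box got incentivized)
# to the box the human takes; enumerate all four such policies.
human_policies = [dict(zip(BOXES, picks)) for picks in product(BOXES, repeat=2)]

# The host moves first in logical time with the constant policy
# g(f) = "own": fund only your own box, whatever f turns out to be.
def g(f):
    return "own"

for f in human_policies:
    x = g(f)                      # host's (constant) choice, made knowing f
    picked = f[x]                 # human's response to that choice
    print(f, "->", contents(x)[picked])

# Every policy that takes the incentivized box earns 100, and none earns more,
# so once g is constant a TDT agent gets exactly what a CDT agent gets.
```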
Regarding $100 and $200, I think I missed the part where you said the human picks the box with the maximum amount of money—I was assuming he picked a random box.
Regarding the question of how to force all the incentives into one box, what about the following strategy: choose box 1 with probability 1 - (400 - x) epsilon, where x is the payoff of box 1. Then it is obviously in each host’s interest to predict box 1, since it is the box the human is most likely to pick, and it is also in each host’s interest to minimize 400 - x, i.e. to maximize x. This is true even though the hosts’ competition is zero-sum.
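A rough numerical sketch of that strategy, where every concrete figure is an assumption for illustration: four boxes, a $400 cap on what box 1 can hold, and each host modeled as wanting only to maximize the chance that its prediction is correct.

```python
# All numbers here are assumptions for illustration, not the problem's actual ones.
EPS = 1e-4
N_BOXES = 4

def pick_probs(x):
    """The mixed strategy above, given that box 1 holds $x."""
    p1 = 1 - (400 - x) * EPS            # probability of taking box 1
    rest = (1 - p1) / (N_BOXES - 1)     # leftover probability, split over boxes 2-4
    return [p1] + [rest] * (N_BOXES - 1)

for x in (0, 100, 250, 400):
    probs = pick_probs(x)
    # Box 1 is always the most likely pick, so "predict box 1" stays optimal,
    # and the chance that prediction comes true grows as x grows.
    print(f"x=${x:3d}  P(box 1)={probs[0]:.4f}  P(any other box)={probs[1]:.6f}")
```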
If the hosts are all predicting box 1, why does it matter with what probability the human picks box 1? (If the hosts’ payoffs for all-predict-correctly and all-predict-incorrectly are different, then their game isn’t zero-sum.)
Ah, you’re right. That makes more sense now.
If the hosts move first logically, then TDT will lead to the same outcomes as CDT, since it’s in each host’s interest to precommit to incentivizing the human to pick their own box
It’s in the hosts’ interests to do that if they think the player is CDT, but it’s not in their interests to commit to doing that. They don’t lose anything by retaining the ability to select a better strategy later, after reading the player’s mind.
Yes they do. For simplicity suppose there are only two hosts, and suppose host A precommits to not putting money in host B’s box, while host B makes no precommitments about how much money he will put in host A’s box. Then the human’s optimal strategy is “pick host A’s box with probability 1 - x epsilon, where x is the amount of money in host A’s box”. This incentivizes host B to maximize the amount in host A’s box (resulting in a payoff of ~101 for the human), but it would have been better for B if he had precommitted to do the same as A, since then by symmetry his box would have been picked half the time instead of only ~101 epsilon of the time.
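Here is a toy version of that comparison in code; the $100 cap on what B can add to A’s box and the epsilon value are made-up numbers, not the ones from the original problem.

```python
# Toy check of the two-host story above, under assumed numbers.
EPS = 1e-4

def p_pick_A(money_in_A):
    # The human's declared policy: take A's box with probability 1 - x*epsilon,
    # where x is the amount of money sitting in A's box.
    return 1 - money_in_A * EPS

# Case 1: A precommits to put $0 in B's box; B stays flexible.  As argued above,
# B then ends up maximizing the money in A's box, so B's own box almost never
# gets picked.
x = 100                                   # assumed cap on B's contribution to A's box
p_B_picked_flexible = 1 - p_pick_A(x)

# Case 2: B makes the same precommitment as A.  The situation is symmetric,
# so the human picks each box half the time.
p_B_picked_committed = 0.5

print(f"B's box picked when B stays flexible:   {p_B_picked_flexible:.2%}")
print(f"B's box picked when B also precommits:  {p_B_picked_committed:.2%}")
```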