My understanding was that “this problem” consists of randomly picking a single TDT agent, which would presumably also have been done in the simulation.
So that’s another variant—in that interpretation you’re correct that C-sim would hardly ever see the same source-code C-sim in its own instances of the problem. I think you are right here that the chance of winning rises to at least 55%; not sure yet if it’s possible to do any better.
EDIT. I have a strategy for your variant which gives TDT an almost 100% chance of winning the prize. The trick is that instead of each agent having a favourite number, it has a least-favourite or “unlucky number” selected in a balanced way from the set {1, 2, …, 10}. Again consider a construction like SHA-256(C-act): reduce modulo 10 and then add 1. Here’s how the strategy works:
If C-sim has the same unlucky number as C-act then
    Pick the unlucky-numbered box with probability 1 - epsilon
    Pick each of the other boxes with probability epsilon / 9
Else
    Pick the box with C-sim's unlucky number
End If
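As a rough illustration, the rule above could be sketched in Python as follows. The function names (`unlucky_number`, `choice_distribution`) and the use of a plain string for an agent's source code are assumptions made for this example only. Reducing a 256-bit hash modulo 10 is also only approximately balanced, though the bias is negligible.

```python
import hashlib
from fractions import Fraction

BOXES = range(1, 11)

def unlucky_number(source_code: str) -> int:
    """Hypothetical helper: SHA-256 of the source code, reduced mod 10, plus 1."""
    digest = hashlib.sha256(source_code.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 10 + 1

def choice_distribution(c_act: str, c_sim: str, epsilon: Fraction) -> dict:
    """Probability that C-act picks each box, given the C-sim code it is shown."""
    u_act = unlucky_number(c_act)
    u_sim = unlucky_number(c_sim)
    if u_sim == u_act:
        # Same unlucky number: favour that box, spread epsilon over the other nine.
        dist = {box: epsilon / 9 for box in BOXES}
        dist[u_act] = 1 - epsilon
    else:
        # Different unlucky numbers: always pick C-sim's unlucky box.
        dist = {box: Fraction(0) for box in BOXES}
        dist[u_sim] = Fraction(1)
    return dist
```

Exact fractions are used so that the probabilities can be checked to sum to 1 without rounding error.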
It’s quite easy to see that each C-act, if presented with multiple instances of the problem with different C-sim codes, will pick its own unlucky-numbered box slightly less often than any of the others. So the money is always in the box with C-sim’s unlucky number. This gives C-act a 9/10 + 1/10 × (1 - epsilon), or approximately 100%, chance of winning. CDT still has exactly a 100% chance of winning, but the gap is negligible.
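The arithmetic can be checked exactly with a small sketch. It assumes that unlucky numbers are uniformly distributed over 1..10 among the codes an agent is shown, and that Omega fills the box the simulated agent picks least often on average. The helper names are hypothetical.

```python
from fractions import Fraction

BOXES = range(1, 11)

def pick_dist(u_self, u_seen, eps):
    """Pick probabilities for an agent with unlucky number u_self shown u_seen."""
    if u_seen == u_self:
        return {b: (1 - eps if b == u_self else eps / 9) for b in BOXES}
    return {b: Fraction(int(b == u_seen)) for b in BOXES}

def marginal(u_self, eps):
    """Average pick distribution over a uniformly random u_seen (an assumption)."""
    total = {b: Fraction(0) for b in BOXES}
    for u_seen in BOXES:
        for b, p in pick_dist(u_self, u_seen, eps).items():
            total[b] += p / 10
    return total

def win_probability(eps):
    """Chance that C-act picks the box Omega filled, averaged over unlucky numbers."""
    wins = Fraction(0)
    for u_act in BOXES:
        for u_sim in BOXES:
            m = marginal(u_sim, eps)
            money_box = min(BOXES, key=lambda b: m[b])  # C-sim's least-picked box
            wins += pick_dist(u_act, u_sim, eps)[money_box] / 100
    return wins

eps = Fraction(1, 100)
print(marginal(3, eps)[3], marginal(3, eps)[4])  # 99/1000 901/9000
print(win_probability(eps))                      # 999/1000
```

With epsilon = 1/100 the agent's own unlucky box is picked with probability (1 - epsilon)/10, each other box with 1/10 + epsilon/90, and the overall win probability is 1 - epsilon/10.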
EDIT 2. I just realised we can get TDT to win with close to 100% probability in your variant. I’ve amended above...
That’s still not quite the formulation of the problem I was considering, though it seems valid. Admittedly, your formulation is closer to the original idea since it does say “multiple simulations”, though I will note that the number of simulations has to be something like O(1/epsilon) for the difference to be noticeable.
My previous strategy was designed for a variant of the problem where Omega only simulates a single instance of the problem (and calculates the probabilities directly from the source code of C-sim).
Sorry I misunderstood you then.
Does your variant look like this?
Omega selects C-sim at random from some distribution over TDT full source-codes.
Then Omega selects C-sim-sim at random from the same distribution.
Then Omega calculates what will happen if it presents the problem to C-sim, but specifying the simulation’s full source code as C-sim-sim. Omega determines the probability of C-sim choosing each of the boxes, conditional on it having seen that fixed C-sim-sim.
Then Omega fills the box with lowest probability (of being chosen by C-sim) or uses the tie-break rule.
Finally Omega presents the real problem to C-act, but specifying the simulation’s full source code as C-sim.
What is the best strategy for TDT to play as C-act?
If that is the problem, then consider the following. It still uses the “unlucky number” construction from the set {1, 2, …, 10}. Each C-act will always choose its unlucky number with lowest probability, so the money is always in C-sim’s unlucky number box.
If C-sim has a different unlucky number from C-act then
    Pick C-sim's unlucky number with probability 1 - epsilon
    Pick C-act's unlucky number with probability 0
    Pick each of the other boxes with probability epsilon / 8
Else
    Pick the common unlucky number with probability 1/10 - epsilon
    Pick each other box with probability 1/10 + epsilon / 9
End If
That looks like winning with probability 9/10 × (1 - epsilon) + 1/10 × (1/10 - epsilon) = 0.91 - epsilon, so close to 91%.
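A quick exact check of that figure, as a sketch under stated assumptions: the unlucky numbers of C-act, C-sim and C-sim-sim are independent and uniform on 1..10, epsilon is below 1/10, and the unspecified tie-break rule is taken to be "lowest box number" (it never actually triggers here). All function names are hypothetical.

```python
from fractions import Fraction

BOXES = range(1, 11)

def pick_dist(u_self, u_seen, eps):
    """Pick probabilities for an agent with unlucky number u_self shown u_seen."""
    if u_seen != u_self:
        dist = {b: eps / 8 for b in BOXES}
        dist[u_seen] = 1 - eps
        dist[u_self] = Fraction(0)
    else:
        dist = {b: Fraction(1, 10) + eps / 9 for b in BOXES}
        dist[u_self] = Fraction(1, 10) - eps
    return dist

def money_box(u_sim, u_sim_sim, eps):
    """Omega fills the box C-sim is least likely to pick when shown C-sim-sim."""
    dist = pick_dist(u_sim, u_sim_sim, eps)
    return min(BOXES, key=lambda b: (dist[b], b))  # assumed tie-break: lowest number

def win_probability(eps):
    """Chance that C-act picks the filled box, averaged over all three agents."""
    total = Fraction(0)
    for u_act in BOXES:
        for u_sim in BOXES:
            for u_sim_sim in BOXES:
                box = money_box(u_sim, u_sim_sim, eps)
                total += pick_dist(u_act, u_sim, eps)[box] / 1000
    return total

print(win_probability(Fraction(1, 1000)))  # 909/1000
```

For epsilon = 1/1000 this gives 0.91 - epsilon = 0.909, and the filled box is always the one carrying C-sim's unlucky number.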
Is there a better strategy though?
P.S. We are getting some interesting behaviour here, with slight variations in the conditions for selecting C-sim and calculating its choice probabilities leading to very different best strategies (and different success probabilities such as 10%, 50%, 91% or close to 100%). Quite fascinating.
Yeah, that’s the problem I had in mind, and your “unlucky number” strategy definitely seems pretty solid in that case.