drnickbone comments on Sneaky Strategies for TDT

drnickbone 26 May 2012 6:19 UTC
0 points
To be clearer, the full “favourite number” proposal looks like this:

```
If Omega reveals C-sim then
   If C-sim = C-act then
          Pick each box with probability 1/10
   Else
          Pick box 1 with probability 1
   End If
Else
   Pick favourite number from {1, 2} with probability 1
End If
```

That has a worst-case 10% probability of winning (where Omega simulated exactly you, and you know it), a best-case 100% probability of winning (where Omega simulated some agent other than you, and you know it) and a mid-case 50% probability of winning (where you don’t know which agent Omega simulated).
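As a sanity check on those three figures, here is a minimal Python sketch. It is not part of the proposal itself. It assumes ten boxes, that Omega fills the box the simulated agent is least likely to pick with ties going to the lowest-numbered box (which the "pick box 1" branch seems to rely on), and that favourite numbers are independent and evenly split between 1 and 2. The names `omega_box`, `point_mass`, `strategy` and `mid_case` are made up for illustration.

```python
from fractions import Fraction

N_BOXES = 10  # boxes are numbered 1..10
BOXES = range(1, N_BOXES + 1)


def omega_box(dist):
    """Box Omega fills: the one the simulated agent picks least often.

    Assumption: ties are broken in favour of the lowest-numbered box.
    """
    return min(BOXES, key=lambda b: (dist[b], b))


def point_mass(box):
    """Distribution that picks `box` with probability 1."""
    return {b: Fraction(int(b == box)) for b in BOXES}


def strategy(favourite, c_sim_revealed, is_c_sim):
    """Choice distribution of an agent following the proposal above."""
    if c_sim_revealed:
        if is_c_sim:
            return {b: Fraction(1, N_BOXES) for b in BOXES}
        return point_mass(1)
    return point_mass(favourite)


# The simulation is shown its own code, so it takes the "C-sim = C-act" branch.
sim_when_revealed = strategy(1, True, True)
money_when_revealed = omega_box(sim_when_revealed)

# Worst case: C-act is C-sim and knows it.
worst_case = strategy(1, True, True)[money_when_revealed]
# Best case: C-act knows it is not C-sim.
best_case = strategy(2, True, False)[money_when_revealed]


def mid_case():
    """C-sim not revealed; average over both agents' favourite numbers."""
    wins = Fraction(0)
    for sim_fav in (1, 2):
        money = omega_box(strategy(sim_fav, False, True))
        for act_fav in (1, 2):
            wins += strategy(act_fav, False, False)[money]
    return wins / 4


print(worst_case, best_case, mid_case())
```

Under those assumptions the three values come out as 1/10, 1 and 1/2, matching the percentages quoted above.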

I think that’s optimal for this problem where Omega is simulating a single fixed agent. I don’t see how adding epsilon to the favourite box (in the first sub-case) helps things—reduces the probability of winning in the worst case to slightly less than 10%.

Edit: I think the confusion was with my note that “any other agent should pick Box 1 with certainty”. This was supposed to mean any other flavour of TDT that discovers it is not C-sim i.e. any other C-act. I’ll edit the OP slightly to make it clearer.
- lackofcheese 26 May 2012 7:11 UTC
  1 point
  Parent
  
  > I think that’s optimal for this problem where Omega is simulating a single fixed agent. I don’t see how adding epsilon to the favourite box (in the first sub-case) helps things—reduces the probability of winning in the worst case to slightly less than 10%.
  
  Here’s the scenario I had in mind: C-act, with a favourite number of 2, is presented with the problem, and told that C-sim has a favourite number of 1 (and hence C-act != C-sim). In the simulation, C-sim was presented with the source code of C-sim-sim, who has a favourite number of 2. Your strategy would imply that C-sim and C-act both choose box 1 with probability 1, which means C-act misses out on the money as it turns out it’s in box 2.
  
  > I think the confusion was with my note that “any other agent should pick Box 1 with certainty”. This was supposed to mean any other flavour of TDT that discovers it is not C-sim i.e. any other C-act.
  
  Ah, but the point of this problem is that you can’t discover whether you’re being simulated, no?
  - drnickbone 26 May 2012 7:19 UTC
    0 points
    Parent
    But in that case, the sim was not presented with the same problem as the actual agent (it saw different code presented in the problem description). Since Omega said it presented “this problem” to the simulation, it would have been lying to the actual agent.
    
    The assumption is that Omega doesn’t lie to actual participants in the problem, so when it presents “this problem” to the simulation, then the simulation also sees that the simulated code is C-sim.
    
    As for whether you can tell if you’re being simulated: that is the reason for looking at different variants of the problem, in which C-act has different levels of information about C-sim.
    - lackofcheese 26 May 2012 7:26 UTC
      0 points
      Parent
      Well, if that’s what was meant then your proposal is fine, but if you aren’t guaranteed that C-sim = C-sim-sim, then the epsilon helps a lot. My understanding was that “this problem” constitutes randomly picking a single TDT agent, which would presumably also have been done in the simulation.
      - drnickbone 26 May 2012 11:16 UTC
        1 point
        Parent
        
        > My understanding was that “this problem” constitutes randomly picking a single TDT agent, which would presumably also have been done in the simulation.
        
        So that’s another variant—in that interpretation you’re correct that C-sim would hardly ever see the same source-code C-sim in its own instances of the problem. I think you are right here that the chance of winning rises to at least 55%; not sure yet if it’s possible to do any better.
        
        EDIT. I have a strategy for your variant which gives almost 100% chance for TDT winning the prize. The trick is that instead of each agent having a favourite number it has a least-favourite or “unlucky number” selected in a balanced way from the set {1,2,...,10}. Again consider a construction like SHA-256(C-act), reduce modulo 10 and then add 1. Here’s how the strategy works:
        
        If C-sim has the same unlucky number as C-act then
           Pick the unlucky numbered box with probability 1 - epsilon
           Pick each of the other boxes with probability epsilon / 9
        Else
           Pick the box with C-sim's unlucky number
        End If
        
        It’s quite easy to see that each C-act, if presented with multiple instances of the problem with different C-sim codes, will pick its own unlucky-numbered box slightly less often than any of the others. So the money is always in the box with C-sim’s unlucky number. This gives C-act a 9/10 + 1/10 × (1 - epsilon) chance of winning, or approximately 100%. CDT still has exactly a 100% chance of winning, but the gap is negligible.
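For anyone who wants to check the arithmetic, here is a rough Python sketch of the "unlucky number" strategy in this multi-instance variant. It is only an illustration under stated assumptions: source code is hashed as a UTF-8 string, unlucky numbers are treated as independent and uniform on 1..10, Omega's multiple simulations are modelled as an exact average over which code the simulation is shown, and ties go to the lowest-numbered box. The function names are hypothetical.

```python
import hashlib
from fractions import Fraction

N_BOXES = 10
BOXES = range(1, N_BOXES + 1)


def unlucky_number(source_code: str) -> int:
    """SHA-256 of the source code, reduced modulo 10, plus 1."""
    digest = hashlib.sha256(source_code.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N_BOXES + 1


def strategy(own_unlucky, shown_unlucky, eps):
    """Choice distribution given the unlucky number of the code shown."""
    if shown_unlucky == own_unlucky:
        return {b: (1 - eps) if b == own_unlucky else eps / 9 for b in BOXES}
    return {b: Fraction(int(b == shown_unlucky)) for b in BOXES}


def aggregate(own_unlucky, eps):
    """Average choice frequencies over many instances.

    Assumption: the shown code's unlucky number is uniform on 1..10.
    """
    total = {b: Fraction(0) for b in BOXES}
    for shown in BOXES:
        for box, p in strategy(own_unlucky, shown, eps).items():
            total[box] += p / N_BOXES
    return total


def win_probability(eps):
    """Exact chance that C-act picks the box Omega filled."""
    total = Fraction(0)
    for sim_u in BOXES:
        freq = aggregate(sim_u, eps)
        # Least-chosen box, ties to the lowest number (assumed tie-break).
        money = min(BOXES, key=lambda b: (freq[b], b))
        for act_u in BOXES:
            total += strategy(act_u, sim_u, eps)[money]
    return total / N_BOXES**2


eps = Fraction(1, 100)
print(unlucky_number("example agent source"))  # some value in 1..10
print(win_probability(eps))  # equals 1 - eps / 10
```

With epsilon = 1/100 this gives 999/1000, i.e. 1 - epsilon/10, which is the same as 9/10 + 1/10 × (1 - epsilon).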
        
        EDIT 2. I just realised we can get TDT to win with close to 100% probability in your variant. I’ve amended above...
        lackofcheese 26 May 2012 23:32 UTC
        0 points
        Parent
        That’s still not quite the formulation of the problem I was considering, though it seems valid. Admittedly, your formulation is closer to the original idea since it does say “multiple simulations”, though I will note that the number of simulations has to be something like O(1/epsilon) for the difference to be noticeable.
        
        My previous strategy was designed for a variant of the problem where Omega only simulates a single instance of the problem (and calculates the probabilities directly from the source code of C-sim).
        drnickbone 27 May 2012 13:40 UTC
        1 point
        Parent
        Sorry I misunderstood you then.
        
        Does your variant look like this?
        
        Omega selects C-sim at random from some distribution over TDT full source-codes.
        Then Omega selects C-sim-sim at random from the same distribution.
        Then Omega calculates what will happen if it presents the problem to C-sim, but specifying the simulation’s full source code as C-sim-sim. Omega determines the probability of C-sim choosing each of the boxes, conditional on it having seen that fixed C-sim-sim.
        Then Omega fills the box with lowest probability (of being chosen by C-sim) or uses the tie-break rule.
        Finally Omega presents the real problem to C-act, but specifying the simulation’s full source code as C-sim.
        
        What is the best strategy for TDT to play as C-act?
        
        If that is the problem, then consider the following. It still uses the “unlucky number” construction from the set {1, 2, …, 10}. Each C-act will always choose its unlucky number with lowest probability, so the money is always in C-sim’s unlucky number box.
        
        If C-sim has a different unlucky number from C-act then
           Pick C-sim's unlucky number with probability 1 - epsilon
           Pick C-act's unlucky number with probability 0
           Pick each of the other boxes with probability epsilon / 8
        Else
           Pick the common unlucky number with probability 1/10 - epsilon
           Pick each other box with probability 1/10 + epsilon / 9
        End If
        
        That looks like winning with probability 9/10 × (1 - epsilon) + 1/10 × (1/10 - epsilon) = 91/100 - epsilon, so close to 91%.
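Here is a small Python sketch that enumerates this single-simulation variant exactly, in case it helps to check the figure. It assumes unlucky numbers for C-sim-sim, C-sim and C-act are independent and uniform on 1..10, that 0 < epsilon < 1/10, and that ties go to the lowest-numbered box (no tie actually arises here). The function names are invented for the example.

```python
from fractions import Fraction

N_BOXES = 10
BOXES = range(1, N_BOXES + 1)


def strategy(own_unlucky, shown_unlucky, eps):
    """Choice distribution for an agent shown code with `shown_unlucky`."""
    if shown_unlucky != own_unlucky:
        dist = {b: eps / 8 for b in BOXES}      # the eight "other" boxes
        dist[shown_unlucky] = 1 - eps           # the shown code's unlucky box
        dist[own_unlucky] = Fraction(0)         # never pick our own unlucky box
        return dist
    dist = {b: Fraction(1, 10) + eps / 9 for b in BOXES}
    dist[own_unlucky] = Fraction(1, 10) - eps   # common unlucky box
    return dist


def omega_box(dist):
    """Least-likely box; ties to the lowest number (assumed tie-break)."""
    return min(BOXES, key=lambda b: (dist[b], b))


def win_probability(eps):
    """Average over the unlucky numbers of C-sim-sim, C-sim and C-act."""
    total = Fraction(0)
    for sim_sim_u in BOXES:
        for sim_u in BOXES:
            # Omega computes C-sim's choice given that it saw C-sim-sim.
            money = omega_box(strategy(sim_u, sim_sim_u, eps))
            for act_u in BOXES:
                # C-act is shown C-sim's code.
                total += strategy(act_u, sim_u, eps)[money]
    return total / N_BOXES**3


eps = Fraction(1, 100)
print(win_probability(eps))  # equals 91/100 - eps
```

With epsilon = 1/100 the result is 9/10, i.e. 91/100 - epsilon. In every case the money ends up in C-sim's unlucky-numbered box, as claimed above.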
        
        Is there a better strategy though?
        
        P.S. We are getting some interesting behaviour here, with slight variations under the conditions for selecting C-sim and calculating its choice probabilities leading to very different best strategies (and different success probabilities such as 10%, 50%, 91% or close to 100%). Quite fascinating.
        lackofcheese 27 May 2012 14:38 UTC
        0 points
        Parent
        Yeah, that’s the problem I had in mind, and your “unlucky number” strategy definitely seems pretty solid in that case.