Deterministic Strategies Can Be Suboptimal

Robert KennedyMar 4, 2022, 9:09 PM

4 points

Yesterday I posted the following problem

You are playing a game against ROB, the robber. In this game, ROB will choose two arbitrary (real/computable) numbers, $A$ and $B$ . ROB will send both numbers to TABI the arbiter, who will secretly flip a fair coin and then hand you either $A$ or $B$ depending on the outcome of the flip. Your task is to come up with an algorithm which does better than 50% accuracy to determine if you have the larger number.

I will now insert line breaks, then some hints, then more line breaks, then the solution. Edit: This sentence is meant to remind you that you can only discover the answer once, and it is sweet victory to solve puzzles.

Note that ROB is obviously not uniformly choosing any two arbitrary real numbers—that is incoherent! Much worse, ROB is stealing your algorithm and choosing his numbers from a probability distribution which minimizes your changes of winning.

You can probably solve this if you restrict ROB to only picking two of ${0, 1, 2}$ with different probability distributions, and then extrapolate.

The solution is to choose any strictly increasing function $P$ whose range is (a subset of) $(0, 1)$ , and guess that your number $X$ is the larger number with probability $P (X)$ . For any given pair $A < B$ selected by ROB, you will guess “larger” correctly with probability $\frac{P (B)}{2}$ , and you will guess “larger” incorrectly with probability $\frac{P (A)}{2}$ ; since $P$ is increasing, it is clear you will guess “larger” correctly more often than you guess “larger” incorrectly. I hope the remaining details are clear.

Shout-out to jacob.wood who posted a solution first, and dxu who posted the entire solution first.

This is naively surprising to those who know that strategies should often not be probabilistic, even when the games you are playing are. Yudkowsky’s newest novel demonstrates other areas where probabilistic strategies dominate. I will cover an example in my next post, as well as illustrate real world use cases where I have immediately applied the lesson.

The link to Yudkowsky’s newest novel is here: https://www.glowfic.com/posts/4582

What links here?

Robert KennedyMar 4, 2022, 9:09 PM

4 points

11 comments1 min readLW link

World Modeling

Dagon Mar 5, 2022, 4:46 PM
4 points
0

My understanding is this generalizes to using a deterministic strategy when your “opponent” is non-strategic. Probabilistic approaches are mostly reserved for adversarial conditions, to prevent exploitation if your opponent can model your behavior.
Jonnston Mar 5, 2022, 12:14 AM
4 points
0

I want to register my conviction that ROB can mitigate effectiveness by keeping you arbitrialy close to 50% success rate by selecting numbers arbitrarily close to one another.
- Tapatakt Mar 5, 2022, 4:53 PM
  1 point
  0
  Parent
  
  Yes, or just arbitrarily big numbers.
Oscar_Cunningham Mar 5, 2022, 2:39 AM
2 points
0

Note that Eliezer believes the opposite: https://www.lesswrong.com/posts/GYuKqAL95eaWTDje5/worse-than-random.
- dxu Mar 5, 2022, 3:08 AM
  5 points
  0
  Parent
  
  From near the end of the post you linked:
  As a general principle, on any problem for which you know that a particular unrandomized algorithm is unusually stupid—so that a randomized algorithm seems wiser—you should be able to use the same knowledge to produce a superior derandomized algorithm. If nothing else seems obvious, just avoid outputs that look “unrandomized”! If you know something is stupid, deliberately avoid it! (There are exceptions to this rule, but they have to do with defeating cryptographic adversaries—that is, preventing someone else’s intelligence from working on you. Certainly entropy can act as an antidote to intelligence!)
  In the case of Robert’s problem, since ROB is specified to have access to your algorithm, it effectively “moves second”, which does indeed place it in a position to “use its intelligence on you”. So it should be unsurprising that a mixed strategy can beat out a pure strategy. Indeed, this is why mixed strategies are sometimes optimal in game-theoretic situations: when adversaries are involved, being unpredictable has its advantages.
  Of course, this does presume that you have access to a source of randomness that even ROB cannot predict. If you do not have access to such a source, then you are in fact screwed. But then the problem becomes inherently unfair, and thus of less interest.
  (From a certain perspective, even this is compatible with Eliezer’s main message: for an adversary to be looking over your shoulder, “moving second” relative to you, and anti-optimizing your objective function, is indeed a special case of things being “worse than random”. The problem is that producing a “superior derandomized algorithm” in such a case requires inverting the “move order” of yourself and your adversary, which is not possible in many scenarios.)
  - Oscar_Cunningham Mar 5, 2022, 3:24 AM
    2 points
    0
    Parent
    
    The original problem doesn’t say that ROB has access to your algorithm, or that ROB wants you to lose.
    - Tapatakt Mar 5, 2022, 5:02 PM
      1 point
      0
      Parent
      
      In such problems, it is usually assumed that your solution have to work (in this case work = better than 50% accuracy) always, even in the worst case, when all unknowns are against you.
      - Oscar_Cunningham Mar 5, 2022, 5:18 PM
        6 points
        0
        Parent
        
        https://www.lesswrong.com/posts/AAqTP6Q5aeWnoAYr4/?commentId=WJ5hegYjp98C4hcRt
        
        I don’t dispute what you say. I just suggest that the confusing term “in the worst case” be replaced by the more accurate phrase “supposing that the environment is an adversarial superintelligence who can perfectly read all of your mind except bits designated ‘random’”.
Olomana Mar 5, 2022, 7:26 AM
1 point
0

The problem statement says “arbitrary real numbers”, so the domain of your function P is -infinity to +infinity. P represents a probability distribution, so the area under the curve is equal to 1. P is strictly increasing, so… I’m having trouble visualizing a function that meets all these conditions.
You say “any” such function… perhaps you could give just one example.
- Oscar_Cunningham Mar 5, 2022, 9:34 AM
  2 points
  0
  Parent
  
  In this case P is the cumulative distribution function, so it has to approach 1 at infinity, rather than the area under the curve being 1. An example would be 1/(1+exp(-x)).
  - Robert Kennedy Mar 6, 2022, 9:02 PM
    3 points
    0
    Parent
    
    Actually, for any given P which works, P’(x)=P(x)/10 is also a valid algorithm.