Super interesting series! Your second post actually gave me some insights on a problem that was a big part of my PhD, but it will take me some time to think through. So here is a simpler, unrelated comment.
Thompson sampling seems too perfectionistic to me: it wants to take an action that is optimal in some world, rather than one that is merely great in all worlds. For example, suppose you have a suitcase with a 6-digit lock. In each turn, you can make a guess at the correct combination. If you guess correctly, you get 10 utils; if you guess wrong, you get 1 util. If you don’t guess but instead pass, you get 9 utils.
(The model is supposed to be greedy and ignore future advantages, so information revealed from a wrong guess shouldn’t matter. If this bothers you, you can imagine that when you “pass”, you still get to make a guess at the correct combination, and you learn whether it is correct, but you get 9 utils either way.)
Each possible combination is considered a separate hypothesis, so each hypothesis is deterministic. Thompson sampling would never pass, because that is not the optimal thing to do under any particular hypothesis (although it is the optimal thing to do when you don’t know which hypothesis is true). Instead it would keep guessing combinations, because each combination is optimal in some hypothesis.
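To see this concretely, here is a minimal sketch of the lock problem (the downscaling to a 3-digit lock and all names are my own): each hypothesis pins down the combination, so Thompson sampling samples a hypothesis from the posterior and plays that hypothesis’s unique optimal action, which is always a guess.

```python
import random

N = 1000                      # downscaled: a 3-digit lock, so 1,000 combinations
posterior = [1 / N] * N       # uniform belief over "the combination is i"

def best_action(h):
    # Under hypothesis h, guessing h yields 10 utils while passing yields 9,
    # so ("guess", h) is the unique optimal action in every hypothesis.
    return ("guess", h)

def thompson_action(posterior):
    h = random.choices(range(N), weights=posterior)[0]  # sample a world
    return best_action(h)                               # play its optimum

print(thompson_action(posterior))  # always ("guess", i) for some i, never a pass
```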
This is not an issue of AM vs. GM: the same thing happens in the plurality decision procedure when we only use AM. It is an issue about what we take argmax over. If we consider the combination to be due to randomness within a single hypothesis, the decision procedures will correctly choose to pass until the correct combination has been revealed.
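Under that reframing the calculation is immediate: with a single hypothesis in which the combination is uniform noise, the argmax ranges over expected utilities, and passing wins (same downscaled numbers as above):

```python
N = 1000
eu_guess = (1 / N) * 10 + (1 - 1 / N) * 1   # = 1.009 utils
eu_pass = 9.0
actions = {"guess": eu_guess, "pass": eu_pass}
print(max(actions, key=actions.get))        # -> "pass"
```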
Possibly related question:
Is there any reason the AM-GM boundary sits at the same level as where we take the argmax in the definition of m? Or could we have an arithmetic expectation outside the argmax (hence outside m) but inside the geometric expectation? Or even a geometric expectation inside the argmax in m, taken over the arithmetic expectation?
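To make the options concrete, here is how I would write the two extremes; the notation is mine and may not line up exactly with the post’s definition of $m$ (write $u_h(a)$ for the expected utility of action $a$ under hypothesis $h$, and $\mathrm{G}$ for the geometric expectation over $h \sim P$):

$$\operatorname*{argmax}_{\pi}\ \mathrm{G}_{h \sim P}\ \mathbb{E}_{a \sim \pi}\left[u_h(a)\right] \qquad \text{(argmax outside both expectations, Nash-bargaining-like)}$$

$$\pi(a) = \Pr_{h \sim P}\!\left(a = \operatorname*{argmax}_{a'} u_h(a')\right) \qquad \text{(argmax inside the hypothesis draw, Thompson-like)}$$

The question is whether anything forces the AM-GM boundary to sit exactly where the inner argmax does, or whether the mixed orderings in between are also sensible.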
Yeah, Thompson sampling and Nash bargaining are different in that the Thompson sampling proposal has two argmaxes, whereas Nash bargaining only has one. There are really two things being brought in with Thompson sampling: Plurality is what you get if you only add the inner argmax, and something like Nash bargaining is what you get if you only add the geometric part. There is no reason you have to add the two things at the same place. All I know is Thompson sampling has some pretty nice asymptotic guarantees.
You could just Nash bargain between your hypotheses directly, but then you are dependent on where the 0 point is. One nice thing about Thompson sampling is that it gives you a semi-principled place to put the 0, because the inner argmax means we convert everything to probabilities.
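As a quick numeric sanity check of that last point (the toy setup is mine): give each hypothesis, as its bargaining utility, the probability that its own optimal action gets played. Probabilities come with a natural 0, and the geometric maximizer then reproduces the Thompson sampling frequencies.

```python
import numpy as np

P = np.array([0.7, 0.3])      # prior over two hypotheses with distinct optimal actions

def geo_score(pi):
    # Geometric expectation over hypotheses of pi(a_h): the probability
    # that hypothesis h's optimal action is the one actually played.
    return np.prod(pi ** P)

grid = np.linspace(0.001, 0.999, 999)
best_q = max(grid, key=lambda q: geo_score(np.array([q, 1 - q])))
print(best_q)                 # ~0.7: matches Thompson sampling's frequencies
```

The 0 is doing real work here: a policy that never plays some hypothesis’s optimal action gives that hypothesis utility 0, which the geometric expectation rules out entirely.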