paulfchristiano comments on Does Solomonoff always win?

paulfchristiano 25 Feb 2011 1:05 UTC
2 points
Suppose we are playing any game in which we repeatedly choose an option from some potentially infinite class and then receive a bounded payoff (dependent only on my choice in this round). Then I can do as well as the best computable strategy in expectation (to within a constant) by using multiplicative weights over the space of computable strategies (I’ve mentioned this before, but didn’t realize the difference between these set-ups at the time). Note that this strategy may be randomized, if you only allow me to bet all-or-nothing (or more generally if the space of strategies isn’t convex in the suitable sense).

This lets you win probabilistically at game 3, but its not immediately clear whether it lets you win at games 1 and 2, since the payoff is not bounded. I suspect it wins at games 1 and 2, but not very strongly.

Edit: Why should we restrict attention to deterministic strategies? It is immediately clear that no deterministic strategy can outperform all other deterministic strategies, but it seems like compelling evidence in Solomonoff induction’s favor if it outperforms any randomized computable strategy.
- cousin_it 25 Feb 2011 7:50 UTC
  1 point
  Parent
  Five minutes’ thought allowed me to prove the following stupid “theorem”:
  
  Consider any game (haha). The only restriction is that the game must never make total wins go below zero, as in my game 2. Then there’s a general-purpose winning agent: choose one strategy at the outset, sampled from the space of all computable strategies according to some distribution, and then follow it for all eternity. Obviously, this agent’s expected accumulated utilities at all times cannot be worse than any individual strategy by more than a multiplicative constant, which is equal to that strategy’s weight in the initial distribution.
  
  Perhaps this result is easy in retrospect. Now I’d like to know what happens if utility can become negative (taking the exponent doesn’t seem to work), and also how to improve the agent because it looks kinda stupid (even though it solves game 2 about as well as Solomonoff does). Sorry if this all sounds obvious, I’ve only been studying the topic for several days.
- cousin_it 25 Feb 2011 9:18 UTC
  0 points
  Parent
  Sorry for the many replies, your comment continues to intrigue me.
  
  Game 2 is different from games 1 and 3 because the available options/payoffs in game 2 also depend on your current utility or previous betting history—you cannot bet more than you’ve won so far. In general, if such things are allowed, there’s no predictor that can win all games up to an additive or multiplicative constant, even when payoffs are bounded. Here’s a game that shows why: on round 1 the player chooses x=1 or x=-1, then the universe chooses y=1 or y=-1, then the game effectively ends and the player receives utility x*y on every round thereafter. Whatever mixture of x=1 and x=-1 is chosen by your proposed predictor on the first round, the universe can choose y so as to make you go into the negatives or 0 in expectation, yet there exists a computable human that receives linearly increasing utility in that universe.
  
  Note that my reasoning in the previous comment solves game 2 just fine, but it doesn’t work for game 1 or game 3 because they can go into negative utilities. Maybe we’re facing different classes of game that call for different solutions? Right now it seems we have a general theorem for game 1 (log score—regular Solomonoff induction), another general theorem for games of type 2 (positive utilities—my other comment), and a third general theorem for games of type 3 (bounded payoffs—your claimed result). We ought to unify some of them.
- cousin_it 25 Feb 2011 6:04 UTC
  0 points
  Parent
  Good idea. So you’re claiming that some sort of “randomized Solomonoff induction” can win all games, including uncomputable ones, that have bounded payoffs per step (and possibly all other games too) against all computable agents who are also allowed to randomize? If true, this looks like a new and important result. We need to pin it down formally.