I find these theorem statements quite hard to follow. It looks like this is doing roughly the right thing and would be better to work with than reflective oracles (i.e. to the extent that there were new complications, those complications would represent real and important phenomena that were simply covered up by the simplifying assumptions). In that case I am quite interested in AIXI-with-optimal-predictors (though probably I’m most interested in a more accessible presentation, and I really feel like it should be possible to have a cleaner formalism).
As an example, it seems like this result might be of broad interest to computer scientists, since e.g. there is no known polynomial time algorithm that finds an approximate Nash equilibrium in general games, even apparently simple ones. An extremely natural open question is "so what should we actually expect to happen when extremely rational players play such games?" It looks like this approach may be able to give some kind of an answer via an appropriate notion of bounded optimality, but it is pretty hard to understand whether this approach does give a satisfying answer, and if so what it is.
I think that the relevant take-away is in Theorem 1. My impression is that you are holding the strategy sets constant while effectively taking a limit over computational resources of the players (and potentially the difficulty of understanding the payoff matrix). Is that right? In that case, I expect convergence to the N.E. to only occur once the computational resources are exponential in the size of the game.
(Of course that would still be a big improvement over uncomputability, I’m just trying to understand what is going on.)
AIXI-with-optimal-predictors: I believe this is relatively straightforward. However, my plan for the next step was adapting these results to a decision rule based on logical counterfactuals in a way which produces metathreat equilibria.
Bounded Nash equilibria: I don’t think the concept is entirely novel. I’ve seen some papers which discuss Nash-like equilibria with computational resource bounds, although the area seems to remain largely unexplored. The particular setting I use here is not very relevant to what you’re suggesting, since finding Nash equilibria is non-polynomial in the number of strategies, whereas here I keep the number of strategies constant. Instead, the complexity comes from the dependence of the payoff tensor on the parameter sampled from μ_k.
Your description of Theorem 1 is more or less correct except there’s only a “payoff vector” here since this is a 1 player setting. The multiplayer setting is used in Corollary 2.
Regarding dependence on game size, it is not as bad as exponential. The Lipton-Markakis-Mehta algorithm finds ε-equilibria in time O(n^(log n / ε²)).
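For concreteness, here is a minimal sketch of the idea behind the Lipton-Markakis-Mehta result for 2-player games: there always exists an ε-equilibrium in which each player mixes uniformly over a multiset of k = O(log n / ε²) pure strategies, so brute-force enumeration of such "k-uniform" profiles runs in quasi-polynomial time. The function name and interface below are illustrative, not from any particular library.

```python
import itertools
import numpy as np

def eps_equilibria_k_uniform(A, B, k, eps):
    """Enumerate k-uniform strategy profiles of a 2-player game and
    return those that are eps-Nash equilibria.

    A, B: n x n payoff matrices for the row and column player.
    A k-uniform strategy mixes uniformly over a multiset of k pure
    strategies; LMM show that for k = O(log n / eps^2) some such
    profile is an eps-equilibrium, so this search is quasi-polynomial.
    """
    n = A.shape[0]
    equilibria = []
    # All multisets of size k drawn from the n pure strategies.
    supports = list(itertools.combinations_with_replacement(range(n), k))
    for s_row in supports:
        x = np.bincount(s_row, minlength=n) / k  # row player's mixed strategy
        for s_col in supports:
            y = np.bincount(s_col, minlength=n) / k  # column player's strategy
            # eps-equilibrium: no pure deviation gains more than eps.
            row_ok = (A @ y).max() - x @ A @ y <= eps
            col_ok = (x @ B).max() - x @ B @ y <= eps
            if row_ok and col_ok:
                equilibria.append((x, y))
    return equilibria
```

On matching pennies (A = [[1, -1], [-1, 1]], B = -A) with k = 2, the only profile surviving the check is both players mixing 50/50, which is the game's unique Nash equilibrium.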