I don’t currently see why we shouldn’t ask to converge to Pareto optima. Obviously, we can’t expect to do so with arbitrary other agents; but it doesn’t seem unreasonable to use an algorithm which has the property of reaching Pareto optima with other agents who use that same algorithm.
I had an interesting thought that made me update towards your position. There is my old post about “metathreat equilibria”, an idea I developed with Scott’s help during my trip to Los Angeles. I just realized that the same principle can be realized in the setting of repeated games. In particular, I am rather confident that the following, if formulated a little more rigorously, is a theorem:
Consider the Iterated Prisoner’s Dilemma in which strategies are constrained to depend only on the action of the opponent in the previous round. Then, in the limit γ→1 (shallow time discount), the only thermodynamic equilibrium is mutual cooperation.
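To make the setting concrete, here is a minimal sketch (my illustration, not part of the claim itself): a pair of memory-one strategies induces a Markov chain over the four joint outcomes, and as γ→1 the normalized discounted payoff approaches the stationary average, assuming the chain is ergodic. This only illustrates the payoff structure of the setting; it does not compute thermodynamic equilibria.

```python
# Sketch of the memory-one iterated Prisoner's Dilemma in the gamma -> 1 limit.
# A memory-one strategy is a vector of cooperation probabilities conditioned on
# the previous joint outcome (CC, CD, DC, DD). Payoffs are the standard (3,0,5,1).
import numpy as np

# Row player's payoff, indexed by previous-round outcome (CC, CD, DC, DD).
PAYOFF = np.array([3.0, 0.0, 5.0, 1.0])

def transition_matrix(p, q):
    """p, q: cooperation probabilities after (CC, CD, DC, DD).

    Player 2 sees each outcome with roles swapped, so its conditioning
    order is (CC, DC, CD, DD).
    """
    q_swapped = np.array([q[0], q[2], q[1], q[3]])
    M = np.zeros((4, 4))
    for s in range(4):
        pc, qc = p[s], q_swapped[s]
        M[s] = [pc * qc, pc * (1 - qc), (1 - pc) * qc, (1 - pc) * (1 - qc)]
    return M

def long_run_payoff(p, q):
    """Average payoff per round in the gamma -> 1 limit (stationary average)."""
    M = transition_matrix(np.asarray(p, float), np.asarray(q, float))
    # Stationary distribution: left eigenvector of M with eigenvalue 1.
    vals, vecs = np.linalg.eig(M.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi /= pi.sum()
    return float(pi @ PAYOFF)

# Win-stay-lose-shift, with a little noise so the chain is ergodic:
# it recovers cooperation after errors, so the long-run payoff is near 3.
wsls = [0.99, 0.01, 0.01, 0.99]
print(long_run_payoff(wsls, wsls))              # close to mutual cooperation (3)
print(long_run_payoff([0.01] * 4, [0.01] * 4))  # near all-defect: about 1
```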
Then, there is the question of how to generalize it. Obviously we want to consider more general games and more general strategies, but allowing fully general strategies probably won’t work (just like allowing arbitrary programs doesn’t work in the “self-modification” setting). One obvious generalization is allowing the strategy to depend on some finite suffix of the history. But, this isn’t natural in the context of agent theory: why would agents forget everything that happened before a certain time? Instead, we can constrain the strategy to be finite-state, and maybe require it to be communicating (i.e. forbid “grim triggers” where players change their behavior forever); see the sketch after the conjecture below. On the game side, we can consider arbitrary repeated games, or communicating stochastic games (they have to be communicating because otherwise we can represent one-shot games), or even communicating partially observable stochastic games. This leads me to the following bold conjecture:
Consider any suitable (as above) game in which strategies are constrained to be (communicating?) finite-state. Then, in the limit γ→1, all thermodynamic equilibria are Pareto efficient.
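As a sketch of what the finite-state constraint could look like (my illustration; I am assuming here that “communicating” means the strategy’s internal state graph is strongly connected under all possible opponent actions, so no behavioral change can be locked in forever): tit-for-tat passes this test, while grim trigger fails because its punishment state is absorbing.

```python
# Finite-state strategies for a repeated game with actions C/D, plus a check
# for the (assumed) "communicating" condition: every internal state can reach
# every other state under some sequence of opponent actions.
from dataclasses import dataclass

ACTIONS = ("C", "D")

@dataclass
class FiniteStateStrategy:
    states: tuple   # internal states
    start: str      # initial state
    action: dict    # state -> action played in that state
    step: dict      # (state, opponent_action) -> next state

    def is_communicating(self) -> bool:
        """True iff the state graph (edges over all opponent actions) is strongly connected."""
        def reachable(src):
            seen, stack = {src}, [src]
            while stack:
                s = stack.pop()
                for a in ACTIONS:
                    t = self.step[(s, a)]
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
            return seen
        return all(set(self.states) <= reachable(s) for s in self.states)

# Tit-for-tat: mirror the opponent's last action; both states reach each other.
tit_for_tat = FiniteStateStrategy(
    states=("coop", "defect"),
    start="coop",
    action={"coop": "C", "defect": "D"},
    step={("coop", "C"): "coop", ("coop", "D"): "defect",
          ("defect", "C"): "coop", ("defect", "D"): "defect"},
)

# Grim trigger: defect forever after the first opponent defection (absorbing state).
grim_trigger = FiniteStateStrategy(
    states=("coop", "punish"),
    start="coop",
    action={"coop": "C", "punish": "D"},
    step={("coop", "C"): "coop", ("coop", "D"): "punish",
          ("punish", "C"): "punish", ("punish", "D"): "punish"},
)

print(tit_for_tat.is_communicating())   # True
print(grim_trigger.is_communicating())  # False
```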
This conjecture would be the sort of result that seems like it should be applicable to learning agents. That is, if the conjecture is true, there is good hope that appropriate learning agents are guaranteed to converge to Pareto efficient outcomes. Even if the conjecture requires some moderately strong additional assumptions, it seems worth studying.