Yes, but the idea (I think!) is that you can recover the policy from just the beliefs (on the presumption of CDT EU maxxing). Saying that A does xyz because B is going to do abc is one thing; it builds in some of the fixpoint finding. The common knowledge of beliefs instead says: A does xyz because he believes “B believes that A will do xyz, and therefore B will do abc as the best response”; so A chooses xyz because it’s the best response to abc.
But that’s just one step. Instead you could keep going:
--> A believes that
----> B believes that
------> A believes that
--------> B believes that A will do xyz,
--------> and therefore B will do abc as the best response
------> and therefore A will do xyz as the best response
----> and therefore B will do abc as the best response
so A does xyz as the best response.
And then you go to infinityyyy.
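Here’s a minimal sketch of why the infinite tower doesn’t blow up (the payoff matrices, action labels, and seed belief are all made up for illustration, and it assumes point beliefs, pure strategies, and CDT EU maxxing): each “believes that” level is just one application of a best-response map, so unrolling the regress is iterating that map.

```python
import numpy as np

# Hypothetical prisoner's dilemma payoffs (action 0 = cooperate, 1 = defect).
# Rows index A's action, columns index B's action.
A_pay = np.array([[3, 0],
                  [5, 1]])  # A's payoffs
B_pay = np.array([[3, 5],
                  [0, 1]])  # B's payoffs

def best_response_A(b_action):
    # A's CDT-EU-maximizing action against a point belief about B's action.
    return int(np.argmax(A_pay[:, b_action]))

def best_response_B(a_action):
    # B's best response against a point belief about A's action.
    return int(np.argmax(B_pay[a_action, :]))

# Unroll the belief tower: each "...believes that..." level wraps one more
# best-response map around the innermost seed belief.
a = 0  # innermost seed: "B believes that A will cooperate"
for depth in range(1, 6):
    b = best_response_B(a)  # "therefore B will do abc as the best response"
    a = best_response_A(b)  # "therefore A will do xyz as the best response"
    print(f"depth {depth}: believed B-response {b}, A's best response {a}")
# Once the believed action equals the chosen action, every deeper level of
# the tower repeats the same content: that's the mutual-best-response
# (Nash) fixpoint condition.
```

The iteration stabilizes immediately here because defection is dominant; in general, whenever it reaches a fixpoint, adding more levels of “believes that” changes nothing.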
Being able to deduce a policy from beliefs doesn’t mean that common knowledge of beliefs is required.
The common knowledge of policy thing is true but is external to the game. We don’t assume that players in a prisoner’s dilemma know each other’s policies. As part of our analysis of the structure of the game, we might imagine that in practice some sort of iterative responding-to-each-other’s-policy thing will go on, perhaps because players face off regularly (but myopically), and so the policies selected will be optimal wrt each other. But this isn’t really a part of the game, it’s just part of our analysis. And we can analyse games in various different ways, e.g. by considering different equilibrium concepts.
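That iterative story is easy to sketch too (a toy illustration, not part of the game itself: same made-up payoffs as above, with myopic best-response dynamics picked as just one of the possible learning stories):

```python
import numpy as np

# Same made-up prisoner's dilemma payoffs as in the earlier sketch.
A_pay = np.array([[3, 0], [5, 1]])
B_pay = np.array([[3, 5], [0, 1]])

# Myopic best-response dynamics: each round the players face off again,
# and each responds optimally to the policy the other played last round.
a_policy, b_policy = 0, 0  # suppose both happen to start out cooperating
for round_num in range(1, 6):
    a_next = int(np.argmax(A_pay[:, b_policy]))  # A reacts to B's last policy
    b_next = int(np.argmax(B_pay[a_policy, :]))  # B reacts to A's last policy
    a_policy, b_policy = a_next, b_next
    print(f"round {round_num}: A plays {a_policy}, B plays {b_policy}")
# When the updates stop changing anything, the policies are optimal wrt
# each other (here mutual defection). The convergence happened across
# repeated plays, outside any single instance of the game.
```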
In any case it doesn’t mean that an agent in reality in a prisoner’s dilemma has a crystal ball telling them the other’s policy.
Certainly it’s natural to consider the case where the agents are used to playing against each other, so they have the chance to learn and react to each other’s policies. But a case where they each learn each other’s beliefs doesn’t feel that natural to me; might as well go full OSGT (open-source game theory) at that point.
Being able to deduce a policy from beliefs doesn’t mean that common knowledge of beliefs is required.
Sure, I didn’t say it was. I’m saying it’s sufficient (given some assumptions), which is interesting.
In any case it doesn’t mean that an agent in reality in a prisoner’s dilemma has a crystal ball telling them the other’s policy.
Sure, who’s saying so?
But a case where they each learn each other’s beliefs doesn’t feel that natural to me
It’s analyzed this way in the literature, and I think it’s kind of natural; how else would you make the game genuinely perfect information (in the intuitive sense), including the other agent, without just picking a policy?