Assume that each agent has his own game, that is, one game per agent. Then there are 18 (or 2) games overall, depending on the result of the coin flip.
Then the first calculation would be correct in every respect, and it would make sense to say yes from a global point of view. (With any other reward matrix as well, the dynamic update would always be consistent with the a priori decision.)
This shows that the agent's error was to implicitly assume that he has his own game.
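A rough sketch of why the two views agree in the one-game-per-agent setup. It assumes a fair coin, and that an agent who finds himself playing a game is equally likely to be any of the players. The function names and the per-game payoffs (`payoff_heads`, `payoff_tails`) are made up for illustration and stand in for an arbitrary reward matrix; "heads" is just a label for the outcome with 18 games.

```python
from fractions import Fraction

def prior_total(payoff_heads, payoff_tails, games_heads=18, games_tails=2):
    """A priori expected payoff summed over all games (fair coin assumed)."""
    half = Fraction(1, 2)
    return half * games_heads * payoff_heads + half * games_tails * payoff_tails

def posterior_per_game(payoff_heads, payoff_tails, games_heads=18, games_tails=2):
    """Expected payoff of a single game after updating on 'I am playing a game'."""
    # With 18 games on heads and 2 on tails, the update gives P(heads) = 18/20.
    p_heads = Fraction(games_heads, games_heads + games_tails)
    return p_heads * payoff_heads + (1 - p_heads) * payoff_tails

# Hypothetical payoffs: each game pays +1 on heads and -3 on tails.
print(prior_total(1, -3))         # 6
print(posterior_per_game(1, -3))  # 3/5
```

Under these assumptions the a priori total is always exactly 10 times the updated per-game value (9a + b versus 0.9a + 0.1b), so both have the same sign for every choice of payoffs, and the updated decision cannot contradict the a priori one.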