I’m missing something (and I haven’t digested the math, so maybe it’s obvious but just missing from the narrative description). Is epsilon the same for both players, in that they see the same V, which just may not exactly match u? Or is it different for each player, meaning that for the same u they have different V? From your analysis (risk of 0), it sounds like the latter.
In that case, I don’t see how additional shared knowledge helps coordinate them, nor why it needs to be random rather than just a fixed value they agree on in advance. And certainly not why it matters if the additional random shared value is generated before or after the game starts.
If they don’t have this additional source of shared randomness, can they just decide in their pre-game discussion to use R=0.5? Why or why not?
$\epsilon$ is the same for both players but $V_1$ and $V_2$ (the players’ observations of $u_y$) are different, both sampled independently and uniformly from $[u_y - \epsilon, u_y + \epsilon]$.
If they decide on R=0.5 then there exists some $u_y$ value for which they get a bad expected utility (see Claim 1; roughly, a $u_y$ that sits right at the pre-agreed value, so that the two noisy observations can land on opposite sides of it).
Sure, but that goes for a randomly-chosen R too. For every possible R, there is a u value for which they get bad outcomes. It doesn’t get better by randomly choosing R.
The assumption is that R is chosen after $u_y$. So for every $u_y$ the pair of policies gets a good expected utility. See the point on Bayesian algorithms in the conclusion for more on why “get a high expected utility regardless of $u_y$” might be a desirable goal.
How can it possibly matter whether R is chosen before or after uy? R is completely independent of u, right? It’s not a covert communication mechanism about the players’ observations, it’s a random value.
If $u_y$ is chosen after R then it might be chosen to depend on R in such a way that the algorithm gets bad performance, e.g. using the method in the proof of Claim 1.
Based on other comments, I realize I’m making an assumption for something you haven’t specified. How is uy chosen? If it’s random and independent, then my assertion holds, if it’s selected by an adversary who knows the players’ full strategies somehow, then R is just a way of keeping a secret from the adversary—sequence doesn’t matter, but knowledge does.
Claim 1 says there exists some $u_y$ value for which the algorithm gets high regret, so we might as well assume it’s chosen to maximize regret.
Claim 2 says the algorithm has low regret regardless of $u_y$, so we might as well assume it’s chosen to maximize regret.
uy and R are independently chosen from well-defined distributions. Regardless of sequence, neither knows the other and CANNOT be chosen based on the other. I’ll see if I can find time tonight to figure out whether I’m saying your claim 1 is wrong (it dropped epsilon too soon from the floor value, but I’m not sure if it’s more fundamentally problematic than that) or that your claim 2 is misleading.
My current expectation is that I’ll find that your claim 2 results are available in situation 1, by using your given function with a pre-agreed value rather than a random one.
The theorems are of the form “For all $u_y$, you get good outcomes” or “There exists a $u_y$ that causes bad outcomes”.
When you want to prove statements of this form, $u_y$ is chosen adversarially, so it matters whether it is chosen before or after R (a schematic version is written out below).
What distribution is $u_y$ chosen from? That’s not specified anywhere in the post.
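Schematically (paraphrasing only the quantifier structure of the two claims; I’m not restating the exact regret bounds here):

$$\textbf{Claim 1 (fixed rule):}\quad \exists\, u_y \ \text{such that} \ \mathbb{E}_{V_1, V_2 \mid u_y}\big[\mathrm{regret}\big] \ \text{is large.}$$

$$\textbf{Claim 2 (shared random } R\textbf{):}\quad \forall\, u_y,\ \mathbb{E}_{R,\, V_1, V_2 \mid u_y}\big[\mathrm{regret}\big] \ \text{is small.}$$

In the first statement the adversarial $u_y$ effectively moves last; in the second, R is drawn after $u_y$ is fixed, so the choice of $u_y$ cannot be conditioned on it.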
True, they will fail to cooperate for some R, but the values of such R have a low probability. (But yeah, it’s also required that $u_y$ and R are chosen independently—otherwise an adversary could just choose either one so that the players end up choosing different actions.)
The smoothness comes in from marginalising a random R. The coordination comes from making R and ε common knowledge, so they cooperate using the correlation in their observations—an interesting phenomenon. (A toy simulation of this is sketched below.)
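To make the “smoothness from marginalising a random R” point concrete, here’s a minimal Monte Carlo sketch. The threshold rule it uses (play the coordinated action iff your observation clears a shared cutoff) and the parameter values (ε = 0.05, R uniform on [0, 1]) are illustrative stand-ins of mine, not the post’s actual function. It only illustrates the shape of the claims: a fixed pre-agreed cutoff miscoordinates with constant probability when $u_y$ sits right at it, while a shared random R keeps the miscoordination probability on the order of ε for every $u_y$.

```python
import random

EPS = 0.05        # observation noise half-width (assumed value)
TRIALS = 100_000  # Monte Carlo samples per estimate

def miscoordination_rate(u_y, fixed_cutoff=None):
    """Estimate how often the two players pick different actions for a given u_y."""
    bad = 0
    for _ in range(TRIALS):
        # Each player independently observes u_y corrupted by uniform noise.
        v1 = random.uniform(u_y - EPS, u_y + EPS)
        v2 = random.uniform(u_y - EPS, u_y + EPS)
        # Shared cutoff: either a pre-agreed fixed value, or a fresh R drawn
        # uniformly from [0, 1], seen by both players (common knowledge) and
        # independent of u_y.
        cutoff = fixed_cutoff if fixed_cutoff is not None else random.random()
        # Illustrative rule: play the coordinated action iff the observation
        # clears the cutoff; miscoordination means the two players disagree.
        bad += (v1 >= cutoff) != (v2 >= cutoff)
    return bad / TRIALS

# Adversarial u_y placed exactly at the pre-agreed cutoff: roughly 50%
# miscoordination, no matter how small EPS is.
print("fixed cutoff 0.5, u_y = 0.5:", miscoordination_rate(0.5, fixed_cutoff=0.5))

# Same u_y, but the cutoff is a shared random R: roughly 2*EPS/3 miscoordination,
# and it stays that small for every u_y, because R is independent of u_y.
print("shared random R,  u_y = 0.5:", miscoordination_rate(0.5))
```

Under this rule the bad event is R landing between the two observations, which has probability proportional to $|V_1 - V_2| \le 2\epsilon$. That’s also why the order of choice matters: an adversary allowed to pick $u_y$ after seeing R could place $u_y$ right at R and recover the fixed-cutoff failure.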
(How can I write LaTeX in the comments?)
ctrl-4
If you assume a fixed probability distribution over possible $u_y$ that both players know when coordinating, then they can set up their rules to make sure that they probably win. The extra random information is only useful because of the implicit “for all $u_y$”. If some malicious person had overheard their strategy, and was allowed to choose $u_y$, but didn’t have access to the random number source, then the random numbers are useful.