ϵ is the same for both players, but V1 and V2 (the players’ observations of uy) are different, both sampled independently and uniformly from [uy−ϵ, uy+ϵ].
If they decide on R=0.5 then there exists some uy value for which they get a bad expected utility (see Claim 1).
Sure, but that goes for a randomly-chosen R too. For every possible R, there is a u value for which they get bad outcomes. It doesn’t get better by randomly choosing R.
The assumption is that R is chosen after uy. So for every uy the pair of policies gets a good expected utility. See the point on Bayesian algorithms in the conclusion for more on why “get a high expected utility regardless of uy” might be a desirable goal.
How can it possibly matter whether R is chosen before or after uy? R is completely independent of u, right? It’s not a covert communication mechanism about the players’ observations, it’s a random value.
If uy is chosen after R then it might be chosen to depend on R in such a way that the algorithm gets bad performance, e.g. using the method in the proof of Claim 1.
Based on other comments, I realize I’m making an assumption about something you haven’t specified. How is uy chosen? If it’s random and independent, then my assertion holds. If it’s selected by an adversary who somehow knows the players’ full strategies, then R is just a way of keeping a secret from the adversary: sequence doesn’t matter, but knowledge does.
Claim 1 says there exists some uy value for which the algorithm gets high regret, so we might as well assume it’s chosen to maximize regret.
Claim 2 says the algorithm has low regret regardless of uy, so we might as well assume it’s chosen to maximize regret.
uy and R are independently chosen from well-defined distributions. Regardless of sequence, neither knows the other, and neither CAN be chosen based on the other. I’ll see if I can find time tonight to figure out whether I’m saying your Claim 1 is wrong (it dropped epsilon too soon from the floor value, but I’m not sure if it’s more fundamentally problematic than that) or that your Claim 2 is misleading.
My current expectation is that I’ll find that your claim 2 results are available in situation 1, by using your given function with a pre-agreed value rather than a random one.
The theorems are of the form “For all uy, you get good outcomes” or “There exists a uy that causes bad outcomes”.
When you want to prove statements of this form, uy is chosen adversarially, so it matters whether it is chosen before or after R.
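One way to write down why the order matters, as a sketch only: here regret(uy, R) is a placeholder symbol for whatever regret the pair of policies incurs, not notation taken from the post. For any such function,

$$\sup_{u_y} \mathbb{E}_R\big[\mathrm{regret}(u_y, R)\big] \;\le\; \mathbb{E}_R\Big[\sup_{u_y} \mathrm{regret}(u_y, R)\Big]$$

The left side is the case where uy is fixed (even adversarially) before R is drawn, which is the quantity a “for all uy” statement bounds. The right side is the case where uy may be picked after seeing R. The inequality can be strict, so a low value on the left says nothing about the right.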
What distribution is uy chosen from? That’s not specified anywhere in the post.
True, they will fail to cooperate for some R, but the values of such R have a low probability. (But yeah, it’s also required that uy and R are chosen independently—otherwise an adversary could just choose either so that it results in the players choosing different actions.)
The smoothness comes from marginalising over a random R. The coordination comes from making R and ϵ common knowledge, so they cooperate using the correlation in their observations, which is an interesting phenomenon.
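Here is a rough simulation of that point. It is only a sketch under assumptions of mine: each player takes the action "observation exceeds a shared threshold", and the threshold is either a pre-agreed constant or the shared random value R drawn uniformly from [0, 1]. This threshold rule is a stand-in, not the function from the post, and the names `act` and `mismatch_rate` are made up for illustration.

```python
import random

def act(v, threshold):
    # Hypothetical policy: pick action "high" iff the observation exceeds the threshold.
    return v > threshold

def mismatch_rate(u_y, eps, draw_threshold, trials=20000, seed=0):
    """Estimate how often the two players pick different actions for a given u_y."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        t = draw_threshold(rng)  # common knowledge to both players
        v1 = rng.uniform(u_y - eps, u_y + eps)  # player 1's noisy observation
        v2 = rng.uniform(u_y - eps, u_y + eps)  # player 2's noisy observation
        if act(v1, t) != act(v2, t):
            mismatches += 1
    return mismatches / trials

def fixed_threshold(rng):
    # Pre-agreed constant, playing the role of "they decide on R=0.5".
    return 0.5

def shared_random_threshold(rng):
    # Shared random R, drawn independently of u_y.
    return rng.uniform(0.0, 1.0)

eps = 0.01
# With a fixed threshold, u_y = 0.5 is a bad case: each observation lands on
# either side of the threshold independently.
worst_fixed = mismatch_rate(0.5, eps, fixed_threshold)
# With a shared random threshold, check several u_y values and keep the worst.
worst_random = max(
    mismatch_rate(u, eps, shared_random_threshold) for u in [0.1, 0.3, 0.5, 0.7, 0.9]
)
print(worst_fixed, worst_random)
```

In this toy version the players only disagree when the threshold falls between their two observations. A fixed threshold lets uy be placed right on top of it, while a uniformly random one lands in that narrow gap with probability on the order of ϵ, whatever uy is.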
(How can I write LaTeX in the comments?)
ctrl-4