I’m pretty sure that maximizing the expectation of any proper scoring rule would give exactly the same results for all of these, except maybe the last section: G has nice chaining properties that I’m too lazy to check for other scoring rules.
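For what it's worth, here is a minimal sketch of the one property every proper scoring rule shares: the expected score is maximized by reporting the true probability. It only illustrates that definition for two standard rules (log and Brier) on a single binary event; it does not check any of the post's specific results or the chaining properties of G. The function names, the grid search, and the value p = 0.3 are all made up for illustration.

```python
import math

def expected_log_score(p, q):
    # Expected log score of reporting q when the event has true probability p.
    return p * math.log(q) + (1 - p) * math.log(1 - q)

def expected_brier_score(p, q):
    # Expected Brier score, negated so that higher is better.
    return -(p * (1 - q) ** 2 + (1 - p) * q ** 2)

def best_report(expected_score, p, steps=999):
    # Grid search over reports q strictly inside (0, 1).
    # The grid resolution is an arbitrary illustrative choice.
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(grid, key=lambda q: expected_score(p, q))

p = 0.3  # hypothetical true probability
log_best = best_report(expected_log_score, p)
brier_best = best_report(expected_brier_score, p)
print(log_best, brier_best)
```

Both searches land on the grid point at the true probability, which is what "proper" means; anything beyond that (e.g. behavior under chaining) would need a separate check.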
Do you think this implies that there are other perfectly good versions of Thompson sampling, etc.? Or are the implications limited because the argmax makes things too simple?
Yeah, I think Thompson sampling is even more robust, but I don’t know much about the nice properties of Thompson sampling besides the density 0 exploration.