It seems worth mentioning than anything which involves enumerating over the space of possible actions, or policies, is often not tractable in practice (or, will be exploitable by adversarial enumeration)
So another desideratum may be “it’s easy to implement using sampling”. On this, normalizing by some sort of variance is probably best.
It seems worth mentioning than anything which involves enumerating over the space of possible actions, or policies, is often not tractable in practice (or, will be exploitable by adversarial enumeration)
So another desideratum may be “it’s easy to implement using sampling”. On this, normalizing by some sort of variance is probably best.