It seems to me that the main goal of quantilization is to reduce the extreme unintended outcomes of maximizing (by sampling from something like a human-learned distribution over actions) while still remaining competitive (by sampling from only the upper quantile of said distribution).
But that still leaves open the possibility of sampling one of those super highly maximized actions! Why not just hard-cut off the upper portion of the distribution as well? Why not have a quantilizer that samples between the 90%ile and the 99%ile?
Or, while we’re at it, why are we sampling at all? Why not just say that we’re taking the 99%ile action?
[Question] Why don’t quantilizers also cut off the upper end of the distribution?
It seems to me that the main goal of quantilization is to reduce the extreme unintended outcomes of maximizing (by sampling from something like a human-learned distribution over actions) while still remaining competitive (by sampling from only the upper quantile of said distribution).
But that still leaves open the possibility of sampling one of those super highly maximized actions! Why not just hard-cut off the upper portion of the distribution as well? Why not have a quantilizer that samples between the 90%ile and the 99%ile?
Or, while we’re at it, why are we sampling at all? Why not just say that we’re taking the 99%ile action?