Sure, other things equal. But other things aren’t necessarily equal. For example, regularization could stack the deck in favor of one policy over another, even if the latter has been systematically producing slightly higher reward. There are lots of things like that; the details depend on the exact RL algorithm. In the context of brains, I have discussion and examples in §9.3.3 here.
Sure, other things equal. But other things aren’t necessarily equal. For example, regularization could stack the deck in favor of one policy over another, even if the latter has been systematically producing slightly higher reward. There are lots of things like that; the details depend on the exact RL algorithm. In the context of brains, I have discussion and examples in §9.3.3 here.