EliasHasle comments on Avoiding Side Effects in Complex Environments

EliasHasle 26 Apr 2023 13:16 UTC
1 point
The method seems sensitive to the relative ranges of the game reward and the auxiliary penalty. In real life, I suppose one would have to clamp the “game” reward so that the impact penalty can still dominate, even when a high-impact course of action promises massive gains?
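
A rough sketch of the clamping idea, in case it helps. Everything here is an assumption made for illustration: the function name `shaped_reward`, the parameters `penalty_weight` and `reward_cap`, and the simple "reward minus weighted penalty" form are hypothetical and not taken from the post's actual method.

```python
def shaped_reward(game_reward, impact_penalty, penalty_weight=1.0, reward_cap=1.0):
    """Hypothetical combined reward: clamped game reward minus weighted impact penalty.

    reward_cap bounds the game reward to [-reward_cap, reward_cap], so a large
    enough penalty always outweighs it, however big the foreseen gain.
    """
    clamped = max(-reward_cap, min(reward_cap, game_reward))
    return clamped - penalty_weight * impact_penalty


# Without clamping, a huge foreseen gain swamps the penalty.
unclamped_high_impact = 1000.0 - 1.0 * 5.0

# With clamping, the same high-impact action scores worse than a modest, low-impact one.
clamped_high_impact = shaped_reward(game_reward=1000.0, impact_penalty=5.0)
clamped_low_impact = shaped_reward(game_reward=0.5, impact_penalty=0.25)
```

With these made-up numbers, the unclamped high-impact action looks overwhelmingly attractive, while after clamping the low-impact action is preferred. Where to set the cap relative to the penalty scale is exactly the sensitivity in question.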