It seems like the method is sensitive to the relative ranges of the game reward and the auxiliary penalty. In practice, I suppose one would have to clamp the “game” reward so that the impact penalty can still dominate even when a high-impact course of action promises massive gains?
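To make the concern concrete, here is a toy sketch of what I mean. Everything in it is an assumption for illustration: the function name `shaped_reward`, the `reward_cap` parameter, and the numbers are made up and are not taken from the method itself. It just shows how an unbounded game reward can swamp a fixed-scale penalty, and how clamping changes which option wins.

```python
def shaped_reward(game_reward, impact_penalty, penalty_weight=1.0, reward_cap=None):
    """Game reward minus a weighted impact penalty (hypothetical form).

    If reward_cap is given, the game reward is clamped to
    [-reward_cap, reward_cap] before the penalty is subtracted.
    """
    if reward_cap is not None:
        game_reward = max(-reward_cap, min(reward_cap, game_reward))
    return game_reward - penalty_weight * impact_penalty


# Two made-up courses of action: (game reward, impact penalty).
low_impact = (1.0, 0.1)
high_impact = (1000.0, 50.0)


def preferred(reward_cap):
    """Return the name of the option with the higher shaped reward."""
    scores = {
        "low_impact": shaped_reward(*low_impact, reward_cap=reward_cap),
        "high_impact": shaped_reward(*high_impact, reward_cap=reward_cap),
    }
    return max(scores, key=scores.get)


print(preferred(None))   # no clamp: the huge game reward outweighs the penalty
print(preferred(10.0))   # clamped: the penalty dominates
```

With these numbers the unclamped agent prefers the high-impact option (1000 - 50 versus 1 - 0.1), while a cap of 10 flips the preference (10 - 50 versus 1 - 0.1). The cap and the penalty weight would of course have to be chosen relative to each other, which is exactly the sensitivity I am asking about.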