It’s not clear how well ‘increasing switching increases proximity to the ideal reward function’ generalizes beyond this problem. (And we probably want the robot to not run forever.)