It’s not clear to me why you think the concept of reward functions “breaks down” when applied to more complicated environments. I think maybe you mean to ask for something else.