I found the post “Reward is not the optimization target” quite confusing. This post cleared the concept up for me, especially the selection framing and the example. Thank you!