markov comments on AI Safety 101 : Reward Misspecification

markov 19 Oct 2023 8:40 UTC
3 points
0
Thanks for the feedback. I actually had an entire subsection in an earlier draft that covered Reward is not the optimization target. I decided to move it to the upcoming chapter 3 which covers optimization, goal misgen and inner alignment. I thought it would fit better as an intro section there since it ties the content back to the previous chapter, while also differentiating rewards from objectives. This flows well into differentiating which goals the system is actually pursuing.