This is a useful post that helped me finally understand the importance of inner alignment. However, as the author is careful to note when discussing “vanilla” RL, there are settings in which the model directly sees the rewards. History-based RL is one such setting: it is the framework Professor Hutter chose for his development of AIXI, and in it the reward arrives as part of each percept the agent observes. To my mind this is the “true” RL setting, in the sense that it most closely reflects the problem humans face in the real world, and it is what I personally usually mean when I think of the RL framework, so I believe the distinction is worth clarifying. Thanks!
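To illustrate what I mean, here is a minimal sketch of the history-based setting as I understand it from Hutter's AIXI formulation; the notation is mine and only illustrative, but the key point is that the reward is literally part of what the agent observes:

```latex
% Minimal sketch of history-based RL (AIXI-style; symbols illustrative, not Hutter's exact notation).
% The percept the agent receives each step contains the reward, so rewards sit
% directly in the agent's observation history rather than being hidden training signal.
\begin{align*}
  \text{percept:}\quad   & e_t = (o_t, r_t) \\
  \text{history:}\quad   & h_t = a_1 e_1\, a_2 e_2 \cdots a_{t-1} e_{t-1} \\
  \text{policy:}\quad    & a_t = \pi(h_t) \\
  \text{objective:}\quad & \max_\pi \; \mathbb{E}_\pi\!\Big[\, \textstyle\sum_{k=t}^{t+m} r_k \;\Big|\; h_t \Big]
\end{align*}
```

Contrast this with the “vanilla” setting the post focuses on, where the reward only enters through the gradient updates applied to the policy and is never an input the model can condition on.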