It doesn’t have to be a preference ordering. My point was that, depending on the level of detail at which you consider the reward function, slightly different functions could be identical.
I don’t think it makes sense to tie a reward function to a piece of code; a function can have multiple implementations.
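As a toy illustration of that point (the example is mine, not from the discussion above): two pieces of code can differ syntactically while computing the same function, so identifying “the reward function” with either implementation seems arbitrary.

```python
# Toy sketch: two syntactically different implementations of the same
# (hypothetical) reward function over a trajectory of step outcomes.
# Extensionally they are the same function, even though the code differs.

def reward_v1(trajectory):
    """Count successful steps with an explicit loop."""
    total = 0
    for step in trajectory:
        if step == "success":
            total += 1
    return total

def reward_v2(trajectory):
    """The same reward, written as a comprehension."""
    return sum(1 for step in trajectory if step == "success")

# Both implementations agree on every input, e.g.:
assert reward_v1(["success", "fail", "success"]) == reward_v2(["success", "fail", "success"]) == 2
```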
My contention is that it seems possible for the model’s objective function to be identical (at the level of detail we care about) to the reward function. In that case, I think the model is indistinguishable from a reward maximiser, and it doesn’t make sense to say that it isn’t one.