Maximizing the sum of differences in state value telescopes into maximizing the final state value again, which is exactly what narrow reinforcement learning was supposed to get away from.
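To spell out the telescoping point (my notation, assuming \(V\) is the state value function and \(s_0, \dots, s_T\) a trajectory):

```latex
\sum_{t=0}^{T-1} \bigl[ V(s_{t+1}) - V(s_t) \bigr] = V(s_T) - V(s_0)
```

Since \(V(s_0)\) is fixed, maximizing the sum of one-step value differences is equivalent to maximizing the terminal value \(V(s_T)\), i.e. the same long-horizon objective.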
The goal of narrow reinforcement learning is to get something-like-human-level behavior using human-level oversight. Optimizing the human value function over short time horizons seems like a fine approach to me.
The difference from broad reinforcement learning is that you aren't trying to evaluate actions you can't understand by looking only at the consequences you can observe.