paulfchristiano comments on Iterated Distillation and Amplification

paulfchristiano 30 Nov 2018 22:20 UTC
LW: 4 AF: 2
The goal of narrow reinforcement learning is to get something-like-human-level behavior using human-level oversight. Optimizing the human value function over short time horizons seems like a fine approach to me.
The difference from broad reinforcement learning is that, in the narrow setting, you aren’t trying to evaluate actions you can’t understand by looking at the consequences you can observe.
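To make the contrast concrete, here is a toy sketch of where the reward signal comes from in each case. Everything in it is an illustrative assumption rather than anything from an actual training setup: the two action names, the numeric scores, and the helper names `narrow_reward`, `broad_reward`, and `best_action` are all made up for the example.

```python
from typing import Callable, Iterable

Action = str

# Hypothetical scores, invented for illustration only.
# The overseer's own judgment of each action (it cannot follow the opaque plan).
overseer_scores = {"explainable_plan": 0.7, "opaque_plan": 0.0}
# How good the observable consequences of each action look after the fact.
outcome_scores = {"explainable_plan": 0.6, "opaque_plan": 0.9}


def narrow_reward(action: Action) -> float:
    """Narrow RL (as sketched here): reward is the overseer's evaluation of the action itself."""
    return overseer_scores[action]


def broad_reward(action: Action) -> float:
    """Broad RL (as sketched here): reward is read off the observed consequences,
    whether or not the overseer understands the action."""
    return outcome_scores[action]


def best_action(actions: Iterable[Action], reward_fn: Callable[[Action], float]) -> Action:
    """Pick the action that maximizes the given reward function."""
    return max(actions, key=reward_fn)


actions = list(overseer_scores)
narrow_choice = best_action(actions, narrow_reward)
broad_choice = best_action(actions, broad_reward)
```

With these made-up numbers, the narrow objective prefers the plan the overseer can evaluate, while the broad objective prefers the opaque plan because its observed outcome scores higher. The sketch is only about which signal gets optimized; it says nothing about how either signal would be obtained in practice.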