Richard_Ngo comments on The alignment problem from a deep learning perspective

Richard_Ngo 28 Oct 2022 17:11 UTC
3 points
Good question. I imagine the first head being trained mostly on existing data (e.g. text, videos). For data gathered by the network itself, my default story is that it’d be trained to output predictions conditional on actions, so that it isn’t duplicating the learning done by the action head. But this is all fairly speculative, and either option seems reasonable.
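
To make the distinction concrete, here is a minimal sketch of how training examples for the prediction head might be built from the two data sources. Everything in it is a hypothetical illustration rather than something specified in the comment: the names `PredictionExample`, `from_existing_data`, and `from_own_trajectory`, the use of `None` to mean "no action to condition on", and the choice of next-item prediction as the target are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PredictionExample:
    """One training example for the (hypothetical) prediction head."""
    context: Tuple[str, ...]   # what the network has seen so far
    action: Optional[str]      # action conditioned on; None for passive data
    target: str                # what the head should predict next


def from_existing_data(items: Sequence[str]) -> List[PredictionExample]:
    """Existing data (e.g. text, video frames): no action is involved,
    so each example is plain next-item prediction with action=None."""
    return [
        PredictionExample(tuple(items[:i]), None, items[i])
        for i in range(1, len(items))
    ]


def from_own_trajectory(
    observations: Sequence[str], actions: Sequence[str]
) -> List[PredictionExample]:
    """Self-gathered data: the action actually taken is given as an input,
    so the prediction head models "what happens next, given this action"
    and does not have to learn which action would be chosen (assumed here
    to be the action head's job)."""
    if len(observations) != len(actions) + 1:
        raise ValueError("expected one more observation than actions")
    return [
        PredictionExample((observations[t],), actions[t], observations[t + 1])
        for t in range(len(actions))
    ]


passive = from_existing_data(["the", "cat", "sat"])
active = from_own_trajectory(["s0", "s1", "s2"], ["left", "right"])
```

In this sketch the only difference between the two sources is whether `action` is filled in. Supplying the action as an input is one simple way to keep the prediction head from duplicating what the action head learns.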