sid comments on The alignment problem from a deep learning perspective

sid 28 Oct 2022 15:25 UTC
3 points
0
The post says that the first head predicts the next observation. Does this mean that this head first predicts which action the network itself is going to take, and then predicts the state that will result once that action is taken?
(And I guess this means that the action is likely determined in the shared portion of the network rather than in either of the heads, since both heads use the action info, and that the second head would likely just translate the model’s internal representation of the action into whatever output format is needed.)
- Richard_Ngo 28 Oct 2022 17:11 UTC
  3 points
  0
  Good question. I imagine the first head mostly being trained on existing data (e.g. text, videos). For data gathered by the network itself, though, my default story is that it’d be trained to output predictions conditional on actions, so that it doesn’t duplicate the learning done by the action head. But this is all fairly speculative, and either approach seems reasonable.
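
To make the "predictions conditional on actions" option concrete, here is a minimal sketch of a two-headed network in plain Python. It is only an illustration under stated assumptions: the class name `TwoHeadNet`, the layer sizes, the random untrained weights, and the one-hot action encoding are all hypothetical and are not taken from the post. The point is just that the prediction head receives the action as an input, so it never has to guess what the action head will choose.

```python
import math
import random


def linear(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns (illustrative only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class TwoHeadNet:
    """Hypothetical shared trunk with an action head and a prediction head."""

    def __init__(self, obs_dim=4, hidden_dim=8, n_actions=3, seed=0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        self.trunk = linear(obs_dim, hidden_dim, rng)
        self.action_head = linear(hidden_dim, n_actions, rng)
        # Assumption: the prediction head sees the shared features PLUS a
        # one-hot encoding of the action, i.e. it is conditioned on the action.
        self.pred_head = linear(hidden_dim + n_actions, obs_dim, rng)

    def features(self, obs):
        """Shared representation used by both heads."""
        return [math.tanh(v) for v in matvec(self.trunk, obs)]

    def act(self, obs):
        """Action head: pick the action with the highest logit."""
        logits = matvec(self.action_head, self.features(obs))
        return max(range(self.n_actions), key=lambda a: logits[a])

    def predict_next(self, obs, action):
        """Prediction head: next-observation estimate given obs AND action."""
        one_hot = [1.0 if a == action else 0.0 for a in range(self.n_actions)]
        return matvec(self.pred_head, self.features(obs) + one_hot)


net = TwoHeadNet()
obs = [0.1, -0.2, 0.3, 0.0]
chosen = net.act(obs)
predicted = net.predict_next(obs, chosen)
```

In this sketch the same observation can be paired with any action, so the prediction head can also be queried counterfactually (for example `net.predict_next(obs, 0)` versus `net.predict_next(obs, 1)`). The alternative raised in the question, where the prediction head must first infer the action internally, would correspond to dropping the `action` argument.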