Good question. I imagine the first head being trained mostly on existing data (e.g. text, videos). But when it comes to data gathered by the network itself, my default story is that it’d be trained to output predictions conditional on actions, so that it isn’t duplicating the learning done by the action head. This is all fairly speculative, though, and either option seems reasonable.