The model has the capacity to learn to do this, but it can't just leverage existing capabilities in the foundation model, since that model's performance is limited to that of the best humans it saw in the self-supervised training data. So we need to run RL for many more time steps.
It’s unclear to me whether this is true, because modelling the humans that generated the training data sufficiently well probably requires the model to be smarter than the humans it is modelling. So I expect the current regime, where RLHF just elicits a particular persona from the model rather than teaching it any new abilities, to be sufficient to reach superhuman capabilities.