I definitely agree that some version of this is the crux, at least regarding how well the result generalizes. I think it applies more generally than just to contemporary language models, and I suspect it applies to almost any AI that can use Pretraining from Human Feedback, which is offline training. So the crux is really how far we can expect an alignment technique to generalize and scale.