kave comments on instruction tuning and autoregressive distribution shift

kave 5 Sep 2024 20:35 UTC
4 points
Is the central argument of this post that high-quality, informative text in the training distribution rarely corrects itself, that post-training locates the high-quality part of the distribution, and that LLMs therefore rarely correct themselves?
Or is it the more specific claim that post-training locates the parts of the distribution where the text is written by someone in a context that highlights the prestige they get from their competence, and that such text rarely corrects itself?
I don’t yet see why the latter would be true, so my guess is you meant the former. (Though I do think the latter prompt would more strongly imply non-self-correction.)