The prior is not the gears. The distribution of authors a transformer learns to imitate doesn't necessarily have an important impact on the collection of features it learns to be aware of in the meaning of situations, features that pre-training refines and fine-tuning assembles into agency. Some features classify the common kinds of authors, but features of human cognition found in any other kind of author are also represented to some degree, there to be learned and to become available as ingredients for fine-tuning. If a sufficiently powerful LLM learns to predict Carlsen, it doesn't matter that most chess players are worse than Carlsen: the features are there to be found.