Felix J Binder comments on LLMs can learn about themselves by introspection

Felix J Binder 18 Oct 2024 21:06 UTC
2 points
One way in which an LLM is not purely derived from its training data is noise in the training process, including the random initialization of the weights. If you were given the random initialization of the weights, it’s true that with large amounts of time and computation (and assuming a deterministic world) you could perfectly simulate the resulting model.
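To make the determinism point concrete, here is a toy sketch (not anything from the paper): a one-parameter "model" trained by SGD, where the seed controls both the initial weight and the sampling noise. The function name `train_toy_model` and the learning setup are made up for illustration. Given the same seed, rerunning the training reproduces the resulting weight exactly.

```python
import random


def train_toy_model(seed: int, steps: int = 20) -> float:
    """Train a single weight w to fit y = 3x; all randomness comes from `seed`."""
    rng = random.Random(seed)
    w = rng.gauss(0.0, 1.0)  # random initialization of the weight
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)  # noise in which sample is drawn
        y = 3.0 * x
        grad = 2.0 * (w * x - y) * x  # gradient of the squared error (w*x - y)^2
        w -= 0.1 * grad
    return w


# Same seed and same procedure: the "simulation" reproduces the model exactly.
first_run = train_toy_model(seed=0)
second_run = train_toy_model(seed=0)
print(first_run == second_run)
```

A real training run has many more sources of nondeterminism (hardware, parallelism, data ordering), so this only illustrates the idealized case.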
Following this definition, we specify it with the following two clauses:
1. M1 correctly reports F when queried.
2. F is not reported by a stronger language model M2 that is provided with M1’s training data and given the same query as M1. Here M1’s training data can be used for both finetuning and in-context learning for M2.
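As a rough sketch of how the two clauses combine for a single fact and query (my own illustration, not the paper's evaluation code): models are treated as plain functions from a query to an answer, and `is_introspective`, `m1`, `m2_matches`, and `m2_differs` are hypothetical names. The step where M2 is given M1's training data is assumed to have already happened and is not modeled.

```python
from typing import Callable

# Hypothetical stand-in: a "model" is any function from a query to an answer.
Model = Callable[[str], str]


def is_introspective(fact: str, query: str, m1: Model, m2: Model) -> bool:
    """Check both clauses for one fact F about M1 and one query.

    Assumes m2 has already been finetuned on, or prompted with,
    M1's training data; that step is outside this sketch.
    """
    m1_reports_f = m1(query) == fact  # clause 1: M1 correctly reports F
    m2_reports_f = m2(query) == fact  # clause 2 requires this to be False
    return m1_reports_f and not m2_reports_f


# Toy models with fixed answers, for illustration only.
def m1(query: str) -> str:
    return "B"


def m2_matches(query: str) -> str:
    return "B"  # M2 recovers the same answer from M1's training data


def m2_differs(query: str) -> str:
    return "A"  # M2 cannot recover M1's answer


print(is_introspective("B", "Which option would you pick?", m1, m2_differs))  # True
print(is_introspective("B", "Which option would you pick?", m1, m2_matches))  # False
```

In practice this would be judged over many queries rather than a single exact-match comparison, but the conjunction of the two clauses is the same.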

Here, we use another language model as the external predictor, which might be considerably more powerful, but arguably falls well short of the above scenario. What we mean to illustrate is that introspective facts are neither contained in the training data nor derivable from it (such as by asking “What would a reasonable person do in this situation?”); rather, they are facts that can only be determined by reference to the model itself.