What if the input “conditions” in training samples omit information which contributed to determining the associated continuations in the original generative process? This is true for GPT, where the text “initial condition” of most training samples severely underdetermines the real-world process which led to the choice of next token.
What if the training data is a biased/limited sample, representing only a subset of all possible conditions? There may be many “laws of physics” which equally predict the training distribution but diverge in their predictions out-of-distribution.
I honestly think these are not physics-related questions, though they are very important to ask. They are better attributed to the biases of the researchers who chose the input conditions and to the relevance of the training data.