That’s not reflection, just more initial training data. Reflection acts on the training data it already has, the point is to change the learning problem, by introducing an inductive bias that’s not part of the low level learning algorithm, that improves sample efficiency with respect to loss that’s also not part of low level learning. LLMs are a very good solution to the wrong problem, and a so-so solution to the right problem. Changing the learning incentives might get a better use out of the same training data for improving performance on the right problem.
A language model retrained on generated text (which is one obvious form of implementing reflection) likely does worse as a language model of the original training data, it’s only a better model of the original data with respect to some different metric of being a good model (such as being a good map of the actual world, whatever that means). Machine learning doesn’t know how to specify or turn this different metric into a learning algorithm, but an amplification process that makes use of faculties an LLM captured from human use of language might manage to do this by generating appropriate text for low level learning.
That’s not reflection, just more initial training data. Reflection acts on the training data it already has, the point is to change the learning problem, by introducing an inductive bias that’s not part of the low level learning algorithm, that improves sample efficiency with respect to loss that’s also not part of low level learning. LLMs are a very good solution to the wrong problem, and a so-so solution to the right problem. Changing the learning incentives might get a better use out of the same training data for improving performance on the right problem.
A language model retrained on generated text (which is one obvious form of implementing reflection) likely does worse as a language model of the original training data, it’s only a better model of the original data with respect to some different metric of being a good model (such as being a good map of the actual world, whatever that means). Machine learning doesn’t know how to specify or turn this different metric into a learning algorithm, but an amplification process that makes use of faculties an LLM captured from human use of language might manage to do this by generating appropriate text for low level learning.