Hence the underlying empirical uncertainty that this question asks us to condition on: is there a meaningful difference between what happens in human brains (or in models trained this way) during the first 10 minutes of thought versus the first 1,000 days of thought?
I agree. We can frame this empirical uncertainty more generally by asking: What is the smallest x such that there is no meaningful difference between what can happen in a human brain while thinking about a question for x minutes and what can happen while thinking about it for 1,000 days?
Or rather: What is the smallest x such that ‘learning to generate answers that humans may give after thinking for x minutes’ is not easier than ‘learning to generate answers that humans may give after thinking for 1,000 days’?
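To make that a bit more concrete (the notation below is mine, introduced only for illustration and not something stated above): write D_x for the distribution over answers a human might give after thinking about a question for x minutes, and cost(D) for how hard it is to learn to generate samples from D. The question is then roughly:

$$
x^{*} \;=\; \min\bigl\{\, x \;:\; \mathrm{cost}(D_x) \,\ge\, \mathrm{cost}(D_{1000\text{ days}}) \,\bigr\}
$$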
I should note that, conditioned on the above scenario, I expect that labeled 10-minute-thinking training examples would be at most a tiny fraction of all the training data (when considering all the learning that had a role in building the model, including the learning that produced pre-trained weights, etc.). I expect that most of the learning would be either ‘supervised with automatic labeling’ or unsupervised (e.g. ‘predict the next token’), and that a huge amount of text (and code) that humans wrote would be used; some of it would be the result of humans thinking for a very long time (e.g. a paper on arXiv that is the result of someone thinking about a problem for a year).
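Here is a minimal sketch of the kind of training mix I have in mind (everything below, including the toy model, the toy data, and the rough 1% ratio, is an assumption for illustration, not something specified in this discussion): a large unsupervised next-token-prediction phase over human-written text, followed by fine-tuning on a comparatively tiny set of labeled 10-minute-thinking examples.

```python
# Hypothetical two-phase training sketch: mostly unsupervised next-token
# prediction, with a small labeled fine-tuning phase at the end.
import torch
import torch.nn as nn

VOCAB, EMBED = 256, 64   # toy byte-level vocabulary and embedding size
torch.manual_seed(0)

embed = nn.Embedding(VOCAB, EMBED)
rnn = nn.GRU(EMBED, EMBED, batch_first=True)
head = nn.Linear(EMBED, VOCAB)
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def next_token_loss(tokens: torch.Tensor) -> torch.Tensor:
    """Standard 'predict the next token' objective on a batch of token ids."""
    out, _ = rnn(embed(tokens[:, :-1]))          # hidden states for positions 0..T-2
    logits = head(out)                            # predicted distribution over token t+1
    return loss_fn(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))

# Phase 1: unsupervised pre-training on a large corpus of human-written text/code
# (random toy data stands in for that corpus here).
pretrain_corpus = torch.randint(0, VOCAB, (1000, 32))
for batch in pretrain_corpus.split(50):
    opt.zero_grad()
    next_token_loss(batch).backward()
    opt.step()

# Phase 2: supervised fine-tuning on labeled (question, 10-minute answer) examples,
# which on this picture are only a tiny fraction of the total training signal.
labeled_examples = torch.randint(0, VOCAB, (10, 32))   # ~1% of the pretraining volume
for batch in labeled_examples.split(5):
    opt.zero_grad()
    next_token_loss(batch).backward()   # same objective, applied to the curated labels
    opt.step()
```

The toy numbers only matter for the ratio: the labeled phase sees far fewer tokens than the unsupervised phase, which is the sense in which the 10-minute-thinking examples are a tiny fraction of the data.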