The first 4 paragraphs sound almost like something I would write and I agree up to:
Large training runs will be infrequent; mostly it will be a combination of fine-tuning and composing from components with subsequent fine-tuning of the combined system, so a typical turn-around will be rapid.
We currently have large training runs for a few reasons, but the most important is that GPT training is very easy to parallelize on GPUs, while GPT inference is not. This is a major limitation because it means GPUs can only accelerate GPT training on (mostly human) past knowledge, but aren't nearly as efficient at accelerating the rate at which GPT models accumulate experience or self-knowledge.
So if that paradigm continues, large training runs will continue to be very important, as they are the only way these models can learn new long-term knowledge and expand their crystallized intelligence (which is, at this point, their main impressive capability).
The brain is often described as doing continual learning, but really it just has faster cycles and shorter mini-training runs (via hippocampal replay during sleep). If we move to that kind of paradigm, training is still very important; it just becomes more continuous.
I think we can fine-tune on GPUs nicely (fine-tuning is similar to a short training run and results in long-term crystallized knowledge).
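To make the "shorter, more frequent mini-training runs" idea concrete, here is a hypothetical, stdlib-only sketch (the `wake`/`sleep` names and the toy one-parameter model are my own illustrative choices, not anything from an actual system): the model alternates between a wake phase that collects experience into a replay buffer and a sleep phase that replays it as a short fine-tuning run, standing in for hippocampal replay.

```python
import random

def wake(buffer, n=32):
    """Collect fresh experience (x, y) pairs into the replay buffer."""
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        buffer.append((x, 3.0 * x))  # toy environment: the target is y = 3x

def sleep(w, buffer, lr=0.1, epochs=5):
    """A short 'mini-training run': replay buffered experience a few times."""
    for _ in range(epochs):
        random.shuffle(buffer)
        for x, y in buffer:
            grad = 2.0 * (w * x - y) * x  # d/dw of the squared error (w*x - y)^2
            w -= lr * grad
    return w

random.seed(0)
w, buffer = 0.0, []
for cycle in range(10):      # many short cycles instead of one big run
    wake(buffer)
    w = sleep(w, buffer)
    buffer = buffer[-64:]    # keep only recent experience

print(round(w, 3))           # converges toward the target weight 3.0
```

A real system would of course fine-tune a full network rather than a single scalar, but the control flow, i.e. interleaving experience collection with short replay-driven training runs, is the point being sketched.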
But I do agree that the rate of progress here depends on our progress at doing less uniform things faster. For example, there are signs of progress in parallelizing and accelerating tree processing (think trees with labeled edges and numerical leaves, which are essentially flexible tensors), but this kind of progress is not mainstream yet and is not commonplace; instead, one has to look at rather obscure papers to see these accelerations of non-standard workloads.
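A minimal sketch of what "trees with labeled edges and numerical leaves" could look like, assuming an encoding as nested dicts (edge labels as keys, leaves as floats); the function names here are hypothetical, not from any particular library. A dense tensor is the special case where every level uses the same fixed set of integer keys; arbitrary labels are what make the structure a flexible tensor:

```python
def tree_map(f, t):
    """Apply f to every numerical leaf (the analogue of an elementwise tensor op)."""
    if isinstance(t, dict):
        return {label: tree_map(f, child) for label, child in t.items()}
    return f(t)

def tree_add(a, b):
    """Add two trees leaf-wise over their shared labeled edges."""
    if isinstance(a, dict) and isinstance(b, dict):
        return {k: tree_add(a[k], b[k]) for k in a.keys() & b.keys()}
    return a + b

t = {"bias": 1.0, "weights": {"x": 2.0, "y": -1.0}}
doubled = tree_map(lambda v: 2.0 * v, t)
summed = tree_add(t, doubled)
print(doubled)  # leaves scaled: bias 2.0, weights x 4.0, y -2.0
print(summed)   # leaf-wise sum: bias 3.0, weights x 6.0, y -3.0
```

The recursion in `tree_map` is what the obscure-papers line of work tries to parallelize: unlike a fixed-shape tensor, the branching structure varies per input, which is exactly the non-uniform workload that current accelerators handle poorly.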
I think this will be achieved, in part because I expect less of the "winner takes all" dynamics that the field of AI has currently: Transformers lead right now, so (almost) all eyes are on Transformers, and other efforts attract less attention and resources. With artificial AI researchers not excessively burdened by human motivations of career and prestige, one would expect better coverage of all possible directions of progress and less crowding around "the winner of the day".