Some of my confidence here arises from things that I don’t think would be wise to blab about in public, so my arguments might not sound as convincing as I’d like, but I’ll give it a try.
I wouldn’t quite say it’s not a problem at all; rather, it’s the type of problem the field is really good at solving. They don’t have to solve ethics or some other deep philosophical puzzle. They just need to do some clever engineering with the backing of effectively infinite money.
I’d put it at a similar tier of difficulty as scaling up transformers to begin with. That wasn’t nothing! And the industry blew straight through it.
To give some examples that I’m comfortable having in public:
Suppose you stick to text-only training. Could you expand your training sets automatically? Maybe create a higher-quality transcription AI and use it to pad your training set with the entirety of YouTube? (There’s a rough sketch of this after the list.)
Maybe you figure out a relatively simple way to extract more juice from a smaller dataset without collapsing into pathological overfitting.
Maybe you make existing datasets more informative by filtering out sequences that seem to interfere with training (second sketch below).
Maybe you embrace multimodal training where text-only bottlenecks are irrelevant.
Maybe you do it the hard way. What’s a few billion dollars?
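To make the transcription idea a bit more concrete, here’s a minimal sketch in Python. It assumes you’d use something like the open-source whisper package (or an in-house model that beats it); the file names and output format are just placeholders, not anything a lab has actually described.

```python
# Minimal sketch: turning speech audio into extra text training data.
# Assumes the open-source `openai-whisper` package; file paths are hypothetical.
import whisper

model = whisper.load_model("medium")  # larger checkpoints transcribe more accurately

def transcribe_to_corpus(audio_paths, out_path):
    """Append transcripts of the given audio files to a plain-text corpus file."""
    with open(out_path, "a", encoding="utf-8") as out:
        for path in audio_paths:
            result = model.transcribe(path)
            out.write(result["text"].strip() + "\n")

transcribe_to_corpus(["talk_001.mp3", "lecture_002.mp3"], "extra_corpus.txt")
```

Obviously a lab doing this at scale would care about transcription quality, deduplication against existing web text, and licensing, but the mechanical part is not exotic.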
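And for the filtering idea, a toy illustration of the kind of cheap heuristics involved. The specific checks and thresholds here are invented for the example; real pipelines would be tuned empirically against downstream loss.

```python
# Toy sketch of heuristic sequence filtering: drop documents that look like the
# kind of noise (near-duplicates, boilerplate, gibberish) that tends to hurt
# training. Thresholds are made up for illustration.
import hashlib

def looks_clean(doc: str) -> bool:
    words = doc.split()
    if len(words) < 50:                      # too short to carry much signal
        return False
    if len(set(words)) / len(words) < 0.3:   # highly repetitive text
        return False
    alpha = sum(c.isalpha() for c in doc) / max(len(doc), 1)
    return alpha > 0.6                       # mostly symbols/markup -> drop

def dedup_and_filter(docs):
    """Yield documents that pass the heuristics, skipping exact duplicates."""
    seen = set()
    for doc in docs:
        key = hashlib.md5(doc.strip().lower().encode()).hexdigest()
        if key in seen or not looks_clean(doc):
            continue
        seen.add(key)
        yield doc

# Usage: clean_docs = list(dedup_and_filter(raw_docs))
```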
Another recent example: https://openreview.net/forum?id=NiEtU7blzN
(I guess this technically covers my “by the end of this year we’ll see at least one large model making progress on Chinchilla” prediction, though apparently it was up even before my prediction!)