I currently suspect you don’t go far enough! If people are allowed to collect more data and fine-tune on it, the sky’s the limit… or at least, it’s unclear how far above our heads the limit is.
Consider a hypothetical world like ours, except that for some reason the government paid a million people to predict random internet text all day.
In that world, a million people just got their jobs automated away by GPT-3, which I hear* is superhuman at predicting random internet text, sufficiently so that it would probably beat trained professionals (it’s hard to check, since no such professionals exist in our world). And PaLM, Chinchilla, etc. are much better still.
OK, so why hasn’t the singularity happened yet? Because our models are good at predicting internet text but only mediocre at the downstream tasks that are actually economically valuable.
But why are they only mediocre at those tasks?
You could say: Because those tasks are inherently, objectively harder! We’ll need much bigger models or new architectures before we can do them!
But maybe instead the answer is mostly that our big models haven’t been trained on those tasks. Or maybe they have been trained a little bit (fine-tuned) but not very much. Train for a trillion data points directly on an important economic task, and then we’ll see what happens...
I’m especially interested to hear counterarguments to this take.
*I heard this from Buck, Ryan, and Tao, who I think work at Redwood Research. They pointed me to this nifty tool which can show you what random samples of internet text look like.
I agree, and I look forward to seeing how far it goes!