I agree with you on tasks where there is not a lot of headroom. But on tasks like International-Olympiad-level mathematics and programming, a 4x reduction in model size at constant performance will be small. I expect many improvements of 1000x and more versus what current scaling laws would predict.
For example, on the MATH dataset, “(...) models would need around 10^35 parameters to achieve 40% accuracy”, where 40% accuracy is roughly what a PhD student achieves, and an International Olympiad participant gets close to 90%. https://arxiv.org/abs/2103.03874
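To make the extrapolation concrete, here is a minimal sketch of the kind of log-linear fit that produces an estimate like that. The accuracy/parameter pairs are hypothetical placeholders chosen to land near the paper’s headline number, not its actual measurements:

```python
import numpy as np

# Hypothetical (parameter count, MATH accuracy) points standing in for the
# measured scaling trend -- placeholders, not the paper's reported numbers.
params = np.array([1e8, 1e9, 1e10, 1e11])
accuracy = np.array([0.016, 0.030, 0.044, 0.058])

# Fit accuracy as a linear function of log10(parameter count).
slope, intercept = np.polyfit(np.log10(params), accuracy, 1)

# Extrapolate: solve for the parameter count where the line hits 40%.
log_params_needed = (0.40 - intercept) / slope
print(f"Extrapolated requirement for 40% accuracy: ~10^{log_params_needed:.0f} parameters")
```

The paper’s actual fit has more structure than this; the point is just that a straight-line trend in log-parameters lands absurdly far out.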
With 100-trillion-parameter models (10^14 parameters) we would still be short by a factor of 10^21. So we would need some 21 orders of magnitude of effective model-size improvement, at the same performance, to come from somewhere else.
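Spelling out the arithmetic behind that gap:

$$\frac{10^{35}\ \text{parameters (extrapolated requirement)}}{10^{14}\ \text{parameters (100 trillion)}} = 10^{21}$$

so roughly 21 orders of magnitude would have to come from algorithmic, data, or training improvements rather than raw parameter count.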
Worth noticing the 40% vs. 90% gap on MATH between a PhD student and an expert (an International Olympiad participant). There is a similar gap on MMLU (Massive Multitask Language Understanding): 35% for an average human vs. 90% for experts. Experts don’t have orders-of-magnitude bigger brains, a different architecture, or a different learning algorithm in their brains.
While replying, I also noticed that I made assumptions about what you mean by an x-factor quality improvement. I’m not sure I understood correctly. Could you clarify what you meant, precisely?
If you have big communities working on math, I don’t think you will see improvements like a 1000x reduction in effective model size (the bigger the community, the harder it is to get any fixed-size advantage). And I think you will have big communities working on the problem well before it becomes a big deal economically (the bigger the economic deal, the bigger the community). Both of those rules of thumb are quantitative, imperfect, and uncertain, but I think they are pretty important for making sense of what happens in the world.
Regarding the IMO disagreement, I think it’s very plausible the IMO will be solved before there is a giant community. So that’s more of a claim that even now, with not many people working on it, you probably aren’t going to get progress that fast. I don’t feel like this speaks to either of the two main disagreements with Eliezer, but it does speak to something like “How often do we see jumps that look big to Paul?”, where I’m claiming that I have a better sense of which improvements are “surprisingly big.”