Hi Trent!
I think the review makes a lot of good points and am glad you wrote it.
Here are some hastily-written responses, focusing on areas of disagreement:
It is possible that AI-generated synthetic data will ultimately be higher quality than random Internet text. Still, I agree directionally about the data.
It seems possible to me that abstraction comes with scale. A lot of the problems you describe get much less bad with scale. And, at an abstract level, understanding causality deeply is useful for predicting the next word on text the model has not seen before, as models must do during training. Still, I agree that algorithmic innovations, for example relating to memory, may be needed to get to full automation, and that this could delay things significantly.
I strongly agree that my GDP assumptions are aggressive and unrealistic. I'm not sure that it matters much quantitatively. You are, of course, right about all of the feedback loops. I don't think that GDP being higher overall matters very much compared to the fraction of GDP invested. I think it will depend on whether people are willing to invest large fractions of GDP for the potential impact, or whether they need to see the impact there and then. If the delays you mention push back that wake-up, then that will make a big difference; otherwise I think the difference is small.
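To put rough toy numbers on that (illustrative figures of my own, not from the report):

```python
# Toy arithmetic (my own illustrative numbers, not the report's): total AI
# investment = fraction_invested * GDP, so a "wake-up" in the invested
# fraction matters far more than GDP itself being somewhat higher.
gdp = 100e12                      # world GDP, rough order of magnitude ($100T)

baseline = 0.001 * gdp            # 0.1% of GDP invested in AI
after_wake_up = 0.04 * gdp        # 4% of GDP invested after a "wake-up"
gdp_doubles = 0.001 * (2 * gdp)   # GDP doubles, invested fraction unchanged

print(f"baseline:    ${baseline / 1e9:,.0f}B")
print(f"wake-up:     ${after_wake_up / 1e9:,.0f}B  ({after_wake_up / baseline:.0f}x)")
print(f"GDP doubles: ${gdp_doubles / 1e9:,.0f}B  ({gdp_doubles / baseline:.0f}x)")
```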
You may be right about the parallelization penalty, but I will share some context about that parameter which I think reduces the force of your argument. When I chose the parameters for the rate of increased investment, I was often thinking about how quickly you could, in practice, increase the size of the community of people working on the problem. That means I was not accounting for the fact that the average salary rises when spending in an area rises; that salary rise will create the appearance of a large parallelization penalty. Another factor: one contributor to the parallelization penalty is that the average quality of researchers decreases as the size of the field grows. But when AI labor floods in, its average quality will not decrease as the quantity increases, and so the parallelization penalty for AI will be lower. But perhaps my penalty is still too small. One final point: if indeed the penalty should be very low, then AGI will increase output by a huge amount. And AGI also lets you run fewer copies much faster in serial time, so if there is a large parallelization penalty, the benefit of running fewer copies faster will be massive. So a large parallelization penalty would, I believe, increase the boost just as you get AGI.
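To make the serial-versus-parallel point concrete, here is a minimal sketch (a toy model of my own for this reply, not the report's actual equations), assuming output per unit of wall-clock time scales as speed times copies raised to the penalty exponent, where a lower exponent means a harsher penalty:

```python
# Minimal sketch (not the report's actual equations): output per unit time is
# assumed to scale as speed * copies**exponent, where a lower exponent means a
# harsher parallelization penalty. The compute budget (copies * speed) is fixed.

def output_per_unit_time(copies: float, speed: float, exponent: float) -> float:
    return speed * copies ** exponent

compute = 1_000_000  # arbitrary fixed budget of copies * speed

for exponent in (0.9, 0.5, 0.2):
    many_slow = output_per_unit_time(copies=compute, speed=1, exponent=exponent)
    few_fast = output_per_unit_time(copies=compute / 100, speed=100, exponent=exponent)
    print(f"exponent {exponent}: trading 100x copies for 100x speed "
          f"gives a {few_fast / many_slow:.1f}x gain")
```

The harsher the penalty, the larger the gain from converting parallel copies into serial speed, which is the sense in which a large penalty increases the boost right at AGI.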
Hey Tom, thanks again for your work creating the initial report and for kicking off this discussion. Apologies for the Christmastime delay in replying.
Two quick responses, focused on points of disagreement that aren’t stressed in my original text.
On AI-Generated Synthetic Data:
Breakthroughs in synthetic data would definitely help overcome my dataset-quality concerns. There are two main obstacles I'd want to see overcome: how will synthetic data retain (1) fidelity to individual ground-truth data points (how well it represents the “real world” its simulation prepares models for), and (2) the higher-level distribution of data points?
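As a toy 1-D sketch of the distinction I'm drawing (my own framing, nothing to do with an actual training pipeline), the two checks look different in code:

```python
# A toy 1-D illustration (my framing, not a real data pipeline) of the two
# obstacles: (1) point-level fidelity to ground truth and (2) matching the
# higher-level distribution of the real data.
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=2_000)        # stand-in "real world" data
synthetic = rng.normal(loc=0.1, scale=1.3, size=2_000)   # stand-in generated data

# (1) Point-level fidelity: how close is each synthetic point to some real point?
nearest_gap = np.abs(synthetic[:, None] - real[None, :]).min(axis=1).mean()

# (2) Distribution-level match: do summary statistics of the two samples agree?
mean_gap = abs(synthetic.mean() - real.mean())
std_gap = abs(synthetic.std() - real.std())

print(f"avg nearest-real-point gap: {nearest_gap:.4f}")
print(f"mean mismatch: {mean_gap:.3f}, std mismatch: {std_gap:.3f}")
```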
On Abstraction with Scale:
Understanding causality deeply would definitely be useful for predicting next words. However, I don't think that this potential utility implies that current models have such understanding. It might mean that algorithmic innovations that “figure this out” will outcompete others, but that time may still be yet to come.
I agree, though, that performance definitely improves with scale, and with more data collection and feedback as models are deployed more widely. Time will tell what level of sophistication scale can take us to on its own.
On the latter two points (GDP Growth and Parallelization), the factors you flag are definitely also part of the equation. A higher percentage of GDP invested can increase total investment even if total GDP remains level. Additional talent coming into AI helps combat the diminishing returns from each additional researcher, even given duplicative efforts and bad investments.
The name and description of the parallelization penalty make it sound like it's entirely about parallelization (“the penalty to concurrent R&D efforts”), but the math makes it sound like it's about more than that: “The outputs of the hardware and software R&D production functions get raised to this penalty before being aggregated to the cumulative total.”
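To be concrete about my reading of that quoted math (the function and parameter names below are mine, not the report's):

```python
# Sketch of my reading of the quoted math (names are mine, not the report's):
# each period's R&D output is raised to the penalty exponent before being
# added to cumulative progress.

def accumulate(cumulative: float, period_output: float, exponent: float) -> float:
    """Add one period's R&D output to cumulative progress, applying the penalty."""
    return cumulative + period_output ** exponent

# Under this form, anything that scales period_output gets penalised: more
# researchers working concurrently, but also the same researchers working
# faster. That is why the math seems to be about more than concurrency.
progress = 0.0
for period_output in [10.0, 10.0, 20.0]:   # third period: same team, 2x output
    progress = accumulate(progress, period_output, exponent=0.7)
print(f"cumulative progress: {progress:.2f}")
```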
What if we produce an AI system that assists human researchers in doing their research, say by automating the coding? And suppose that coding is roughly 50% of the research cycle, so that we now have the same number of researchers, but they are all going 2x faster?
This feels like a case where no additional parallelization is happening, just a straightforward speedup, so the parallelization penalty shouldn't be relevant. We shouldn't model this as equivalent to increasing the size of the population of researchers.
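To spell that out with toy numbers (my own parameter values, not the report's), the two modelling choices come apart like this:

```python
# Toy comparison (my own numbers): if AI automates ~50% of the research cycle,
# do we model it as doubling the researcher population (penalised) or as a
# straight 2x serial speedup (not penalised)?

def penalised_output(researchers: float, exponent: float) -> float:
    return researchers ** exponent

n, exponent = 1_000, 0.7
baseline = penalised_output(n, exponent)

as_bigger_population = penalised_output(2 * n, exponent)   # modelled as 2x headcount
as_pure_speedup = 2 * penalised_output(n, exponent)        # modelled as 2x serial speed

print(f"as 2x population: {as_bigger_population / baseline:.2f}x output")
print(f"as 2x speedup:    {as_pure_speedup / baseline:.2f}x output")
```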