The debate was about whether a small group could quickly explode to take over the world. AI development projects are now billion-dollar affairs and continuing to grow quickly, important results are increasingly driven by giant projects, and 9 people taking over the world with AI looks if anything even more improbable and crazy than it did then.
Maybe you mean something else there, but wasn’t OpenAI like 30 people when they released GPT-2, and maybe like 60 when they released GPT-3? This doesn’t seem super far off from 9 people, and my guess is there is probably a subset of 9 people you could poach from OpenAI that could have made 80% as fast progress on that research as the full set of 30 people (at least from talking to other people at OpenAI, my sense is that contributions are very heavy-tailed).
Like, my sense is that cutting-edge progress is currently made by a few large teams, but that cutting-edge performance can easily come from 5-10 person teams, and that if we end up trying to stop race dynamics, the risk from 5-10 person teams would catch up pretty quickly with the risk from big teams, if the big teams halted progress. It seems to me that if I sat down with 8 other smart people, I could probably build a cutting-edge system within 1-2 years. The training costs of modern systems are only in the $10 million range, which is well within the reach of a 10-person team.
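As a rough sanity check on the “$10 million range” figure, training cost can be estimated with the standard ~6 FLOPs per parameter per token rule of thumb. The numbers below (GPT-3-scale model size and token count, GPU throughput, and hourly price) are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope training cost estimate for a GPT-3-scale model.
# All inputs are illustrative assumptions; real costs vary with hardware,
# utilization, and negotiated cloud pricing.

params = 175e9                 # model parameters (GPT-3 scale)
tokens = 300e9                 # training tokens (GPT-3 scale)
flops = 6 * params * tokens    # ~6 FLOPs per parameter per token (fwd + bwd)

gpu_flops_per_sec = 100e12     # assumed sustained throughput per GPU
gpu_cost_per_hour = 2.0        # assumed cloud price per GPU-hour (USD)

gpu_hours = flops / gpu_flops_per_sec / 3600
cost = gpu_hours * gpu_cost_per_hour

print(f"total FLOPs: {flops:.2e}")
print(f"GPU-hours:   {gpu_hours:.2e}")
print(f"cost:        ${cost / 1e6:.1f}M")
```

With these assumptions the estimate lands in the low single-digit millions; lower hardware utilization or higher cloud prices push it toward the $10 million figure.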
Of course, we might see that go up, but I feel confused about why you are claiming it’s crazy for 10-person teams to build systems that are at the cutting edge of capabilities and therefore might pose substantial risk.
GPT-2 is very far from taking over the world (and was indeed <<10 people). GPT-3 was bigger (though still probably <10 people depending how you amortize infrastructure), and remains far from taking over the world. Modern projects are >10 people, and still not yet taking over the world. It looks like it’s already not super plausible for 10 people to catch up, and it’s rapidly getting less plausible. The prediction isn’t yet settled, but the evidence so far isn’t in Eliezer’s favor, and it’s clear which way the wind blows.
These projects are well-capitalized, with billions of dollars in funding now and valuations rapidly rising (though maybe a dip right now with tech stocks overall down ~25%). These projects need to negotiate absolutely massive compute contracts, and lots of the profit looks likely to flow to compute companies. Most of the work is going into the engineering aspects of these projects. There are many labs with roughly-equally-good approaches, and no one has been able to pull much ahead of the basic formula—most variation is explained by how big a bet different firms are willing to make.
Eliezer is not talking about 10 people making a dominant AI because the rest of the world is busy slowing down out of concern for AI risk; he is talking about the opposite situation, where 10 people make a dominant AI that is also safer while everyone else barrels ahead, which is possible because they are building AI in a better way. In addition to 10 people, the view “you can find a better way to build AI that’s way more efficient than other people” is also starting to look increasingly unlikely as performance continues to be dominated by scale and engineering rather than clever ideas.
Everything is vague enough that I might be totally misunderstanding the view, and there is a lot of slack in how you compare it to reality. But for me the most basic point is that this is not a source of words about the future that I personally should be listening to; if there is a way to turn these words into an accurate visualization of the future, I lack the machinery to do so.
(The world I imagined when I read Robin’s words also looks different from the world of today in a lot of important ways. But it’s just not such a slam dunk comparing them, on this particular axis it sure looks more like Robin’s world to me. I do wish that people had stated some predictions so we could tell precisely rather than playing this game.)
The historical track record of software projects is that it’s relatively common for a small team of ~10 people to outperform 1000+ person teams. Indeed, I feel like this is roughly what happened with DeepMind and OpenAI. In 2016 you could have said that current AGI projects already have 500+ employees and are likely to grow even bigger, so it’s unlikely that a small 10-person team could catch up, and then suddenly the most cutting-edge project was launched by a 10-person team. (Yes, that 10-person team needed a few million dollars, but a few million dollars are not that hard to come by in the tech sector.)
My current guess is that we will continue to see small 10-person teams push the cutting edge forward in AI, just as we’ve seen in most other domains of software.
In addition to 10 people, the view “you can find a better way to build AI that’s way more efficient than other people” is also starting to look increasingly unlikely as performance continues to be dominated by scale and engineering rather than clever ideas.
I do agree with this in terms of what has been happening in the last few years, though I do expect it to break down as we see more things in the “leveraging AI to improve AI development progress” and “recursive self-improvement” categories, which currently seem to be coming over the horizon. It already seems pretty plausible to me that a team with exclusive access to a better version of Codex has some chance of outperforming other software development teams by 3-5x, which would then feed into more progress on the performance of the relevant systems.
I do think this is substantially less sharp than what Eliezer was talking about at the time, but I personally find the hypothesis that “a small team of people who use AI tools to develop AI systems can vastly outperform large teams that are less smart about it” pretty plausible, and probably more likely than not to be what happens eventually.
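A toy model of the feedback loop described above, where a team’s tool-driven productivity multiplier is (hypothetically) assumed to improve in proportion to the work it enables; all parameters here are made up for illustration:

```python
# Toy model of AI-assisted AI development: a team's productivity multiplier
# feeds back into how fast it can improve its own tools.
# All parameters are illustrative assumptions, not empirical estimates.

def simulate(years=5, steps_per_year=12, base_output=1.0,
             multiplier=1.0, feedback=0.02):
    """Each step, the team produces output scaled by its current tool
    multiplier, and a fraction of that output improves the tools."""
    total_output = 0.0
    for _ in range(years * steps_per_year):
        step_output = base_output * multiplier / steps_per_year
        total_output += step_output
        multiplier += feedback * step_output  # tools improve with use
    return total_output, multiplier

assisted, final_mult = simulate(feedback=0.02)
unassisted, _ = simulate(feedback=0.0)
print(f"assisted output:   {assisted:.2f}")
print(f"unassisted output: {unassisted:.2f}")
print(f"final multiplier:  {final_mult:.2f}")
```

The point of the sketch is just that any positive feedback coefficient makes output grow faster than linearly; the actual disagreement is over how large that coefficient is and whether it stays exclusive to one team.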
I think “team uses Codex to be 3x more productive” is more like the kind of thing Robin is talking about than the kind of thing Eliezer is talking about (e.g. see the discussion of UberTool, or just read the foom debate overall). And if you replace 3x with a more realistic number, and consider the fact that right now everyone is definitely selling that as a product rather than exclusively using it internally as a tool, then it’s even more like Robin’s story.
Everyone involved believes in the possibility of tech startups, and I’m not even sure if they have different views about the expected returns to startup founders. The 10 people who start an AI startup can make a lot of money, and the company will typically grow to a large scale (with significant dilution, but still quite a lot of influence for the founders) before they make their most impressive AI systems.
This kind of discussion seems pretty unproductive to me, and it mostly just reinforces the OP’s point that people should actually predict something about the world if we want this kind of discussion to be remotely useful for deciding how to change beliefs as new evidence comes in (at least about which people / models / reasoning strategies work well). If you want to state any predictions about the next 5 years, I’m happy to disagree with them.
The kinds of thing I expect are that (i) big models will still be where it’s at, (ii) compute budgets and team sizes continue to grow, (iii) improvements from cleverness continue to shrink, (iv) influence held by individual researchers grows in absolute terms but continues to shrink in relative terms, (v) AI tools become 2x more useful over more like a year than a week, (vi) AI contributions to AI R&D look similar to human contributions in various ways. Happy to put #s on those if you want to disagree on any. Places where I agree with the foom story are that I expect AI to be applied differentially to AI R&D, I expect the productivity of individual AI systems to scale relatively rapidly with compute and R&D investment, I expect overall progress to qualitatively be large, and so on.
Yeah, I think this is fair. I’ll see whether I can come up with some good operationalizations.

Possible counterevidence (10 months later)?—the GPT-4 contributors list lists almost 300 names.[1]

[1] Methodology: I copied text from the contributors page (down to just before it says “We also acknowledge and thank every OpenAI team member”), used some quick Emacs keyboard macros to munge out the section headers and non-name text (like “[topic] lead”), deduplicated and counted in Python (and subtracted one for a munging error I spotted after the fact), and got 290. Also, you might not count some sections of contributors (e.g., product management, legal) as relevant to your claim.
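The deduplicate-and-count step of the methodology above can be sketched as follows, assuming the Emacs munging has already reduced the text to one name per line (the names below are placeholders, not actual contributors):

```python
# Count unique contributor names, assuming prior munging has left
# one name per line (section headers and role text already stripped).
# The sample names are placeholders, not actual contributors.

raw = """\
Alice Example
Bob Placeholder
Alice Example
Carol Sample
"""

names = {line.strip() for line in raw.splitlines() if line.strip()}
print(len(names))  # → 3
```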
Yep, that is definitely counterevidence! My model did predict that we would also continue seeing huge teams make contributions, but of course each marginal major contribution is still evidence.
I have more broadly updated against this hypothesis over the past year or so, though I still think there will be lots of small groups of people quite close to the cutting edge (like less than 12 months behind).
Currently, the multiplier from stuff like better coding tools and setting up development to be AI-guided has just barely entered the stage where it feels plausible that a well-set-up team could just completely destroy large incumbents. We’ll see how it develops over the next year or so.
It seems to me that if I sat down with 8 other smart people, I could probably build a cutting-edge system within 1-2 years.
If you’re not already doing machine learning research and engineering, I think it takes more than two years of study to reach the frontier? (The ordinary software engineering you use to build Less Wrong, and the futurism/alignment theory we do here, are not the same skills.)
As my point of comparison for thinking about this, I have a couple hundred commits in Rust, but I would still feel pretty silly claiming to be able to build a state-of-the-art compiler in 2 years with 7 similarly-skilled people, even taking into account that a lot of the work is already done by just using LLVM (similar to how ML projects can just use PyTorch or TensorFlow).
Is there some reason to think AGI (!) is easier than compilers? I think “newer domain, therefore less distance to the frontier” is outweighed by “newer domain, therefore less is known about how to get anything to work at all.”
If you’re not already doing machine learning research and engineering, I think it takes more than two years of study to reach the frontier? (The ordinary software engineering you use to build Less Wrong, and the futurism/alignment theory we do here, are not the same skills.)
Yeah, to be clear, I think I would try hard to hire some people with more of the relevant domain knowledge (trading off against some other stuff). I do also somewhat object to it taking such a long time to get the relevant domain knowledge (a good chunk of the people involved in GPT-3 had less than two years of ML experience), but it doesn’t feel super cruxy for anything here, I think?
“newer domain, therefore less is known about how to get anything to work at all.”
To be clear, I agree with this, but I think it mostly pushes me towards thinking that small teams with high general competence will be more important than domain knowledge. But maybe you meant something else by this.
I think the argument “newer domain hence nearer frontier” still holds. The fact that we don’t know how to make an AGI doesn’t bear on how much you need to learn to match an expert.