I think my biggest crux here is how much the development of AGI is driven by compute progress.
I think it’s mostly driven by new insights, plus trying out old but expensive ideas. So I provisionally think that OpenAI has mostly been harmful, far in excess of its real positive impacts.
Elaborating:
Compute vs. Insight
One could adopt a (false) toy model in which the price of compute is the only input to AGI. Once the price falls low enough, we get AGI. [a compute-constrained world]
Or a different toy model: when AGI arrives depends entirely on algorithmic / architectural progress, and the price of compute is irrelevant. In this case there are a number of steps on the “tech tree” to AGI, and the world takes each of those steps, approximately in sequence. Some of those steps are new core insights, like the transformer architecture, RLHF, or the Chinchilla scaling laws, and others are advances in scaling, like going from GPT-2 to GPT-3. [an insight-constrained world]
(Obviously both of those models are fake. Both compute and architecture are inputs to AGI, and to some extent they can substitute for each other: you can make up for a weaker algorithm with more brute force, and vice versa. But these extreme cases are easier for me, at least, to think about.)
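To make the contrast concrete, here’s a minimal toy sketch of the two fake models (my own illustration of the framing; every number and step name in it is invented purely for illustration). In the compute-constrained version, a lab’s scaling bets change when the intermediates show up but not the AGI year; in the insight-constrained version, pulling one tech-tree step forward drags the final date forward too.

```python
# Toy sketch of the two fake models; all numbers and step names are made up.

def compute_constrained_agi_year(price_per_flop_by_year, threshold):
    """Compute-constrained world: AGI arrives the first year compute is cheap enough."""
    for year, price in sorted(price_per_flop_by_year.items()):
        if price <= threshold:
            return year
    return None

def insight_constrained_agi_year(step_completion_years):
    """Insight-constrained world: AGI arrives once the last tech-tree step is done."""
    return max(step_completion_years)

# Compute-constrained: scaling work early changes when we get GPT-3-like
# intermediates, but not the AGI year itself.
prices = {2020: 8.0, 2025: 4.0, 2030: 2.0, 2035: 1.0}  # arbitrary cost units
print(compute_constrained_agi_year(prices, threshold=1.0))        # -> 2035

# Insight-constrained: pulling the "scale up an LLM" step forward by five
# years drags the later steps (and so AGI) forward with it.
baseline_steps    = {"transformer": 2017, "scaled_llm": 2028, "final_insight": 2035}
accelerated_steps = dict(baseline_steps, scaled_llm=2023, final_insight=2030)
print(insight_constrained_agi_year(baseline_steps.values()))      # -> 2035
print(insight_constrained_agi_year(accelerated_steps.values()))   # -> 2030
```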
In the fully compute-constrained world, OpenAI’s capabilities work is strictly good, because it means we get intermediate products of AGI development earlier.
In this world, progress towards AGI is ticking along to the drumbeat of Moore’s law. We’re going to get AGI in 20XY. But because of OpenAI, we get GPT-3 and GPT-4 earlier, which give us subjects for interpretability work and give the world a heads-up about what’s coming.
Under the compute-constraint assumption, OpenAI is stretching out capabilities development by causing some of the precursor developments to happen earlier, but more gradually. AGI still arrives in 20XY, but we get the intermediates earlier than we otherwise would have.
In the fully insight-constrained world, OpenAI’s impact is almost entirely harmful. Under that model, Large Language Models would have been discovered eventually, but OpenAI made a bet on scaling GPT-2. That caused us to get that technology earlier, and also pulled forward the date of AGI, both by checking off one of the steps, and by showing what was possible and so generating counterfactual interest in transformers.
In this world, OpenAI might have other benefits, but they are at least doing the counterfactual harm of burning our serial time.
They don’t get credit for “sounding the alarm” by releasing ChatGPT, because that was already on the tech tree; it was going to happen at some point. Giving OpenAI credit for it would be sort of the reverse of “shooting the messenger”: crediting someone for letting you know about a bad situation when they caused that situation in the first place (or at least made it worse).
Again, neither of these models is correct. But I think our world is closer to the insight-constrained world than the compute-constrained world.
This makes me much less sympathetic to OpenAI.
Costs and Benefits
That doesn’t settle the question, though, because maybe OpenAI’s other impacts (many of which I agree are positive!) more than make up for the harm done by shortening the timeline to AGI.
In particular...
I’m not inclined to give them credit for deciding to release their models for the world to engage with, rather than keep them as private lab-curiosities. Releasing their language models as products, it seems to me, is fully aligned with their incentives. They have an impressive and useful new technology. I think the vast majority of possible counterfactual companies would do the same thing in their place. It isn’t (I think) an extra service they’re doing the world, relative to the counterfactual.[1]
I am inclined to give them credit for their charter, their pseudo-non-profit structure, and the merge-and-assist clause[2], all of which seem like at least a small improvement over the kind of commitment to the public good that I would expect from a counterfactual AGI lab.
I am inclined to give them credit for choosing not to release the technical details of GPT-4.
I am inclined to give them credit for publishing their plans and thoughts regarding x-risk, AI alignment, and planning for superintelligence.
Overall, it currently seems to me that OpenAI is somewhat better than a random draw from the distribution of possible counterfactual AGI companies (maybe 90th percentile?). But they are not so much better that it makes up for burning 3 to 7 years of the timeline.
3 to 7 years is just my eyeball estimate of how much later someone would have developed ChatGPT-like capabilities if OpenAI hadn’t bet on scaling up GPT-2 into GPT-3 and hadn’t decided to invest in RLHF, both moves that it looks like few orgs in the world were positioned to try, and even fewer would actually have tried in the near term.
That’s not a very confident number. I’m very interested in getting more informed estimates of how long it would have taken for the world to develop something like ChatGPT without OpenAI.
(I’m selecting ChatGPT as the criterion, because I think that’s the main pivot point at which the world woke up to the promise and power of AI. Conditional on someone developing something ChatGPT-like, it doesn’t seem plausible to me that the world goes another three years without developing a language model as impressive as GPT-4. At that point developing bigger and better language models is an obvious thing to try, rather than an interesting bet that the broader world isn’t much interested in.)
I’m also very interested to hear if anyone thinks that the benefits (either ones that I listed or others) outweigh an extra 3 to 7 years of working on alignment (not to mention 3 to 7 additional years of life expectancy for all of us).
[1] It is worth noting that at some point PaLM was (probably) the most powerful LLM in the world, and Google didn’t release it as a product. But I don’t think this is a very stable equilibrium: I expect to see a ChatGPT competitor from Google before 2024 (50%) and before 2025 (90%).
[2] That said, “a value-aligned, safety-conscious project comes close to building AGI before we do” really gives a lot of wiggle room for deciding whether some competitor is “a good guy”. But it’s still better than the counterfactual.
Another relevant-seeming question is the extent to which LLMs have been a requirement for alignment progress. It seems to me that LLMs have shown some earlier assumptions about alignment to be incorrect (e.g. pre-LLM discourse had lots of arguments about how AIs have to be agentic, arguments made without awareness of the possibility of simulators; and things like the Outcome Pump thought experiment feel like weaker evidence that alignment is really hard than they did before, given that an Outcome Pump driven by something like an LLM would probably get the task done right).
In old alignment writing, there seemed to be an assumption that an AGI’s mind would act more like a computer program than like a human mind. Now, with the growing number of connections between the way ANNs seem to work and the way the brain seems to work, it looks to me as if AGI might end up resembling a human mind quite a lot as well. Not only does this weaken the conclusions of some previous writing, it also makes it possible to formulate approaches to alignment that draw stronger inspiration from the human mind, such as my preference fulfillment hypothesis. Even if you think that one is implausible, various approaches to LLM interpretability look like they might provide insights into how later AGIs might work, which is the first time we’ve gotten something like experimental data (as opposed to armchair theorizing) about the workings of a proto-AGI.
What this suggests to me is that if OpenAI hadn’t bet on LLMs, we effectively wouldn’t have gotten more time to do alignment research, because most alignment research done before an understanding of LLMs would have been a dead end. And actually solving alignment may require people who have internalized the paradigm shift represented by LLMs and who figure out solutions based on that. Under this model, even if we are in an insight-constrained world, OpenAI mostly hasn’t burned away effective years of alignment research (because alignment research carried out before we had LLMs would have been mostly useless anyway).
Here’s a paraphrase of the way I take you to be framing the question. Please let me know if I’m distorting it in my translation.
We often talk about “the timeline to AGI” as a resource that can be burned: we want to have as much time as we can to prepare before the end. But that’s not quite right. The relevant segment of time is not from “as soon as we notice the problem” to “the arrival of AGI”; it’s from “as soon as we can make real technical headway on the problem” to “the arrival of AGI”. We’ll call that second time segment “preparation time”.
The development of LLMs may well have brought the date of AGI closer, but it also pulled forward the start of the “preparation time” clock.
In fact, it’s plausible that otherwise the “preparation time” clock would have started only just before AGI, or not at all.
So all things considered, the impact of pulling the start time forward seems much larger than the impact of pulling the time of AGI forward.
How’s that as a summary?
So, in evaluating that, the key question is whether LLMs were already on the critical path.
Is it more like...
We’re going to get AGI at some point and we might or might not have gotten LLMs before that.
or
It was basically inevitable that we get LLMs before AGI. LLMs “always” come X years ahead of AGI.
or
It was basically inevitable that we get LLMs before AGI, but there’s a big range of when they can arrive relative to AGI.
And OpenAI made the gap between LLMs and AGI bigger than the counterfactual.
or
And OpenAI made the gap between LLMs and AGI smaller than the counterfactual.
My guess is that the true answer is closest to the second option: LLMs happen a predictable-ish period ahead of AGI, in large part because they’re impressive enough and generally practical enough to drive AGI development.
Thank you, that seems exactly correct.