GPT-4 is more general than GPT-3, which was more general than GPT-2, and presumably this trend will continue as we scale up our models indefinitely, rather than shooting up discontinuously past human level at some point.
I don’t presume that at all; I think it is in fact something we should not expect. As soon as we cross the threshold where, as you say above, “it will begin to meaningfully feed into itself, increasing AI R&D itself, accelerating the rate of technological progress”, I expect that what we will see is exactly that “shooting up discontinuously past human level”. This ‘shooting up’ seems likely to be continuous in the narrow sense that there will be multiple model iterations, each a bit better than the last, rather than a single training run that jumps from roughly GPT-4 level to above human level. But it will not be continuous in a historical sense: it will look like a clear departure from a straight-line fit to the progress of GPT-1, 2, 3, and 4 and the dates at which those milestones occurred. Concretely, I expect less than a year between the start of the recursive-improvement process and greater-than-human intelligence and generality. This is not just my opinion; similar expectations have been voiced in Anthropic’s stated plans and in Tom Davidson’s takeoff-speeds report.
Quote from Anthropic: “These models could begin to automate large portions of the economy. … We believe that companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles.”
I want to distinguish between a discontinuity in inputs, and a discontinuity in response to small changes in inputs. In that quote, I meant that I don’t expect the generality of models to shoot up at some point as we scale from 10^25 FLOP to 10^26, 10^27 and so on, at least, holding algorithms constant. I agree that AI automation could increase growth, which would allow us to scale AI more quickly, but that’s different from the idea that generality will suddenly appear at some scale, rather than appearing smoothly as we move through the orders of magnitude of compute.
I think that misses the point. Why would you assume that the algorithms would remain constant and only compute would scale? Sam Altman explicitly said that he thinks they are in a regime of diminishing returns from scaling compute and that the primary effort being put into the next version of GPT will be on finding algorithmic improvements. Additionally, he said that there were algorithmic improvements between v2 and v3, as well as between v3 and v4. Anthropic recently said that they made an algorithmic advance allowing for much larger context sizes, larger than was thought possible at this level of compute a year ago. Thus, the algorithms have already improved substantially even since the very recent release of GPT-4. Why would that stop now?
I don’t think algorithms will stay constant. Recursive AI R&D could speed up the rate of algorithmic progress too, but I mostly think that’s just another “input” to the “AI production function”. Since I already agree with you that AI automation could speed up the pace of AI progress, I’m not sure exactly what you disagree with. My claim was about sharp changes in output in response to small changes in inputs.
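To make that claim concrete, here is a minimal toy sketch; the functional forms and constants are arbitrary illustrations I’m supplying, not a model either of us is actually defending. Capability is assumed to be a perfectly smooth function of log-compute, yet once capability feeds back into how fast the inputs grow, the capability-versus-time curve still bends sharply away from the old trend.

```python
# Toy illustration only: capability is a smooth function of (log) compute,
# but effective compute grows faster once capability feeds back into AI R&D,
# so the capability-vs-time curve can still bend sharply upward.

def capability(log10_compute):
    # Smooth in the input: no jump at any particular scale.
    return 0.5 * (log10_compute - 20)

def simulate(years=8, feedback=True):
    log10_compute = 25.0             # arbitrary starting scale
    rows = []
    for year in range(years):
        cap = capability(log10_compute)
        rows.append((year, round(log10_compute, 2), round(cap, 2)))
        growth = 0.5                 # baseline orders of magnitude per year (made up)
        if feedback:
            growth *= 1 + 0.3 * cap  # automation of R&D speeds up input growth
        log10_compute += growth
    return rows

print("no feedback:", simulate(feedback=False))
print("feedback:   ", simulate(feedback=True))
```

In this toy, the function mapping inputs to capability has no discontinuity anywhere; the apparent discontinuity is entirely in how fast the inputs move once the feedback kicks in.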
Sam Altman explicitly said that he thinks they are in a regime of diminishing returns from scaling compute and that the primary effort being put into the next version of GPT will be on finding algorithmic improvements.

Did he really say this? I thought he was talking about the size of models, not the size of training compute. Under the Chinchilla scaling laws, it’s expected that models won’t get much larger for a while, mostly because we had been under-training them for a few years. I suspect that’s what he was referring to instead.
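For context on that last point, here is a back-of-the-envelope sketch using the common Chinchilla rules of thumb (training compute C ≈ 6·N·D, with compute-optimal data D ≈ 20·N); these constants are rough approximations I’m supplying, not anything from the interviews quoted below.

```python
# Back-of-the-envelope Chinchilla illustration (rule-of-thumb numbers):
# training compute C ~ 6 * N * D, with compute-optimal data D ~ 20 * N.
# Then C ~ 120 * N**2, so parameter count grows only with sqrt(compute).

def chinchilla_optimal_params(compute_flop):
    return (compute_flop / 120) ** 0.5

for flop in (1e24, 1e26):
    print(f"{flop:.0e} FLOP -> ~{chinchilla_optimal_params(flop):.1e} params")
# 100x more compute implies only ~10x more parameters, which is one way
# "models won't get much bigger for a while" can coexist with compute scaling.
```

On those numbers, model size grows much more slowly than training compute, so a comment about model size staying flat isn’t necessarily a comment about compute scaling having stopped paying off.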
Since you agree with me that algorithmic progress is real and that AI automation will speed up AI development… then I am also confused about where our disagreement is. My current model (which has changed a bit over the past couple of years, mainly in that I now feel more confident pinning down specific years and speeds) is that I most expect a gradual speed-up. Something like: we get GPT-5, which is finally good enough to make substantial contributions, such as a semi-directed automated search for algorithmic improvements. This could include mining open-source repositories for existing code that hasn’t been tested at scale in combination with SotA techniques. I have a list of published techniques in mind that I think would improve on transformers if they could be successfully integrated. I expect this process to kick off a self-reinforcing cycle of finding algorithmic improvements that make training substantially cheaper and raise peak capability, such that before 2030 the SotA models have either gone thoroughly superhuman and general, or we have avoided that through active restraint on the part of the top labs.
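For concreteness, here is a rough sketch of what such a semi-directed search loop might look like. Every name and number here is hypothetical: the proposal and evaluation steps are stand-ins for what would really be LLM-driven mining of papers and repositories plus small-scale proxy training runs.

```python
# Hypothetical sketch of a semi-directed search for algorithmic improvements:
# propose candidate techniques, evaluate each cheaply at small scale, and
# keep the best for larger runs. The candidates and scoring are made up.

import random

def propose_candidates():
    # In the real version this step would be LLM-driven: mining papers and
    # open-source repos for techniques not yet tested at scale.
    return ["alt-attention-variant", "better-optimizer-schedule",
            "data-curriculum-tweak", "sparse-routing-change"]

def small_scale_proxy_eval(technique, seed=0):
    # Stand-in for "train a small model with the technique and measure loss".
    random.seed(hash((technique, seed)) % (2**32))
    return random.uniform(-0.02, 0.05)   # pretend improvement in eval loss

def search_round(keep_top=2):
    scored = sorted(
        ((small_scale_proxy_eval(t), t) for t in propose_candidates()),
        reverse=True,
    )
    # Winners graduate to larger-scale experiments; losers inform the next
    # round of proposals.
    return [t for score, t in scored[:keep_top] if score > 0]

print(search_round())
```

The self-reinforcing part is that the proposal step itself gets better as the underlying models improve, so each round of the loop should find wins somewhat faster than the last.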
But others seem to imagine that there won’t be this self-reinforcing acceleration that takes things to a crazy new level; that instead progress will stay roughly in the current regime, just a bit better each year, as forecast from the expected increases in compute and data plus only minor algorithmic improvements. I’m not sure we really have enough empirical data to settle this one way or another, so there’s a lot of gesturing at intuitions and analogies, and pointing out that there aren’t any clear physical limits that would stop a runaway improvement process at ‘approximately human level’.
Here are some recent quotes from Sam Altman; let me know if you think my interpretation of his view is correct. Note: I think Sam Altman’s forecast of the improvement trend falls somewhere in between yours and mine. He says he expects a ‘long time’ of humans being ‘in the loop’ and guiding the development of future AI versions. My expectation, that within two years of AI doing a significant portion of the work it will be doing nearly all of it (or at least be capable of doing so if allowed), seems faster than Sam’s ‘long time’.
https://youtube.com/clip/UgkxvBNObQ03EbpISIdek2rxSwPRwPm1hiWl
Altman: GPT-4 has enough nuance to be able to help you explore that and treat you like an adult in the process. GPT-3, I think, just wasn’t capable of getting that right.
Lex Fridman: By the way, if you could just speak to the leap from GPT-3.5 to 4. Is there a technical leap or is it really just focused on the alignment?
Altman: No, it’s a lot of technical leaps in the base model. One of the things we are good at at OpenAI is finding a lot of small wins. And each of them maybe is a pretty big secret in some sense, but it really is the multiplicative impact of all of them. And the detail and care we put into it that gets us these big leaps. And then it looks from the outside like ‘oh, they probably just did one thing to get from GPT-3 to 3.5 to 4’, but it’s like hundreds of complicated things.
Lex: So, tiny little thing, like the training, like everything, with the data organization.
Altman: Yeah, like how we collect the data, how we clean the data, how we do the training, how we do the optimizer, how we do the architecture. Like, so many things.
----------------------------------------
https://nymag.com/intelligencer/2023/03/on-with-kara-swisher-sam-altman-on-the-ai-revolution.html
Altman: I think that there is going to be a big new algorithmic idea, a different way that we train or use or tweak these models, different architecture perhaps. So I think we’ll find that at some point.
Swisher: Meaning what, for the non-techy?
Altman: Well, it could be a lot of things. You could say a different algorithm, but just some different idea of the way that we create or use these models that encourages, during training or inference time when you’re using it, that encourages the models to really ground themselves in truth, be able to cite sources. Microsoft has done some good work there. We’re working on some things.
...
Swisher: What do you think the most viable threat to OpenAI is? I hear you’re watching Claude very carefully. This is the bot from Anthropic, a company that’s founded by former OpenAI folks and backed by Alphabet. Is that it? We’re recording this on Tuesday. BARD launched today; I’m sure you’ve been discussing it internally. Talk about those two to start.
Altman: I try to pay some attention to what’s happening with all these other things. It’s going to be an unbelievably competitive space. I think this is the first new technological platform in a long period of time. The thing I worry about the most is not any of those, because I think there’s room for a lot of people, and also I think we’ll just continue to offer the best product. The thing I worry about the most is that we’re somehow missing a better approach. Everyone’s chasing us right now on large language models, kind of trained in the same way. I don’t worry about them, I worry about the person who has some very different idea about how to make a more useful system.
Swisher: But is there one that you’re watching more carefully?
Altman: Not especially.
Swisher: Really? I kind of don’t believe you, but really?
Altman: The things that I pay the most attention to are not, like, language model, start-up number 217. It’s when I hear, “These are three smart people in a garage with some very different theory of how to build AGI.” And that’s when I pay attention.
Swisher: Is there one that you’re paying attention to now?
Altman: There is one; I don’t want to say.
------------------------------------
https://youtube.com/clip/UgkxxQjTOhqpUCQY_ItrLKkLznBSPqd-rd_X
Interviewer: Are we getting close to the day when the thing is so rapidly self-improving that it hits some [regime of rapid takeoff]?
Altman: I think that it is going to be a much fuzzier boundary for ‘getting to self-improvement’ or not. I think that what will happen is that more and more of the improvement loop will be aided by AIs, but humans will still be driving it, and it’s going to go like that for a long time.
There are a whole bunch of other things that I’ve never believed in, like a one-day or one-month takeoff, for a bunch of reasons.