One thing conspicuously absent, IMO, is discussion of misalignment risk. I’d argue that GPT-2030 will be situationally aware, strategically aware, and (at least when plugged into fancy future versions of AutoGPT etc.) agentic/goal-directed. If you think it wouldn’t be a powerful adversary of humanity, why not? Because it’ll be ‘just following instructions’ and people will put benign instructions in the prompt? Because HFDT will ensure that it’ll robustly avoid POUDA? Or will it in fact be a powerful adversary of humanity, but one that is unable to cause a point-of-no-return or achieve powerbase ability, due to various capability limitations or safety procedures?
Another thing worth mentioning (though less important) is that if you think GPT-2030 would massively accelerate AI R&D, then you should probably think that GPT-2027 will substantially accelerate AI R&D, meaning that AIs with the abilities of GPT-2030 will actually appear sooner than 2030. You are no doubt familiar with Tom’s takeoff model.
Finally, I’d be curious to hear where you got your rough guess about how long (in subjective time) GPT-2030 could run before getting stuck and needing human feedback. My prediction is that by the time it can reliably run for a subjective day across many diverse tasks, it’ll get ‘transfer learning’ and ‘generalization’ benefits that cause it to very rapidly learn to go for subjective weeks, months, years, infinity. See this table explaining my view by comparison to Richard Ngo’s view. What do you think?
I don’t like the number of links that you put into your first paragraph. The point of developing a vocabulary for a field is to make communication more efficient so that the field can advance. Do you need an acronym and associated article for ‘pretty obviously unintended/destructive actions,’ or in practice is that just insularizing the discussion?
I hear people complaining about how AI safety only has ~300 people working on it, and how nobody is developing object-level understanding and everyone is reasoning from authority, but the more sentences you write like “Because HFDT will ensure that it’ll robustly avoid POUDA?”, the more true that becomes.
I feel very strongly about this.
Thanks for the feedback; I’ll try to keep this in mind in the future. I imagine you’d prefer me to keep the links, but make the text use common-sense language instead of acronyms, so that people don’t need to click on the links to understand what I’m saying?
That seems like a useful heuristic.
I also think there’s an important distinction between using links in a debate frame and in a sharing frame.
I wouldn’t be bothered at all by a comment using acronyms and links, no matter how insular, if the context was just ‘hey, this reminds me of HFDT and POUDA’; a beginner can jump off of that and go down a rabbit hole of interesting concepts.
But if you’re in a debate frame, you’re introducing unnecessary barriers to discussion which feel unfair and disqualifying. At its worst it would be like saying: ‘you’re not qualified to debate until you read these five articles.’
In a debate frame I don’t think you should use any unnecessary links or acronyms at all. If you’re linking a whole article, it should be because reading and understanding the whole article is necessary for the discussion to continue, and it cannot be summarized.
I think I have this principle because, in my mind, you cannot opt out of a debate, so you have to read all the links and content included; links in a sharing context are optional, but in a debate context they’re required.
On a second read, I think your comment might have been more in the ‘sharing’ frame than I originally thought, but to the extent you were presenting arguments, I think you should maximize legibility, to the point of only including links if you make clear, contextually or explicitly, to what degree each link is optional or just for reference.
Thanks for that feedback as well—I think I didn’t realize how much my comment comes across as ‘debate’ framing, which now on second read seems obvious. I genuinely didn’t intend my comment to be a criticism of the post at all; I genuinely was thinking something like “This is a great post. But other than that, what should I say? I should have something useful to add. Ooh, here’s something: Why no talk of misalignment? Seems like a big omission. I wonder what he thinks about that stuff.” But on reread it comes across as more of a “nyah nyah why didn’t you talk about my hobbyhorse” unfortunately.
As I read this post, I found myself puzzled by the omission of the potential for SotA AI models to accelerate AI research, as Daniel mentions in his comment. I think it’s worth pointing out that this has been explicitly discussed by leading individuals at the big AI labs. For instance, Sam Altman has said that scaling is no longer the primary path forward in their work, and that algorithmic advances are instead.
Think about your intuitions of what a smart and motivated human is capable of. The computations that such a human brain is running represent an algorithm. From neuroscience we have a lot of information about the invariant organization of human brains: the architectural constraints and priors established during fetal development under guidance from the genome. The human brain’s macroscale connectome is complicated and hacky, with a lot of weirdly specific and not fully understood aspects. The architectures currently used in ML are comparatively simple, with more uniform repeating structures. Some of the hacky, specific brain-connectome details are apparently very helpful, though, considering how quickly humans do, in practice, learn.

Figuring out which aspects of the brain would be useful to incorporate into ML models is exactly the sort of constrained engineering problem that ML excels at: take in neuroscience data, design a large number of automated experiments, run the experiments in parallel, analyze and summarize the results, reward the model for successes, repeat. The better our models get, and the more compute we have with which to do this automated search, the more likely we are to find something. The more advances we find, the faster and cheaper the process becomes, and the more investment will be put into pursuing it.

The combination of all of these factors implies a strongly accelerating trend beyond a certain threshold. This trend, unlike scaling compute or data, is not expected to sigmoid out before exceeding human intelligence. That’s what makes this truly a big deal. Without this potential meta-growth, ML would be merely a big deal instead of absolutely pivotal. Trying to project ML development without taking this into account is like watching a fuse burn towards a bomb, and trying to model the bomb like a somewhat unusually vigorous campfire.
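To make the loop concrete, here is a minimal sketch of the kind of automated search process described above. Everything in it is hypothetical: proposal_model, neuroscience_priors, and train_and_evaluate are placeholder names rather than any real API, and the evaluation step is stubbed out.

```python
# Hypothetical sketch of the automated search loop described above: propose
# brain-inspired architecture variants, evaluate them in parallel, and reward
# the proposal model for its successes. None of these names correspond to a
# real library; they are placeholders for illustration only.
import random
from concurrent.futures import ProcessPoolExecutor


def train_and_evaluate(variant):
    """Train a small proxy model with this variant and return a score.

    Placeholder: a real version would launch an actual training run and
    report a validation metric.
    """
    return random.random()


def automated_search(proposal_model, neuroscience_priors,
                     n_per_round=64, n_rounds=100, workers=16):
    """Run the propose -> evaluate -> reward loop for n_rounds iterations.

    `proposal_model` is assumed (hypothetically) to expose .sample() and
    .update(); `neuroscience_priors` stands in for digested connectome data.
    """
    for _ in range(n_rounds):
        # Design a batch of candidate experiments (architecture tweaks).
        variants = [proposal_model.sample(neuroscience_priors)
                    for _ in range(n_per_round)]
        # Run the experiments in parallel.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            scores = list(pool.map(train_and_evaluate, variants))
        # Summarize the results and reward the proposal model for successes,
        # so the next round's proposals improve on this one's.
        proposal_model.update(variants, scores)
    return proposal_model
```

The point is only the shape of the feedback loop: each round’s results update the proposer that generates the next round’s experiments, which is where the claimed acceleration comes from.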
Where does this “transfer learning across timespans” come from? The main reason I see for checking back in after 3 days is the model’s losing the thread of what the human currently wants, rather than being incapable of pursuing something for longer stretches. A direct parallel is a human worker reporting to a manager on a project—the worker could keep going without check-ins, but their mental model of the larger project goes out of sync within a few days so de facto they’re rate limited by manager check-ins.
Well said.
Responded in DM.