Hmm, for example, given that the language translation industry is supposedly $60B/yr, and given that we have known for decades that AI can take at least some significant chunk out of this industry at the low-quality end [e.g. tourists were using babelfish.altavista.com in the 1990s despite it sucking], I think someone would have to have been very unreasonable indeed to predict in advance that there will be an eternal plateau in the non-AGI AI market that’s lower than $1B/yr. (And that’s just one industry!) (Of course, that’s not a real prediction in advance ¯\_(ツ)_/¯ )
What I was getting at with “That’s a wild claim!” is that your theory makes an a-priori-obvious prediction (AI systems will grow to a >$1B industry pre-FOOM) and a controversial prediction (>$100T industry), and I think common sense in that situation is to basically ignore the obvious prediction and focus on the controversial one. And Bayesian updating says the same thing. The crux here is whether or not it has always been basically obvious to everyone, long in advance, that there’s at least $1B of work on our planet that can be done by non-FOOM-related AI, which is what I’m claiming in the previous paragraph where I brought up language translation. (Yeah I know, I am speculating about what was obvious to past people without checking what they said at the time—a fraught activity!)
Yeah deep learning can “automate real human cognitive work”, but so can pocket calculators, right? Anyway, I’d have to think more about what my actual plateau prediction is and why. I might reply again later. :)
I feel like your thinking here is actually mostly coming from “hey look at all the cool useful things that deep learning can do and is doing right now”, and is coming much less from the specific figure “$1B/year in 2023 and going up”. Is that fair?
I don’t think it’s obvious a priori that training deep learning to imitate human behavior can predict general behavior well enough to carry on customer support conversations, write marketing copy, or write code well enough to be helpful to software engineers. Similarly it’s not obvious whether it will be able to automate non-trivial debugging, prepare diagrams for a research paper, or generate plausible ML architectures. Perhaps to some people it’s obvious there is a divide here, but to me it’s just not obvious so I need to talk about broad probability distributions over where the divide sits.
I think ~$1B/year is a reasonable indicator of the generality and extent of current automation. I really do care about that number (though I wish I knew it better) and watching it go up is a big deal. If it can just keep being more useful with each passing year, I will become more skeptical of claims about fundamental divides, even if after the fact you can look at each thing and say “well it’s not real strong cognition.” I think you’ll plausibly be able to do that up through the end of days, if you are shameless enough.
I think the big ambiguity is about how you mark out a class of systems that benefit strongly from scale (i.e. such that doubling compute more than doubles economic value) and whether that’s being done correctly here. I think it’s fairly clear that the current crop of systems are much more general and are benefiting much more strongly from scale than previous systems. But it’s up for debate.
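One way to make the “doubling compute more than doubles economic value” criterion concrete is a toy power-law sketch. The functional form (value proportional to compute raised to some exponent) and the name `alpha` are assumptions for illustration only, not something claimed above; under that assumption the criterion reduces to `alpha > 1`.

```python
def value_multiplier_on_doubling(alpha: float) -> float:
    """Factor by which value grows when compute doubles.

    Assumes (hypothetically) value = k * compute**alpha, so the constant k cancels.
    """
    return 2.0 ** alpha


def benefits_strongly_from_scale(alpha: float) -> bool:
    """True when doubling compute more than doubles value (alpha > 1 under the assumed power law)."""
    return value_multiplier_on_doubling(alpha) > 2.0


if __name__ == "__main__":
    # Hypothetical exponents, not measured values for any real system.
    for alpha in (0.5, 1.0, 1.5):
        print(alpha, value_multiplier_on_doubling(alpha), benefits_strongly_from_scale(alpha))
```

The hard part of the debate is estimating something like `alpha` for a real class of systems, which this sketch does not attempt.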
Hmm. I think we’re talking past each other a bit.
I think that everyone (including me) who wasn’t expecting LLMs to do all the cool impressive things that they can in fact do, or who wasn’t expecting LLMs to improve as rapidly as they are in fact improving, is obligated to update on that.
Once I do so update, it’s not immediately obvious to me that I learn anything more from the $1B/yr number. Yes, $1B/yr is plenty of money, but still a drop in the bucket of the >$1T/yr IT industry, and in particular, is dwarfed by a ton of random things like “corporate Linux support contracts”. Mostly I’m surprised that the number is so low!! (…For now!!)
But whatever, I’m not sure that matters for anything.
Anyway…
I did spend considerable time last week pondering where & whether I expect LLMs to plateau. It was a useful exercise; I appreciate your prodding. :)
I don’t really have great confidence in my answers, and I’m mostly redacting the details anyway. But if you care, here are some high-level takeaways of my current thinking:
(1) I expect there to be future systems that centrally incorporate LLMs, but also have other components, and I expect these future systems to be importantly more capable, less safe, and more superficially / straightforwardly agent-y than is an LLM by itself as we think of them today.
IF “LLMs scale to AGI”, I expect that this is how, and I expect that my own research will turn out to be pretty relevant in such a world. More generally, I expect that, in such systems, we’ll find the “traditional LLM alignment discourse” (RLHF fine-tuning, shoggoths, etc.) to be pretty irrelevant, and we’ll find the “traditional agent alignment discourse” (instrumental convergence, goal mis-generalization, etc.) to be obviously & straightforwardly relevant.
(2) One argument that pushes me towards fast takeoff is pretty closely tied to what I wrote in my recent post:
Two different perspectives are:
AGI is about knowing how to do lots of things
AGI is about not knowing how to do something, and then being able to figure it out.
I’m strongly in the second camp.…
The following is a bit crude and not entirely accurate, but to a first approximation I want to say that LLMs have a suite of abstract “concepts” that they have seen in their training data (and that were in the brains of the humans who created that training data), and LLMs are really good at doing mix-and-match compositionality and pattern-match search to build a combinatorial explosion of interesting fresh outputs out of that massive preexisting web of interconnected concepts.
But I think there are some types of possible processes along the lines of:
“invent new useful concepts from scratch (even concepts that have never occurred to any human) and learn them permanently, such that they can be built on in the future”
“notice inconsistencies in existing concepts / beliefs, find ways to resolve them, and learn them permanently, such that those mistakes will not be repeated in the future”
etc.
I think LLMs can do things like this a little bit, but not so well that you can repeat them in an infinite loop. For example, I suspect that if you took this technique and put it in an infinite loop, it would go off the rails pretty quickly. But I expect that future systems (of some sort) will eventually be able to do these kinds of things well enough to form a stable loop, i.e. the system will be able to keep running this process (whatever it is) over and over, and not go off the rails, but rather keep “figuring out” more and more things, thus rocketing off to outer space, in a way that’s loosely analogous to self-play in AlphaZero, or to a smart human gradually homing in on a better and better understanding of a complicated machine.
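The gap between “a little bit” and “well enough to form a stable loop” can be illustrated with a toy compounding-error model. This is only a sketch under a strong assumption that is not in the comment above: each pass through the loop independently stays on track with a fixed probability `p_ok` (a hypothetical parameter), and a single bad pass derails everything after it.

```python
import math


def prob_still_on_track(p_ok: float, n_iterations: int) -> float:
    """Probability that n independent passes through the loop all stay on track."""
    if not 0.0 <= p_ok <= 1.0:
        raise ValueError("p_ok must be a probability in [0, 1]")
    if n_iterations < 0:
        raise ValueError("n_iterations must be non-negative")
    return p_ok ** n_iterations


def expected_good_passes_before_derailing(p_ok: float) -> float:
    """Mean number of good passes before the first bad one (geometric distribution)."""
    if not 0.0 <= p_ok <= 1.0:
        raise ValueError("p_ok must be a probability in [0, 1]")
    if p_ok == 1.0:
        return math.inf  # a perfectly reliable loop never derails in this model
    return p_ok / (1.0 - p_ok)


if __name__ == "__main__":
    # Hypothetical per-pass reliabilities, chosen only to show the trend.
    for p_ok in (0.9, 0.99, 0.999):
        print(p_ok, prob_still_on_track(p_ok, 1000), expected_good_passes_before_derailing(p_ok))
```

Under this assumption, a loop that is right 99% of the time per pass is expected to last only about 99 passes, so an open-ended loop needs per-pass reliability extremely close to 1, or some error-correcting step that this sketch does not model.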
I think this points to an upcoming “discontinuity”, in the sense that I think right now we don’t have systems that can do the above bullet points (at least, not well enough to repeat them in an infinite loop), and I think we will have such systems in the future, and I think we won’t get TAI until we do. And it feels pretty plausible to me (admittedly not based on much!) that it would only take a couple years or less between “widespread knowledge of how to build such systems” and “someone gets an implementation working well enough that they can run it in an infinite loop and it just keeps “figuring out” more and more things, correctly, and thus it rockets off to radically superhuman intelligence and capabilities.”
(3) I’m still mostly expecting LLMs (and more broadly, LLM-based systems) to not be able to do the above bullet point things, and (relatedly) to plateau at a level where they mainly assist rather than replace smart humans. This is tied to fundamental architectural limitations that I believe transformers have (and indeed, that I believe DNNs more generally have), which I don’t want to talk about…
(4) …but I could totally be wrong. ¯\_(ツ)_/¯ And I think that, for various reasons, my current day-to-day research program is not too sensitive to the possibility that I’m wrong about that.
Steven: as someone who has read all your posts and agrees with you on almost everything, this is a point where I have a clear disagreement with you. When I switched from neuroscience to doing ML full-time, some of the stuff I read to get up to speed was people theorizing about impossibly large (infinite or practically so) neural networks. I think that the literature on this does a pretty good job of establishing that, in the limit, neural networks can compute any sort of function. Which means that they can compute all the functions in a human brain, or a set of human brains. Meaning, it’s not a question of whether scaling CAN get us to AGI. It certainly can. It’s a question of when. There is inefficiency in trying to scale an algorithm which tries to brute force learn the relevant functions rather than have them hardcoded in via genetics. I think that you are right that there are certain functions the human brain does quite well that current SoTA LLMs do very poorly. I don’t think this means that scaling LLMs can’t lead to a point where the relevant capabilities suddenly emerge. I think we are already in a regime of substantial compute and data overhang for AGI, and that the thing holding us back is the proper design and integration of modules which emulate the functions of parts of the brain not currently well imitated by LLMs. Like the reward and valence systems of the basal ganglia, for instance. It’s still an open question to me whether we will get to AGI via scaling or algorithmic improvement.
Imagine for a moment that I am correct that scaling LLMs could get us there, but also that a vastly more efficient system which borrows more functions from the human brain is possible. What might this scenario look like?
Perhaps an LLM gets strong enough to, upon human prompting and with human assistance, analyze the computational neuroscience literature and open source code, and extract useful functions, and then do some combination of intuitively improving their efficiency and brute force testing them in new experimental ML architectures. This is not so big a leap from what GPT-4 is capable of. I think that that’s plausibly even a GPT-5 level of skill. Suppose also that these new architectures can be added onto existing LLM base models, rather than needing the base model to be retrained from scratch. As some critical amount of discoveries accumulate, GPT-5 suddenly takes a jump forward in efficacy, enabling it to process the rest of the potential improvements much faster, and then it takes another big jump forward, and then is able to rapidly self-improve with no further need for studying existing published research. In such a scenario, we’d have a foom over the course of a few days which could take us by surprise and lead to a rapid loss of control. This is exactly why I think Astera’s work is risky, even though their current code seems quite harmless on its own. I think it is focused on (some of) the places where LLMs do poorly, but also that there’s nothing stopping the work from being effectively integrated with existing models for substantial capability gains. This is why I got so upset with Astera when I realized during my interview process with them that they were open-sourcing their code, and also when I carefully read through their code and saw great potential there for integrating it with more mainstream ML to the empowerment of both.
Literature examples of the sort of thing I’m talking about with ‘enough scaling will eventually get us there’, even though I haven’t read this particular paper: https://arxiv.org/abs/2112.15577
https://openreview.net/forum?id=HyGBdo0qFm
Paul: I think you are making a valid point here. I think your point is (sadly) getting obscured by the fact that our assumptions have shifted under our feet since the time when you began to make your point about slow vs fast takeoff.
I’d like to explain what I think the point you are right on is, and then try to describe how I think we need a new framing for the next set of predictions.
Several years ago, Eliezer and MIRI generally were frequently emphasizing the idea of a fast take-off that snuck up on us before the world had been much changed by narrow AI. You correctly predicted that the world would indeed be transformed in a very noticeable way by narrow AI before AGI. Eliezer in discussions with you has failed to acknowledge ways in which his views shifted from what they were ~10 years ago towards your views. I think this reflects poorly on him, but I still think he has a lot of good ideas, and made a lot of important predictions well in advance of other people realizing how important this was all going to be. As I’ve stated before, I often find my own views seeming to be located somewhere in-between your views and Eliezer’s wherever you two disagree.
I think we should acknowledge your point that the world is being changed in a very noticeable way by AI before true AGI, just as you have acknowledged Eliezer’s point that once a runaway out-of-human-control human-out-of-the-loop recursive-self-improvement process gets started it could potentially proceed shockingly fast and lead to a loss of humanity’s ability to regain control of the resulting AGI even once we realize what is happening. [I say Eliezer’s point here, not to suggest that you disagreed with him on this point, but simply that he was making this a central part of his predictions from fairly early on.]
I think the framing we need now is: how can we predict, detect, and halt such a runaway RSI process before it is too late? This is important to consider from multiple angles. I mostly think that the big AI labs are being reasonably wary about this (although they certainly could do better). What concerns me more is the sort of people out in the wild who will take open source code and do dumb or evil stuff with it, Chaos-GPT-style, for personal gain or amusement. I think the biggest danger we face is that affordable open-source models seem to be lagging only a few years behind SotA models, and that the world is full of chaotic people who could (knowingly or not) trigger a runaway RSI process if such a thing is cheap and easy to do.
In such a strategic landscape, it could be crucially important to figure out how to:
a) slow down the progress of open source models, to keep dangerous runaway RSI from becoming cheap and easy to trigger
b) use SotA models to develop better methods of monitoring and preventing anyone outside a reasonably-safe-behaving org from doing this dangerous thing.
c) improve the ability of the reasonable orgs to self-monitor and notice the danger before they blow themselves up
I think that it does not make strategic sense to actively hinder the big AI labs. I think our best move is to help them move more safely, while also trying to build tools and regulation for monitoring the world’s compute. I do not think there is any feasible solution for this which doesn’t utilize powerful AI tools to help with the monitoring process. These AI tools could be along the lines of SotA LLMs, or something different like an internet police force made up of something like Conjecture’s CogEms. Or perhaps some sort of BCI or gene-mod upgraded humans (though I doubt we have time for this).
My view is that algorithmic progress, pointed to by neuroscience, is on the cusp of being discovered, and if those insights are published, will make powerful AGI cheap and available to all competent programmers everywhere in the world. With so many people searching, and the necessary knowledge so widely distributed, I don’t think we can count on keeping this under wraps forever. Rather than have these insights get discovered and immediately shared widely (e.g. by some academic eager to publish an exciting paper who didn’t realize the full power and implications of their discovery), I think it would be far better to have a safety-conscious lab discover these, and for that lab to have a way to safely monitor itself so that it notices the danger and potential power of what it has discovered. They can then keep the discoveries secret and collaborate with other safety-conscious groups and with governments to set up the worldwide monitoring we need to prevent a rogue AGI scenario. Once we have that, we can move safely to the long reflection and take our time figuring out better solutions to alignment. [An important crux for me here is that I believe that if we have control of an AGI which we know is potentially capable of recursively self-improving beyond our ability to control it, we can successfully utilize this AGI at its current level of ability without letting it self-improve. If someone convinced me that this was untenable, it would change my strategic recommendations.]
As you can see from this prediction market I made, a lot of people currently disagree with me. I expect this will be a different looking distribution a year from now.
Here’s an intuition pump analogy for how I’ve been thinking about this. Imagine that I, as someone with a background in neuroscience and ML, were granted the following set of abilities. Would you bet that I, with this set of abilities, would be able to do RSI? I would.
Abilities that I would have if I were an ML model trying to self-improve:
Make many copies of myself, and checkpoints throughout the process.
Work at high speed and in parallel with copies of myself.
Read all the existing scientific literature that seemed potentially relevant.
Observe all the connections between my neurons, all the activations of my clones as I expose them to various stimuli or run them through simulations.
Ability to edit these weights and connections.
Ability to add neurons (up to a point) where they seemed most needed, connected in any way I see fit, initialized with whatever weights I choose.
Ability to assemble new datasets and build new simulations to do additional training with.
Ability to freeze some subsection of a clone’s model and thus more rapidly train the remaining unfrozen section.
Ability to take notes and write collaborative documents with my clones working in parallel with me.
Ok. Thinking about that set of abilities, doesn’t it seem like a sufficiently creative, intelligent, determined general agent could successfully self-improve? I think so. I agree it’s unclear where the threshold is exactly, and when a transformer-based ML model will cross that threshold. I’ve made a bet at ‘GPT-5’, but honestly I’m not certain. Could be longer. Could be sooner...
Sorry @the gears to ascension. I know your view is that it would be better for me to be quiet about this, but I think the benefits of speaking up in this case outweigh the potential costs.
Paul: I think you are making a valid point here. I think your point is (sadly) getting obscured by the fact our assumptions have shifted under our feet since the time when you began to make your point about slow vs fast takeoff.
I’d like to explain what I think the point you are right on is, and then try to describe how I think we need a new framing for the next set of predictions.
Several years ago, Eliezer and MIRI generally were frequently emphasizing the idea of a fast take-off that snuck up on us before the world had been much changed by narrow AI. You correctly predicted that the world would indeed be transformed in a very noticeable way by narrow AI before AGI. Eliezer in discussions with you has failed to acknowledge ways in which his views shifted from what they were ~10 years ago towards your views. I think this reflects poorly on him, but I still think he has a lot of good ideas, and made a lot of important predictions well in advance of other people realizing how important this was all going to be. As I’ve stated before, I often find my own views seeming to be located somewhere in-between your views and Eliezer’s wherever you two disagree.
I think we should acknowledge your point that the world being changed in a very noticeable way by AI before true AGI, just as you have acknowledged Eliezer’s point that once a runaway out-of-human-control human-out-of-the-loop recursive-self-improvement process gets started it could potentially proceed shockingly fast and lead to a loss of humanity’s ability to regain control of the resulting AGI even once we realized what is happening. [I say Eliezer’s point here, not to suggest that you disagreed with him on this point, but simply that he was making this a central part of his predictions from fairly early on.]
I think the framing we need now is: how can we predict, detect, and halt such a runaway RSI process before it is too late? This is important to consider from multiple angles. I mostly think that the big AI labs are being reasonably wary about this (although they certainly could do better). What concerns me more is the sort of people out in the wild who will take open source code and do dumb or evil stuff with it, Chaos-GPT-style, for personal gain or amusement. I think the biggest danger we face is that affordable open-source models seem to be lagging only a few years behind SotA models, and that the world is full of chaotic people who could (knowingly or not) trigger a runaway RSI process if such a thing is cheap and easy to do.
In such a strategic landscape, it could be crucially important to figure out how to:
a) slow down the progress of open source models, to keep dangerous runaway RSI from becoming cheap and easy to trigger
b) use SotA models to develop better methods of monitoring and preventing anyone outside a reasonably-safe-behaving org from doing this dangerous thing.
c) improve the ability of the reasonable orgs to self-monitor and notice the danger before they blow themselves up
I think that it does not make strategic sense to actively hinder the big AI labs. I think our best move is to help them move more safely, while also trying to build tools and regulation for monitoring the world’s compute. I do not think there is any feasible solution for this which doesn’t utilize powerful AI tools to help with the monitoring process. These AI tools could be along the lines of SotA LLMs, or something different like an internet police force made up of something like Conjecture’s CogEms. Or perhaps some sort of BCI or gene-mod upgraded humans (though I doubt we have time for this).
My view is that algorithmic advances, pointed to by neuroscience, are on the cusp of being discovered and, if those insights are published, will make powerful AGI cheap and available to all competent programmers everywhere in the world. With so many people searching, and the necessary knowledge so widely distributed, I don’t think we can count on keeping this under wraps forever. Rather than have these insights get discovered and immediately shared widely (e.g. by some academic eager to publish an exciting paper who didn’t realize the full power and implications of their discovery), I think it would be far better to have a safety-conscious lab discover them and have a way to safely monitor itself, so that it notices the danger and potential power of what it has discovered. That lab can then keep the discoveries secret and collaborate with other safety-conscious groups and with governments to set up the worldwide monitoring we need to prevent a rogue AGI scenario. Once we have that, we can move safely to the long reflection and take our time figuring out better solutions to alignment. [An important crux for me here is that I believe that if we have control of an AGI which we know is potentially capable of recursively self-improving beyond our ability to control it, we can successfully utilize this AGI at its current level of ability without letting it self-improve. If someone convinced me that this was untenable, it would change my strategic recommendations.]
As @Jed McCaleb said in his recent post, ‘The only way forward is through!’. https://www.lesswrong.com/posts/vEtdjWuFrRwffWBiP/we-have-to-upgrade
As you can see from this prediction market I made, a lot of people currently disagree with me. I expect this will be a different-looking distribution a year from now.
https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg
Here’s an intuition pump analogy for how I’ve been thinking about this. Imagine that I, as someone with a background in neuroscience and ML, was granted the following set of abilities. Would you bet that I, with this set of abilities, would be able to do RSI? I would.
Abilities that I would have if I were an ML model trying to self-improve:
Make many copies of myself, and checkpoints throughout the process.
Work at high speed and in parallel with copies of myself.
Read all the existing scientific literature that seemed potentially relevant.
Observe all the connections between my neurons, all the activations of my clones as I expose them to various stimuli or run them through simulations.
Ability to edit these weights and connections.
Ability to add neurons (up to a point) where they seemed most needed, connected in any way I see fit, initialized with whatever weights I choose.
Ability to assemble new datasets and build new simulations to do additional training with.
Ability to freeze some subsection of a clone’s model and thus more rapidly train the remaining unfrozen section.
Ability to take notes and write collaborative documents with my clones working in parallel with me.
Ok. Thinking about that set of abilities, doesn’t it seem like a sufficiently creative, intelligent, determined general agent could successfully self-improve? I think so. I agree it’s unclear where the threshold is exactly, and when a transformer-based ML model will cross that threshold. I’ve made a bet at ‘GPT-5’, but honestly I’m not certain. Could be longer. Could be sooner...
Sorry @the gears to ascension. I know your view is that it would be better for me to be quiet about this, but I think the benefits of speaking up in this case outweigh the potential costs.
oh, no worries, this part is obvious