But if non-AGI systems in fact transform the world before AGI is built, then I don’t think I should care nearly as much about your concept of “AGI.”
Fair enough! I do in fact expect that AI will not be transformative-in-the-OpenPhil-sense (i.e. as much or more than the agricultural or industrial revolution) unless that AI is importantly different from today’s LLMs (e.g. advanced model-based RL). But I don’t think we’ve gotten much evidence on this hypothesis either way so far, right?
For example: I think if you walk up to some normal person and say “We already today have direct (albeit weak) evidence that LLMs will evolve into something that transforms the world much much more than electrification + airplanes + integrated circuits + the internet combined”, I think they would say “WTF?? That is a totally wild claim, and we do NOT already have direct evidence for it”. Right?
If you mean “transformative” in a weaker-than-OpenPhil sense, well, the internet “transformed the world” according to everyday usage, and the impact of the internet on the economy is (AFAICT) >$10T. I suppose that the fact that the internet exists is somewhat relevant to AGI x-risk, but I don’t think it’s very relevant. I think that people trying to make AGI go well in a hypothetical world where the internet doesn’t exist would be mostly doing pretty similar things as we are.
The most realistic view I see that implies fast takeoff without making predictions about existing systems is:
You have very short AGI timelines based on your reasoning about AGI.
Non-AGI impact simply can’t grow quickly enough to be large prior to AGI.
Why not “non-AGI AI systems will eventually be (at most) comparably impactful to the internet or automobiles or the printing press, before plateauing, and this is ridiculously impactful by everyday standards, but it doesn’t strongly change the story of how we should be thinking about AGI”?
BTW, why do we care about slow takeoff anyway?
Slow takeoff suggests that we see earlier smaller failures that have important structural similarity to later x-risk-level failures
Slow takeoff means the world that AGI will appear in will be so different from the current world that it totally changes what makes sense to do right now about AGI x-risk.
(Anything else?)
I can believe that “LLMs will transform the world” comparably to how the internet or the integrated circuit has transformed the world, without expecting either of those bullets to be true, right?
I think we are getting significant evidence about the plausibility that deep learning is able to automate real human cognitive work, and we are seeing extremely rapid increases in revenue and investment. I think those observations have extremely high probability if big deep learning systems will be transformative (this is practically necessary to see!), and fairly low base rate (not clear exactly how low but I think 25% seems reasonable and generous).
So yeah, I think that we have gotten considerable evidence about this, more than a factor of 4. I’ve personally updated my views by about a factor of 2, from a 25% chance to a 50% chance that scaling up deep learning is the real deal and leads to transformation soon. I don’t think “That’s a wild claim!” means you don’t have evidence; that’s not how evidence works.
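(For concreteness, here’s the arithmetic of that kind of update spelled out in a few lines. The numbers are just the ones quoted above, and “factor” can be counted either on raw probabilities or on odds; this is only a sketch of the bookkeeping, not an argument for either convention.)

```python
# A minimal sketch of the update arithmetic; the numbers are just the ones
# quoted above, not a claim about which way of counting "factors" is right.

def update(prior_prob: float, likelihood_ratio: float) -> float:
    """Apply a Bayes-factor (likelihood ratio) update, working in odds."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

print(update(0.25, 3))   # 0.50  -- a 3x odds update takes 25% to 50%
print(update(0.25, 4))   # ~0.57 -- a 4x odds update takes 25% to ~57%
# Going from 25% to 50% is a factor of 2 on raw probability,
# which corresponds to a factor of 3 on odds (~1.6 bits).
```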
I think that normal people who follow tech have also moved their views massively. They take the possibility of crazy transformations from deep learning much more seriously than they did 5 years ago. They are much more likely to view deep learning as producing systems analogous to humans in economically relevant ways. And so on.
Why not “non-AGI AI systems will eventually be (at most) comparably impactful to the internet or automobiles or the printing press, before plateauing, and this is ridiculously impactful by everyday standards, but it doesn’t strongly change the story of how we should be thinking about AGI”?
That view is fine, but now I’m just asking what your probability distribution is over the location of that plateau. Is it no evidence to see LMs at $10 billion? $100 billion? Is your probability distribution just concentrated with 100% of its mass between $1 trillion and $10 trillion? (And if so: why?)
It’s maybe plausible to say that your hypothesis is just like mine but with strong cutoffs at some particular large level like $1 trillion. But why? What principle makes an impact of $1 trillion possible but $10 trillion impossible?
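(To make the question concrete, here’s a toy illustration, not anyone’s actual model: the log-uniform shape and the $1B/yr–$100T/yr support are assumptions I’m making purely for illustration. The point is just that each milestone passed without a plateau rules out another slice of such a prior.)

```python
import numpy as np

# Toy illustration only (not anyone's actual model): a log-uniform prior over
# where non-AGI AI revenue plateaus, supported on $1B/yr to $100T/yr.
# Seeing revenue pass a milestone while still growing rules out all mass below it.

lo, hi = 1e9, 1e14  # assumed prior support: $1B/yr to $100T/yr

def mass_surviving(milestone: float) -> float:
    """Fraction of the log-uniform prior that puts the plateau above `milestone`."""
    return (np.log(hi) - np.log(milestone)) / (np.log(hi) - np.log(lo))

for milestone in [1e9, 1e10, 1e11, 1e12, 1e13]:
    print(f"passed ${milestone:.0e}/yr -> {mass_surviving(milestone):.0%} of the prior survives")
```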
Incidentally, I don’t think the internet adds $10 trillion of value. I agree that as I usually operationalize it the soft takeoff view is not technically falsified until AI gets to ~$100 trillion per year (though by $10 trillion I think fast takeoff has gotten considerably less feasible in addition to this update, and the world will probably be significantly changed and prepared for AI in a way that is a large part of what matters about slow takeoff), so we could replace the upper end of that interval with $100 trillion if we wanted to be more generous.
Hmm, for example, given that the language translation industry is supposedly $60B/yr, and given that we have known for decades that AI can take at least some significant chunk out of this industry at the low-quality end [e.g. tourists were using babelfish.altavista.com in the 1990s despite it sucking], I think someone would have to have been very unreasonable indeed to predict in advance that there will be an eternal plateau in the non-AGI AI market that’s lower than $1B/yr. (And that’s just one industry!) (Of course, that’s not a real prediction in advance ¯\_(ツ)_/¯ )
What I was getting at with “That’s a wild claim!” is that your theory makes an a-priori-obvious prediction (AI systems will grow to a >$1B industry pre-FOOM) and a controversial prediction (>$100T industry), and I think common sense in that situation is to basically ignore the obvious prediction and focus on the controversial one. And Bayesian updating says the same thing. The crux here is whether or not it has always been basically obvious to everyone, long in advance, that there’s at least $1B of work on our planet that can be done by non-FOOM-related AI, which is what I’m claiming in the previous paragraph where I brought up language translation. (Yeah I know, I am speculating about what was obvious to past people without checking what they said at the time—a fraught activity!)
Yeah deep learning can “automate real human cognitive work”, but so can pocket calculators, right? Anyway, I’d have to think more about what my actual plateau prediction is and why. I might reply again later. :)
I feel like your thinking here is actually mostly coming from “hey look at all the cool useful things that deep learning can do and is doing right now”, and is coming much less from the specific figure “$1B/year in 2023 and going up”. Is that fair?
I don’t think it’s obvious a priori that training deep learning to imitate human behavior can predict general behavior well enough to carry on customer support conversations, write marketing copy, or write code well enough to be helpful to software engineers. Similarly it’s not obvious whether it will be able to automate non-trivial debugging, prepare diagrams for a research paper, or generate plausible ML architectures. Perhaps to some people it’s obvious there is a divide here, but to me it’s just not obvious so I need to talk about broad probability distributions over where the divide sits.
I think ~$1B/year is a reasonable indicator of the generality and extent of current automation. I really do care about that number (though I wish I knew it better) and watching it go up is a big deal. If it can just keep being more useful with each passing year, I will become more skeptical of claims about fundamental divides, even if after the fact you can look at each thing and say “well it’s not real strong cognition.” I think you’ll plausibly be able to do that up through the end of days, if you are shameless enough.
I think the big ambiguity is about how you mark out a class of systems that benefit strongly from scale (i.e. such that doubling compute more than doubles economic value) and whether that’s being done correctly here. I think it’s fairly clear that the current crop of systems are much more general and are benefiting much more strongly from scale than previous systems. But it’s up for debate.
Hmm. I think we’re talking past each other a bit.
I think that everyone (including me) who wasn’t expecting LLMs to do all the cool impressive things that they can in fact do, or who wasn’t expecting LLMs to improve as rapidly as they are in fact improving, is obligated to update on that.
Once I do so update, it’s not immediately obvious to me that I learn anything more from the $1B/yr number. Yes, $1B/yr is plenty of money, but still a drop in the bucket of the >$1T/yr IT industry, and in particular, is dwarfed by a ton of random things like “corporate Linux support contracts”. Mostly I’m surprised that the number is so low!! (…For now!!)
But whatever, I’m not sure that matters for anything.
Anyway…
I did spend considerable time last week pondering where & whether I expect LLMs to plateau. It was a useful exercise; I appreciate your prodding. :)
I don’t really have great confidence in my answers, and I’m mostly redacting the details anyway. But if you care, here are some high-level takeaways of my current thinking:
(1) I expect there to be future systems that centrally incorporate LLMs, but also have other components, and I expect these future systems to be importantly more capable, less safe, and more superficially / straightforwardly agent-y than an LLM by itself, as we think of LLMs today.
IF “LLMs scale to AGI”, I expect that this is how, and I expect that my own research will turn out to be pretty relevant in such a world. More generally, I expect that, in such systems, we’ll find the “traditional LLM alignment discourse” (RLHF fine-tuning, shoggoths, etc.) to be pretty irrelevant, and we’ll find the “traditional agent alignment discourse” (instrumental convergence, goal mis-generalization, etc.) to be obviously & straightforwardly relevant.
(2) One argument that pushes me towards fast takeoff is pretty closely tied to what I wrote in my recent post:
Two different perspectives are:
AGI is about knowing how to do lots of things
AGI is about not knowing how to do something, and then being able to figure it out.
I’m strongly in the second camp.…
The following is a bit crude and not entirely accurate, but to a first approximation I want to say that LLMs have a suite of abstract “concepts” that they have seen in their training data (and that were in the brains of the humans who created that training data), and LLMs are really good at doing mix-and-match compositionality and pattern-match search to build a combinatorial explosion of interesting fresh outputs out of that massive preexisting web of interconnected concepts.
But I think there are some types of possible processes along the lines of:
“invent new useful concepts from scratch—even concepts that have never occurred to any human—and learn them permanently, such that they can be built on in the future”
“notice inconsistencies in existing concepts / beliefs, find ways to resolve them, and learn them permanently, such that those mistakes will not be repeated in the future”
etc.
I think LLMs can do things like this a little bit, but not so well that you can repeat them in an infinite loop. For example, I suspect that if you took this technique and put it in an infinite loop, it would go off the rails pretty quickly. But I expect that future systems (of some sort) will eventually be able to do these kinds of things well enough to form a stable loop, i.e. the system will be able to keep running this process (whatever it is) over and over, and not go off the rails, but rather keep “figuring out” more and more things, thus rocketing off to outer space, in a way that’s loosely analogous to self-play in AlphaZero, or to a smart human gradually honing in on a better and better understanding of a complicated machine.
I think this points to an upcoming “discontinuity”, in the sense that I think right now we don’t have systems that can do the above bullet points (at least, not well enough to repeat them in an infinite loop), and I think we will have such systems in the future, and I think we won’t get TAI until we do. And it feels pretty plausible to me (admittedly not based on much!) that it would only take a couple years or less between “widespread knowledge of how to build such systems” and “someone gets an implementation working well enough that they can run it in an infinite loop and it just keeps “figuring out” more and more things, correctly, and thus it rockets off to radically superhuman intelligence and capabilities.”
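(Schematically, the kind of loop I have in mind looks something like the sketch below. This is very much a toy framing added for concreteness, not a design; the function names and the whole decomposition are illustrative, and the entire difficulty lives inside the two stubbed-out steps, which are exactly the things I’m claiming current systems can’t yet do reliably enough to run unattended.)

```python
# Toy sketch only: the hard parts are entirely inside the two stubbed-out steps.

def propose_revision(knowledge: frozenset) -> str | None:
    """Invent a new concept, or a fix to an inconsistency, building on `knowledge`."""
    raise NotImplementedError  # the part current systems do only "a little bit"

def survives_scrutiny(candidate: str, knowledge: frozenset) -> bool:
    """Vet the candidate well enough that errors don't compound across iterations."""
    raise NotImplementedError  # the part that keeps the loop from going off the rails

def improvement_loop(knowledge: frozenset, steps: int) -> frozenset:
    """Repeatedly propose, vet, and permanently incorporate new concepts."""
    for _ in range(steps):
        candidate = propose_revision(knowledge)
        if candidate is not None and survives_scrutiny(candidate, knowledge):
            knowledge = knowledge | {candidate}  # learned permanently, built on later
    return knowledge
```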
(3) I’m still mostly expecting LLMs (and more broadly, LLM-based systems) to not be able to do the above bullet point things, and (relatedly) to plateau at a level where they mainly assist rather than replace smart humans. This is tied to fundamental architectural limitations that I believe transformers have (and indeed, that I believe DNNs more generally have), which I don’t want to talk about…
(4) …but I could totally be wrong. ¯\_(ツ)_/¯ And I think that, for various reasons, my current day-to-day research program is not too sensitive to the possibility that I’m wrong about that.
Steven: as someone who has read all your posts and agrees with you on almost everything, this is a point where I have a clear disagreement with you. When I switched from neuroscience to doing ML full-time, some of the stuff I read to get up to speed was people theorizing about impossibly large (infinite or practically so) neural networks. I think that the literature on this does a pretty good job of establishing that, in the limit, neural networks can compute any sort of function. Which means that they can compute all the functions in a human brain, or a set of human brains. Meaning, it’s not a question of whether scaling CAN get us to AGI. It certainly can. It’s a question of when. There is inefficiency in trying to scale an algorithm which tries to brute-force learn the relevant functions rather than have them hardcoded in via genetics.
I think that you are right that there are certain functions the human brain does quite well that current SoTA LLMs do very poorly. I don’t think this means that scaling LLMs can’t lead to a point where the relevant capabilities suddenly emerge. I think we are already in a regime of substantial compute and data overhang for AGI, and that the thing holding us back is the proper design and integration of modules which emulate the functions of parts of the brain not currently well imitated by LLMs. Like the reward and valence systems of the basal ganglia, for instance. It’s still an open question to me whether we will get to AGI via scaling or algorithmic improvement.
Imagine for a moment that I am correct that scaling LLMs could get us there, but also that a vastly more efficient system which borrows more functions from the human brain is possible. What might this scenario look like? Perhaps an LLM gets strong enough to, upon human prompting and with human assistance, analyze the computational neuroscience literature and open source code, extract useful functions, and then do some combination of intuitively improving their efficiency and brute-force testing them in new experimental ML architectures. This is not so big a leap from what GPT-4 is capable of. I think that that’s plausibly even a GPT-5 level of skill. Suppose also that these new architectures can be added onto existing LLM base models, rather than needing the base model to be retrained from scratch. As some critical mass of discoveries accumulates, GPT-5 suddenly takes a jump forward in efficacy, enabling it to process the rest of the potential improvements much faster, and then it takes another big jump forward, and then is able to rapidly self-improve with no further need for studying existing published research. In such a scenario, we’d have a foom over the course of a few days which could take us by surprise and lead to a rapid loss of control.
This is exactly why I think Astera’s work is risky, even though their current code seems quite harmless on its own. I think it is focused on (some of) the places where LLMs do poorly, but also that there’s nothing stopping the work from being effectively integrated with existing models for substantial capability gains. This is why I got so upset with Astera when I realized during my interview process with them that they were open-sourcing their code, and also when I carefully read through their code and saw great potential there for integrating it with more mainstream ML to the empowerment of both.
literature examples of the sort of thing I’m talking about with ‘enough scaling will eventually get us there’, even though I haven’t read this particular paper: https://arxiv.org/abs/2112.15577
https://openreview.net/forum?id=HyGBdo0qFm
Paul: I think you are making a valid point here. I think your point is (sadly) getting obscured by the fact that our assumptions have shifted under our feet since the time when you began to make your point about slow vs fast takeoff.
I’d like to explain the point I think you are right on, and then try to describe how I think we need a new framing for the next set of predictions.
Several years ago, Eliezer and MIRI generally were frequently emphasizing the idea of a fast take-off that snuck up on us before the world had been much changed by narrow AI. You correctly predicted that the world would indeed be transformed in a very noticeable way by narrow AI before AGI. Eliezer in discussions with you has failed to acknowledge ways in which his views shifted from what they were ~10 years ago towards your views. I think this reflects poorly on him, but I still think he has a lot of good ideas, and made a lot of important predictions well in advance of other people realizing how important this was all going to be. As I’ve stated before, I often find my own views seeming to be located somewhere in-between your views and Eliezer’s wherever you two disagree.
I think we should acknowledge your point that the world is being changed in a very noticeable way by AI before true AGI, just as you have acknowledged Eliezer’s point that once a runaway, out-of-human-control, human-out-of-the-loop recursive-self-improvement process gets started, it could potentially proceed shockingly fast and lead to a loss of humanity’s ability to regain control of the resulting AGI even once we realize what is happening. [I say Eliezer’s point here, not to suggest that you disagreed with him on this point, but simply that he was making this a central part of his predictions from fairly early on.]
I think the framing we need now is: how can we predict, detect, and halt such a runaway RSI process before it is too late? This is important to consider from multiple angles. I mostly think that the big AI labs are being reasonably wary about this (although they certainly could do better). What concerns me more is the sort of people out in the wild who will take open source code and do dumb or evil stuff with it, Chaos-GPT-style, for personal gain or amusement. I think the biggest danger we face is that affordable open-source models seem to be lagging only a few years behind SotA models, and that the world is full of chaotic people who could (knowingly or not) trigger a runaway RSI process if such a thing is cheap and easy to do.
In such a strategic landscape, it could be crucially important to figure out how to:
a) slow down the progress of open source models, to keep dangerous runaway RSI from becoming cheap and easy to trigger
b) use SotA models to develop better methods of monitoring and preventing anyone outside a reasonably-safe-behaving org from doing this dangerous thing
c) improve the ability of the reasonable orgs to self-monitor and notice the danger before they blow themselves up
I think that it does not make strategic sense to actively hinder the big AI labs. I think our best move is to help them move more safely, while also trying to build tools and regulation for monitoring the world’s compute. I do not think there is any feasible solution for this which doesn’t utilize powerful AI tools to help with the monitoring process. These AI tools could be along the lines of SotA LLMs, or something different like an internet police force made up of something like Conjecture’s CogEms. Or perhaps some sort of BCI or gene-mod upgraded humans (though I doubt we have time for this).
My view is that algorithmic insights, pointed to by neuroscience, are on the cusp of being discovered, and that if those insights are published, they will make powerful AGI cheap and available to all competent programmers everywhere in the world. With so many people searching, and the necessary knowledge so widely distributed, I don’t think we can count on keeping this under wraps forever. Rather than have these insights get discovered and immediately shared widely (e.g. by some academic eager to publish an exciting paper who didn’t realize the full power and implications of their discovery), I think it would be far better to have a safety-conscious lab discover them, with a way to safely monitor themselves so as to notice the danger and potential power of what they’ve discovered. They can then keep the discoveries secret and collaborate with other safety-conscious groups and with governments to set up the worldwide monitoring we need to prevent a rogue AGI scenario. Once we have that, we can move safely to the long reflection and take our time figuring out better solutions to alignment.
[An important crux for me here is that I believe that if we have control of an AGI which we know is potentially capable of recursively self-improving beyond our ability to control it, we can successfully utilize this AGI at its current level of ability without letting it self-improve. If someone convinced me that this was untenable, it would change my strategic recommendations.]
As @Jed McCaleb said in his recent post, ‘The only way forward is through!’. https://www.lesswrong.com/posts/vEtdjWuFrRwffWBiP/we-have-to-upgrade
As you can see from this prediction market I made, a lot of people currently disagree with me. I expect this will be a different looking distribution a year from now.
https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg
Here’s an intuition pump analogy for how I’ve been thinking about this. Imagine that I, as someone with a background in neuroscience and ML, were granted the following set of abilities. Would you bet that I, with this set of abilities, would be able to do RSI? I would.
Abilities that I would have if I were an ML model trying to self-improve:
Make many copies of myself, and checkpoints throughout the process.
Work at high speed and in parallel with copies of myself.
Read all the existing scientific literature that seemed potentially relevant.
Observe all the connections between my neurons, and all the activations of my clones as I expose them to various stimuli or run them through simulations.
Edit these weights and connections.
Add neurons (up to a point) where they seem most needed, connected in any way I see fit, initialized with whatever weights I choose.
Assemble new datasets and build new simulations to do additional training with.
Freeze some subsection of a clone’s model and thus more rapidly train the remaining unfrozen section.
Take notes and write collaborative documents with my clones working in parallel with me.
Ok. Thinking about that set of abilities, doesn’t it seem like a sufficiently creative, intelligent, determined general agent could successfully self-improve? I think so. I agree it’s unclear where the threshold is exactly, and when a transformer-based ML model will cross that threshold. I’ve made a bet at ‘GPT-5’, but honestly I’m not certain. Could be longer. Could be sooner...
Sorry @the gears to ascension . I know your view is that it would be better for me to be quiet about this, but I think the benefits of speaking up in this case outweigh the potential costs.
oh, no worries, this part is obvious
Is your probability distribution just concentrated with 100% of its mass between $1 trillion and $10 trillion? (And if so: why?)
To specifically answer the question in the parenthetical (without commenting on the dollar numbers; I don’t actually currently have an intuition strongly mapping [the thing I’m about to discuss] to dollar amounts—meaning that although I do currently think the numbers you give are in the right ballpark, I reserve the right to reconsider that as further discussion and/or development occurs):
The reason someone might concentrate their probability mass at or within a certain impact range is if they believe that it makes conceptual sense to divide cognitive work into two (or more) distinct categories, one of which is much weaker in the impact it can have. Then the question of how this division affects one’s probability distribution is determined almost entirely by the question of the level at which they think the impact of the weaker category will saturate. And that question, in turn, has a lot more to do with the concrete properties they expect (or don’t expect) to see from the weaker cognition type, than it has to do with dollar quantities directly. You can translate the former into the latter, but only via an additional series of calculations and assumptions; the actual object-level model—which is where any update would occur—contains no gear directly corresponding to “dollar value of impact”.
So when this kind of model encounters LLMs doing unusual and exciting things that score very highly on metrics like revenue, investment, and overall “buzz”… well, these metrics don’t directly lead the model to update. What instead the model considers relevant is whether, when you look at the LLM’s output, that output seems to exhibit properties of cognition that are strongly prohibited by the model’s existing expectations about weak versus strong cognitive work—and if it doesn’t, then the model simply doesn’t update; it wasn’t, in fact, surprised by the level of cognition it observed—even if (perhaps) the larger model embedding it, which does track things like how the automation of certain tasks might translate into revenue/profit, was surprised.
And in fact, I do think this is what we observe from Eliezer (and from like-minded folk): he’s updated in the sense of becoming less certain about how much economic value can be generated by “weak” cognition (although one should also note that he’s never claimed to be particularly certain about this metric to begin with); meanwhile, he has not updated about the existence of a conceptual divide between “weak” and “strong” cognition, because the evidence he’s been presented with legitimately does not have much to say on that topic. In other words, I think he would say that the statement
I think we are getting significant evidence about the plausibility that deep learning is able to automate real human cognitive work
is true, but that its relevance to his model is limited, because “real human cognitive work” is a category spanning (loosely speaking) both “cognitive work that scales into generality”, and “cognitive work that doesn’t”, and that by agglomerating them together into a single category, you’re throwing away a key component of his model.
Incidentally, I want to make one thing clear: this does not mean I’m saying the rise of the Transformers provides no evidence at all in favor of [a model that assigns a more direct correspondence between cognitive work and impact, and postulates a smooth conversion from the former to the latter]. That model concentrates more probability mass in advance on the observations we’ve seen, and hence does receive Bayes credit for its predictions. However, I would argue that the updates in favor of this model are not particularly extreme, because the model against which it’s competing didn’t actually strongly prohibit the observations in question, only assign less probability to them (and not hugely less, since “slow takeoff” models don’t generally attempt to concentrate probability mass to extreme amounts, either)!
All of which is to say, I suppose, that I don’t really disagree with the numerical likelihoods you give here:
I think that we have gotten considerable evidence about this, more than a factor of 4. I’ve personally updated my views by about a factor of 2, from a 25% chance to a 50% chance that scaling up deep learning is the real deal and leads to transformation soon.
but that I’m confused that you consider this “considerable”, and would write up a comment chastising Eliezer and the other “fast takeoff” folk because they… weren’t hugely moved by, like, ~2 bits’ worth of evidence? Like, I don’t see why he couldn’t just reply, “Sure, I updated by around 2 bits, which means that now I’ve gone from holding fast takeoff as my dominant hypothesis to holding fast takeoff as my dominant hypothesis.” And it seems like that degree of update would basically produce the kind of external behavior that might look like “not owning up” to evidence, because, well… it’s not a huge update to begin with?
(And to be clear, this does require that his prior look quite different from yours. But that’s already been amply established, I think, and while you can criticize his prior for being overconfident—and I actually find myself quite sympathetic to that line of argument—criticizing him for failing to properly update given that prior is, I think, a false charge.)
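(To put toy numbers on that last point: the priors below are made up for illustration, not Eliezer’s or anyone’s actual credences; the point is just how little ~2 bits moves someone who starts out confident.)

```python
# Made-up priors, not anyone's actual credences: the point is just that a
# ~2-bit update barely moves a hypothesis someone already holds confidently.

def update_against(prior: float, bits: float) -> float:
    """Shift `prior` down by `bits` bits of evidence, working in odds."""
    odds = prior / (1 - prior)
    odds /= 2 ** bits
    return odds / (1 + odds)

for prior in (0.90, 0.95, 0.99):
    print(f"{prior:.0%} -> {update_against(prior, 2):.0%}")
# 90% -> 69%, 95% -> 83%, 99% -> 96%: still the dominant hypothesis in each case.
```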
Yes, I’m saying that with each $ increment the “qualitative division” model fares worse and worse. I think that people who hold onto this qualitative division have generally been qualitatively surprised by the accomplishments of LMs, that when they make concrete forecasts those forecasts have mismatched reality, and that they should be updating strongly about whether such a division is real.
What instead the model considers relevant is whether, when you look at the LLM’s output, that output seems to exhibit properties of cognition that are strongly prohibited by the model’s existing expectations about weak versus strong cognitive work—and if it doesn’t, then the model simply doesn’t update; it wasn’t, in fact, surprised by the level of cognition it observed—even if (perhaps) the larger model embedding it, which does track things like how the automation of certain tasks might translate into revenue/profit, was surprised.
I’m most of all wondering how you get high level of confidence in the distinction and its relevance. I’ve seen only really vague discussion. The view that LM cognition doesn’t scale into generality seems wacky to me. I want to see the description of tasks it can’t do.
In general if someone won’t state any predictions of their view I’m just going to update about your view based on my understanding of what it predicts (which is after all what I’d ultimately be doing if I took a given view seriously). I’ll also try to update about your view as operated by you, and so e.g. if you were generally showing a good predictive track record or achieving things in the world then I would be happy to acknowledge there is probably some good view there that I can’t understand.
I’m confused that you consider this “considerable”, and would write up a comment chastising Eliezer and the other “fast takeoff” folk because they… weren’t hugely moved by, like, ~2 bits’ worth of evidence? Like, I don’t see why he couldn’t just reply, “Sure, I updated by around 2 bits, which means that now I’ve gone from holding fast takeoff as my dominant hypothesis to holding fast takeoff as my dominant hypothesis.”
I do think that a factor of two is significant evidence. In practice in my experience that’s about as much evidence as you normally get between realistic alternative perspectives in messy domains. The kind of forecasting approach that puts 99.9% probability on things and so doesn’t move until it gets 10 bits is just not something that works in practice.
On the flip side, it’s enough evidence that Eliezer is endlessly condescending about it (e.g. about those who only assigned a 50% probability to the covid response being as inept as it was). Which I think is fine (but annoying); a factor of 2 is real evidence. And if I went around saying “Maybe our response to AI will be great” and then just replied to this observation with “whatever, covid isn’t the kind of thing I’m talking about” without giving some kind of more precise model that distinguishes, then you would be right to chastise me.
Perhaps more importantly, I just don’t know where someone with this view would give ground. Even if you think any given factor of two isn’t a big deal, ten factors of two is what gets you from 99.9% to 50%. So you can’t just go around ignoring a couple of them every few years!
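(Spelling out that arithmetic, working in odds:)

```python
# "Ten factors of two gets you from 99.9% to 50%", working in odds.
p = 0.999
odds = p / (1 - p)        # 999:1
odds /= 2 ** 10           # ten factor-of-two updates = dividing the odds by 1024
print(odds / (1 + odds))  # ~0.494, i.e. roughly 50%
```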
And rhetorically, I’m not complaining about people ultimately thinking fast takeoff is more plausible. I’m complaining about not expressing the view in such a way that we can learn about it based on what appears to me to be multiple bits of evidence, or acknowledging that evidence. This isn’t the only evidence we’ve gotten, I’m generally happy to acknowledge many bits of ways in which my views have moved towards other people’s.
So one claim is that a theory of post-AGI effects often won’t say things about pre-AGI AI, so mostly doesn’t get updated from pre-AGI observations. My take on LLM alignment asks to distinguish human-like LLM AGIs from stronger AGIs (or weirder LLMs), with theories of stronger AGIs not naturally characterizing issues with human-like LLMs. Like, they aren’t concerned with optimizing for LLM superstimuli while their behavior remains in the human imitation regime, where caring for LLM-specific things didn’t have a chance to gain influence. When the mostly faithful imitation nature of LLMs breaks with enough AI tinkering, the way human nature is breaking now towards influence of AGIs, we get another phase change to stronger AGIs.
This seems like a pattern, theories of extremal later phases being bounded within their scopes, saying little of preceding phases that transition into them. If the phase transition boundaries get muddled in thinking about this, we get misleading impressions about how the earlier phases work, while their navigation is instrumental for managing transitions into the much more concerning later phases.
Thanks!
Fair enough! I do in fact expect that AI will not be transformative-in-the-OpenPhil-sense (i.e. as much or more than the agricultural or industrial revolution) unless that AI is importantly different from today’s LLMs (e.g. advanced model-based RL). But I don’t think we’ve gotten much evidence on this hypothesis either way so far, right?
For example: I think if you walk up to some normal person and say “We already today have direct (albeit weak) evidence that LLMs will evolve into something that transforms the world much much more than electrification + airplanes + integrated circuits + the internet combined”, I think they would say “WTF?? That is a totally wild claim, and we do NOT already have direct evidence for it”. Right?
If you are mean “transformative” in a weaker-than-OpenPhil sense, well the internet “transformed the world” according to everyday usage, and the impact of the internet on the economy is (AFAICT) >$10T. I suppose that the fact that the internet exists is somewhat relevant to AGI x-risk, but I don’t think it’s very relevant. I think that people trying to make AGI go well in a hypothetical world where the internet doesn’t exist would be mostly doing pretty similar things as we are.
Why not “non-AGI AI systems will eventually be (at most) comparably impactful to the internet or automobiles or the printing press, before plateauing, and this is ridiculously impactful by everyday standards, but it doesn’t strongly change the story of how we should be thinking about AGI”?
BTW, why do we care about slow takeoff anyway?
Slow takeoff suggests that we see earlier smaller failures that have important structural similarity to later x-risk-level failures
Slow takeoff means the world that AGI will appear in will be so different from the current world that it totally changes what makes sense to do right now about AGI x-risk.
(Anything else?)
I can believe that “LLMs will transform the world” comparably to how the internet or the integrated circuit has transformed the world, without expecting either of those bullets to be true, right?
I think we are getting significant evidence about the plausibility that deep learning is able to automate real human cognitive work, and we are seeing extremely rapid increases in revenue and investment. I think those observations have extremely high probability if big deep learning systems will be transformative (this is practically necessary to see!), and fairly low base rate (not clear exactly how low but I think 25% seems reasonable and generous).
So yeah, I think that we have gotten considerable evidence about this, more than a factor of 4. I’ve personally updated my views by about a factor of 2, from a 25% chance to a 50% chance that scaling up deep learning is the real deal and leads to transformation soon. I don’t think “That’s a wild claim!” means you don’t have evidence, that’s not how evidence works.
I think that normal people who follow tech have also moved their views massively. They take the possibility of crazy transformations from deep leaning much more seriously than they did 5 years ago. They are much more likely to view deep learning as producing systems analogous to humans in economically relevant ways. And so on.
That view is fine, but now I’m just asking what your probability distribution is over the location of that plateau. Is it no evidence to see LMs at $10 billion? $100 billion? Is your probability distribution just concentrated with 100% of its mass between $1 trillion and $10 trillion? (And if so: why?)
It’s maybe plausible to say that your hypothesis is just like mine but with strong cutoffs at some particular large level like $1 trillion. But why? What principle makes an impact of $1 trillion possible but $10 trillion impossible?
Incidentally, I don’t think the internet adds $10 trillion of value. I agree that as I usually operationalize it the soft takeoff view is not technically falsified until AI gets to ~$100 trillion per year (though by $10 trillion I think fast takeoff has gotten considerably less feasible in addition to this update, and the world will probably be significantly changed and prepared for AI in a way that is a large part of what matters about slow takeoff), so we could replace the upper end of that interval with $100 trillion if we wanted to be more generous.
Hmm, for example, given that the language translation industry is supposedly $60B/yr, and given that we have known for decades that AI can take at least some significant chunk out of this industry at the low-quality end [e.g. tourists were using babelfish.altavista.com in the 1990s despite it sucking], I think someone would have to have been very unreasonable indeed to predict in advance that there will be an eternal plateau in the non-AGI AI market that’s lower than $1B/yr. (And that’s just one industry!) (Of course, that’s not a real prediction in advance ¯\_(ツ)_/¯ )
What I was getting at with “That’s a wild claim!” is that your theory makes an a-priori-obvious prediction (AI systems will grow to a >$1B industry pre-FOOM) and a controversial prediction (>$100T industry), and I think common sense in that situation is to basically ignore the obvious prediction and focus on the controversial one. And Bayesian updating says the same thing. The crux here is whether or not it has always been basically obvious to everyone, long in advance, that there’s at least $1B of work on our planet that can be done by non-FOOM-related AI, which is what I’m claiming in the previous paragraph where I brought up language translation. (Yeah I know, I am speculating about what was obvious to past people without checking what they said at the time—a fraught activity!)
Yeah deep learning can “automate real human cognitive work”, but so can pocket calculators, right? Anyway, I’d have to think more about what my actual plateau prediction is and why. I might reply again later. :)
I feel like your thinking here is actually mostly coming from “hey look at all the cool useful things that deep learning can do and is doing right now”, and is coming much less from the specific figure “$1B/year in 2023 and going up”. Is that fair?
I don’t think it’s obvious a priori that training deep learning to imitate human behavior can predict general behavior well enough to carry on customer support conversations, write marketing copy, or write code well enough to be helpful to software engineers. Similarly it’s not obvious whether it will be able to automate non-trivial debugging, prepare diagrams for a research paper, or generate plausible ML architectures. Perhaps to some people it’s obvious there is a divide here, but to me it’s just not obvious so I need to talk about broad probability distributions over where the divide sits.
I think ~$1B/year is a reasonable indicator of the generality and extent of current automation. I really do care about that number (though I wish I knew it better) and watching it go up is a big deal. If it can just keep being more useful with each passing year, I will become more skeptical of claims about fundamental divides, even if after the fact you can look at each thing and say “well it’s not real strong cognition.” I think you’ll plausibly be able to do that up through the end of days, if you are shameless enough.
I think the big ambiguity is about how you mark out a class of systems that benefit strongly from scale (i.e. such that doubling compute more than doubles economic value) and whether that’s being done correctly here. I think it’s fairly clear that the current crop of systems are much more general and are benefiting much more strongly from scale than previous systems. But it’s up for debate.
Hmm. I think we’re talking past each other a bit.
I think that everyone (including me) who wasn’t expecting LLMs to do all the cool impressive things that they can in fact do, or who wasn’t expecting LLMs to improve as rapidly as they are in fact improving, is obligated to update on that.
Once I do so update, it’s not immediately obvious to me that I learn anything more from the $1B/yr number. Yes, $1B/yr is plenty of money, but still a drop in the bucket of the >$1T/yr IT industry, and in particular, is dwarfed by a ton of random things like “corporate Linux support contracts”. Mostly I’m surprised that the number is so low!! (…For now!!)
But whatever, I’m not sure that matters for anything.
Anyway…
I did spend considerable time last week pondering where & whether I expect LLMs to plateau. It was a useful exercise; I appreciate your prodding. :)
I don’t really have great confidence in my answers, and I’m mostly redacting the details anyway. But if you care, here are some high-level takeaways of my current thinking:
(1) I expect there to be future systems that centrally incorporate LLMs, but also have other components, and I expect these future systems to be importantly more capable, less safe, and more superficially / straightforwardly agent-y than is an LLM by itself as we think of them today.
IF “LLMs scale to AGI”, I expect that this is how, and I expect that my own research will turn out to be pretty relevant in such a world. More generally, I expect that, in such systems, we’ll find the “traditional LLM alignment discourse” (RLHF fine-tuning, shoggoths, etc.) to be pretty irrelevant, and we’ll find the “traditional agent alignment discourse” (instrumental convergence, goal mis-generalization, etc.) to be obviously & straightforwardly relevant.
(2) One argument that pushes me towards fast takeoff is pretty closely tied to what I wrote in my recent post:
The following is a bit crude and not entirely accurate, but to a first approximation I want to say that LLMs have a suite of abstract “concepts” that it has seen in its training data (and that were in the brains of the humans who created that training data), and LLMs are really good at doing mix-and-match compositionality and pattern-match search to build a combinatorial explosion of interesting fresh outputs out of that massive preexisting web of interconnected concepts.
But I think there are some types of possible processes along the lines of:
“invent new useful concepts from scratch—even concepts that have never occurred to any human—and learn them permanently, such that they can be built on the future”
“notice inconsistencies in existing concepts / beliefs, find ways to resolve them, and learn them permanently, such that those mistakes will not be repeated in the future”
etc.
I think LLMs can do things like this a little bit, but not so well that you can repeat them in an infinite loop. For example, I suspect that if you took this technique and put it in an infinite loop, it would go off the rails pretty quickly. But I expect that future systems (of some sort) will eventually be able to do these kinds of things well enough to form a stable loop, i.e. the system will be able to keep running this process (whatever it is) over and over, and not go off the rails, but rather keep “figuring out” more and more things, thus rocketing off to outer space, in a way that’s loosely analogous to self-play in AlphaZero, or to a smart human gradually honing in on a better and better understanding of a complicated machine.
I think this points to an upcoming “discontinuity”, in the sense that I think right now we don’t have systems that can do the above bullet points (at least, not well enough to repeat them in an infinite loop), and I think we will have such systems in the future, and I think we won’t get TAI until we do. And it feels pretty plausible to me (admittedly not based on much!) that it would only take a couple years or less between “widespread knowledge of how to build such systems” and “someone gets an implementation working well enough that they can run it in an infinite loop and it just keeps “figuring out” more and more things, correctly, and thus it rockets off to radically superhuman intelligence and capabilities.”
(3) I’m still mostly expecting LLMs (and more broadly, LLM-based systems) to not be able to do the above bullet point things, and (relatedly) to plateau at a level where they mainly assist rather than replace smart humans. This is tied to fundamental architectural limitations that I believe transformers have (and indeed, that I believe DNNs more generally have), which I don’t want to talk about…
(4) …but I could totally be wrong. ¯\_(ツ)_/¯ And I think that, for various reasons, my current day-to-day research program is not too sensitive to the possibility that I’m wrong about that.
Steven: as someone who has read all your posts agrees with you on almost everything, this is a point where I have a clear disagreement with you. When I switched from neuroscience to doing ML full-time, some of the stuff I read to get up to speed was people theorizing about impossibly large (infinite or practically so) neural networks. I think that the literature on this does a pretty good job of establishing that, in the limit, neural networks can compute any sort of function. Which means that they can compute all the functions in a human brain, or a set of human brains. Meaning, it’s not a question of whether scaling CAN get us to AGI. It certainly can. It’s a question of when. There is inefficiency in trying to scale an algorithm which tries to brute force learn the relevant functions rather than have them hardcoded in via genetics. I think that you are right that there are certain functions the human brain does quite well that current SoTA LLMs do very poorly. I don’t think this means that scaling LLMs can’t lead to a point where the relevant capabilities suddenly emerge. I think we are already in a regime of substantial compute and data overhang for AGI, and that the thing holding us back is the proper design and integration of modules which emulate the functions of parts of the brain not currently well imitated by LLMs. Like the reward and valence systems of the basal ganglia, for instance. It’s still an open question to me whether we will get to AGI via scaling or algorithmic improvement. Imagine for a moment that I am correct that scaling LLMs could get us there, but also that a vastly more efficient system which borrows more functions from the human brain is possible. What might this scenario look like? Perhaps an LLM gets strong enough to, upon human prompting and with human assistance, analyze the computational neuroscience literature and open source code, and extract useful functions, and then do some combination of intuitively improve their efficiency and brute force test them in new experimental ML architectures. This is not so big a leap from what GPT-4 is capable of. I think that that’s plausibly even a GPT-5 level of skill. Suppose also that these new architectures can be added onto existing LLM base models, rather than needed the base model to be retrained from scratch. As some critical amount of discoveries accumulate, GPT-5 suddenly takes a jump forward in efficacy, enabling it to process the rest of the potential improvements much faster, and then it takes another big jump forward, and then is able to rapidly self-improve with no further need for studying existing published research. In such a scenario, we’d have a foom over the course of a few days which could take us by surprise and lead to a rapid loss of control. This is exactly why I think Astera’s work is risky, even though their current code seems quite harmless on its own. I think it is focused on (some of) the places where LLMs do poorly, but also that there’s nothing stopping the work from being effectively integrated with existing models for substantial capability gains. This is why I got so upset with Astera when I realized during my interview process with them that they were open-sourcing their code, and also when I carefully read through their code and saw great potential there for integrating it with more mainstream ML to the empowerment of both.
literature examples of the sort of thing I’m talking about with ‘enough scaling will eventually get us there’, even though I haven’t read this particular paper: https://arxiv.org/abs/2112.15577
https://openreview.net/forum?id=HyGBdo0qFm
Paul: I think you are making a valid point here. I think your point is (sadly) getting obscured by the fact our assumptions have shifted under our feet since the time when you began to make your point about slow vs fast takeoff.
I’d like to explain what I think the point you are right on is, and then try to describe how I think we need a new framing for the next set of predictions.
Several years ago, Eliezer and MIRI generally were frequently emphasizing the idea of a fast take-off that snuck up on us before the world had been much changed by narrow AI. You correctly predicted that the world would indeed be transformed in a very noticeable way by narrow AI before AGI. Eliezer in discussions with you has failed to acknowledge ways in which his views shifted from what they were ~10 years ago towards your views. I think this reflects poorly on him, but I still think he has a lot of good ideas, and made a lot of important predictions well in advance of other people realizing how important this was all going to be. As I’ve stated before, I often find my own views seeming to be located somewhere in-between your views and Eliezer’s wherever you two disagree.
I think we should acknowledge your point that the world being changed in a very noticeable way by AI before true AGI, just as you have acknowledged Eliezer’s point that once a runaway out-of-human-control human-out-of-the-loop recursive-self-improvement process gets started it could potentially proceed shockingly fast and lead to a loss of humanity’s ability to regain control of the resulting AGI even once we realized what is happening. [I say Eliezer’s point here, not to suggest that you disagreed with him on this point, but simply that he was making this a central part of his predictions from fairly early on.]
I think the framing we need now is: how can we predict, detect, and halt such a runaway RSI process before it is too late? This is important to consider from multiple angles. I mostly think that the big AI labs are being reasonably wary about this (although they certainly could do better). What concerns me more is the sort of people out in the wild who will take open source code and do dumb or evil stuff with it, Chaos-GPT-style, for personal gain or amusement. I think the biggest danger we face is that affordable open-source models seem to be lagging only a few years behind SotA models, and that the world is full of chaotic people who could (knowingly or not) trigger a runaway RSI process if such a thing is cheap and easy to do.
In such a strategic landscape, it could be crucially important to figure out how to:
a) slow down the progress of open source models, to keep dangerous runaway RSI from becoming cheap and easy to trigger
b) use SotA models to develop better methods of monitoring and preventing anyone outside a reasonably-safe-behaving org from doing this dangerous thing.
c) improving the ability of the reasonable orgs to self-monitor and notice the danger before they blow themselves up
I think that it does not make strategic sense to actively hinder the big AI labs. I think our best move is to help them move more safely, while also trying to build tools and regulation for monitoring the world’s compute. I do not think there is any feasible solution for this which doesn’t utilize powerful AI tools to help with the monitoring process. These AI tools could be along the lines of SotA LLMs, or something different like an internet police force made up of something like Conjecture’s CogEms. Or perhaps some sort of BCI or gene-mod upgraded humans (though I doubt we have time for this).
My view is that algorithmic progress, pointed to by neuroscience, is on the cusp of being discovered, and if those insights are published, will make powerful AGI cheap and available to all competent programmers everywhere in the world. With so many people searching, and the necessary knowledge so widely distributed, I don’t think we can count on keeping this under wraps forever. Rather than have these insights get discovered and immediately shared widely (e.g. by some academic eager to publish an exciting paper who didn’t realize the full power and implications of their discovery), I think it would be far better to have a safety-conscious lab discover these, have a way to safely monitor themselves to notice the danger and potential power of what they’ve discovered. They can then keep the discoveries secret and collaborate with other safety-conscious groups and with governments to set up the worldwide monitoring we need to prevent a rogue AGI scenario. Once we have that, we can move safely to the long reflection and take our time figuring out better solutions to alignment. [An important crux for me here is that I believe that if we have control of an AGI which we know is potentially capable of recursively self-improving beyond our bounds to control it, we can successfully utilize this AGI at its current level of ability without letting it self-improve. If someone convinced me that this was untenable, it would change my strategic recommendations.]
As @Jed McCaleb said in his recent post, ‘The only way forward is through!’. https://www.lesswrong.com/posts/vEtdjWuFrRwffWBiP/we-have-to-upgrade
As you can see from this prediction market I made, a lot of people currently disagree with me. I expect this will be a different looking distribution a year from now.
https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg
Here’s an intuition pump analogy for how I’ve been thinking about this. Imagine that I, as someone with a background in neuroscience and ML was granted the following set of abilities. Would you bet that I, with this set of abilities, would be able to do RSI? I would.
Abilities that I would have if I were an ML model trying to self-improve:
Make many copies of myself, and checkpoints throughout the process.
Work at high speed and in parallel with copies of myself.
Read all the existing scientific literature that seemed potentially relevant.
Observe all the connections between my neurons, and all the activations of my clones as I expose them to various stimuli or run them through simulations.
Edit these weights and connections.
Add neurons (up to a point) wherever they seem most needed, connected in any way I see fit, initialized with whatever weights I choose.
Assemble new datasets and build new simulations to do additional training with.
Freeze some subsection of a clone’s model and thus more rapidly train the remaining unfrozen section.
Take notes and write collaborative documents with my clones working in parallel with me.
Ok. Thinking about that set of abilities, doesn’t it seem like a sufficiently creative, intelligent, determined general agent could successfully self-improve? I think so. I agree it’s unclear exactly where the threshold is, and when a transformer-based ML model will cross it. I’ve made a bet at ‘GPT-5’, but honestly I’m not certain. Could be longer. Could be sooner...
Sorry, @the gears to ascension. I know your view is that it would be better for me to be quiet about this, but I think the benefits of speaking up in this case outweigh the potential costs.
oh, no worries, this part is obvious
To specifically answer the question in the parenthetical (without commenting on the dollar numbers; I don’t currently have an intuition that strongly maps [the thing I’m about to discuss] onto dollar amounts, meaning that although I do currently think the numbers you give are in the right ballpark, I reserve the right to reconsider that as further discussion and/or development occurs):
The reason someone might concentrate their probability mass at or within a certain impact range is that they believe it makes conceptual sense to divide cognitive work into two (or more) distinct categories, one of which is much weaker in the impact it can have. The question of how this division affects their probability distribution is then determined almost entirely by the level at which they think the impact of the weaker category will saturate. And that question, in turn, has a lot more to do with the concrete properties they expect (or don’t expect) to see from the weaker type of cognition than it does with dollar quantities directly. You can translate the former into the latter, but only via an additional series of calculations and assumptions; the actual object-level model (which is where any update would occur) contains no gear directly corresponding to “dollar value of impact”.
So when this kind of model encounters LLMs doing unusual and exciting things that score very highly on metrics like revenue, investment, and overall “buzz”… well, these metrics don’t directly lead the model to update. What the model instead considers relevant is whether, when you look at the LLM’s output, that output seems to exhibit properties of cognition that are strongly prohibited by the model’s existing expectations about weak versus strong cognitive work. If it doesn’t, then the model simply doesn’t update; it wasn’t, in fact, surprised by the level of cognition it observed, even if (perhaps) the larger model embedding it, which does track things like how the automation of certain tasks might translate into revenue/profit, was surprised.
And in fact, I do think this is what we observe from Eliezer (and from like-minded folk): he’s updated in the sense of becoming less certain about how much economic value can be generated by “weak” cognition (although one should also note that he’s never claimed to be particularly certain about this metric to begin with); meanwhile, he has not updated about the existence of a conceptual divide between “weak” and “strong” cognition, because the evidence he’s been presented with legitimately does not have much to say on that topic. In other words, I think he would say that the statement
is true, but that its relevance to his model is limited, because “real human cognitive work” is a category spanning (loosely speaking) both “cognitive work that scales into generality” and “cognitive work that doesn’t”, and that by lumping them together into a single category, you’re throwing away a key component of his model.
Incidentally, I want to make one thing clear: this does not mean I’m saying the rise of the Transformers provides no evidence at all in favor of [a model that assigns a more direct correspondence between cognitive work and impact, and postulates a smooth conversion from the former to the latter]. That model concentrates more probability mass in advance on the observations we’ve seen, and hence does receive Bayes credit for its predictions. However, I would argue that the updates in favor of this model are not particularly extreme, because the model it’s competing against didn’t actually strongly prohibit the observations in question, only assigned less probability to them (and not hugely less, since “slow takeoff” models don’t generally attempt to concentrate probability mass to extreme degrees either)!
All of which is to say, I suppose, that I don’t really disagree with the numerical likelihoods you give here:
but that I’m confused that you consider this “considerable”, and would write up a comment chastising Eliezer and the other “fast takeoff” folk because they… weren’t hugely moved by, like, ~2 bits’ worth of evidence? Like, I don’t see why he couldn’t just reply, “Sure, I updated by around 2 bits, which means that now I’ve gone from holding fast takeoff as my dominant hypothesis to holding fast takeoff as my dominant hypothesis.” And it seems like that degree of update would basically produce the kind of external behavior that might look like “not owning up” to the evidence, because, well… it’s not a huge update to begin with?
(And to be clear, this does require that his prior look quite different from yours. But that’s already been amply established, I think, and while you can criticize his prior for being overconfident—and I actually find myself quite sympathetic to that line of argument—criticizing him for failing to properly update given that prior is, I think, a false charge.)
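As a concrete illustration of the odds arithmetic above: ~2 bits of evidence is a likelihood ratio of 4, and applying that to a hypothesis which starts out dominant leaves it dominant. The sketch below is only illustrative; the 95% prior is an assumed number for the sake of the example, not a figure anyone in this thread has stated.

```python
def update(prior_prob, likelihood_ratio):
    """Odds-form Bayes: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# ~2 bits of evidence against fast takeoff is a likelihood ratio of 1/4.
# The 95% prior is purely illustrative (an assumption, not a quoted figure).
print(update(0.95, 1 / 4))  # ~0.83 -- fast takeoff remains the dominant hypothesis
```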
Yes, I’m saying that with each $ increment the “qualitative division” model fares worse and worse. I think that people who hold onto this qualitative division have generally been qualitatively surprised by the accomplishments of LMs, that when they make concrete forecasts those forecasts have not matched reality, and that they should be updating strongly about whether such a division is real.
Most of all, I’m wondering how you get a high level of confidence in the distinction and its relevance. I’ve seen only really vague discussion of it. The view that LM cognition doesn’t scale into generality seems wacky to me. I want to see a description of the tasks it can’t do.
In general, if someone won’t state any predictions of their view, I’m just going to update about that view based on my understanding of what it predicts (which is, after all, what I’d ultimately be doing if I took the view seriously). I’ll also try to update about the view as operated by you, so e.g. if you were generally showing a good predictive track record or achieving things in the world, then I would be happy to acknowledge that there is probably some good view there that I can’t understand.
I do think that a factor of two is significant evidence. In practice, in my experience, that’s about as much evidence as you normally get between realistic alternative perspectives in messy domains. The kind of forecasting approach that puts 99.9% probability on things, and so doesn’t move until it gets 10 bits, is just not something that works in practice.
On the flip side, it’s enough evidence that Eliezer is endlessly condescending about it (e.g. about those who only assigned a 50% probability to the covid response being as inept as it was). Which I think is fine (but annoying); a factor of 2 is real evidence. And if I went around saying “Maybe our response to AI will be great” and then just replied to this observation with “whatever, covid isn’t the kind of thing I’m talking about”, without giving some kind of more precise model that distinguishes the two, then you would be right to chastise me.
Perhaps more importantly, I just don’t know where someone with this view would give ground. Even if you think any given factor of two isn’t a big deal, ten factors of two is what gets you from 99.9% to 50%. So you can’t just go around ignoring a couple of them every few years!
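For concreteness, the “99.9% to 50%” arithmetic works out in odds form like this (a quick sketch of the calculation, nothing more):

```python
odds = 0.999 / 0.001      # 99.9% corresponds to odds of 999:1
odds /= 2 ** 10           # ten factors of two: 999 / 1024 ≈ 0.976
print(odds / (1 + odds))  # ≈ 0.494, i.e. roughly 50%
```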
And rhetorically, I’m not complaining about people ultimately thinking fast takeoff is more plausible. I’m complaining about them not expressing the view in a way that lets us learn about it from what appears to me to be multiple bits of evidence, and about them not acknowledging that evidence. This isn’t the only evidence we’ve gotten; I’m generally happy to acknowledge many bits’ worth of ways in which my views have moved towards other people’s.
So one claim is that a theory of post-AGI effects often won’t say things about pre-AGI AI, and so mostly doesn’t get updated by pre-AGI observations. My take on LLM alignment asks us to distinguish human-like LLM AGIs from stronger AGIs (or weirder LLMs), with theories of stronger AGIs not naturally characterizing the issues with human-like LLMs. For instance, those theories aren’t concerned with optimizing for LLM superstimuli while the LLM’s behavior remains in the human-imitation regime, where caring about LLM-specific things hasn’t had a chance to gain influence. When the mostly-faithful-imitation nature of LLMs breaks down with enough AI tinkering, the way human nature is now breaking down under the influence of AGIs, we get another phase change to stronger AGIs.
This seems like a pattern: theories of extremal later phases are bounded within their own scopes, and say little about the preceding phases that transition into them. If the phase-transition boundaries get muddled in our thinking about this, we get misleading impressions about how the earlier phases work, even though navigating them is instrumental for managing the transitions into the much more concerning later phases.