I don’t think talking about “timelines” is useful anymore without specifying what the timeline is until (in more detail than “AGI” or “transformative AI”). It’s not like there’s a specific time in the future when a “game over” screen shows with our score. And for the “the last time that humans can meaningfully impact the course of the future” definition, that too seems to depend on the question of how: the answer is already in the past for “prevent the proliferation of AI smart enough to understand and predict human language”, but significantly in the future for “prevent end-to-end automation of the production of computing infrastructure from raw inputs”.
I very much agree that talking about time to AGI or TAI is causing a lot of confusion because people don’t share a common definition of those terms. I asked What’s a better term now that “AGI” is too vague?, arguing that the original use of AGI was very much the right term, but it’s been watered down from fully general to fairly general, making the definition utterly vague and perhaps worse-than-useless.
I didn’t really get any great suggestions for better terminology, including my own. Thinking about it since then, I wonder if the best term (when there’s not space to carefully define it) is artificial superintelligence, ASI. That has the intuitive sense of “something that outclasses us”. The alignment community has long been using it for something well past AGI, at the nearly-omniscient level, but it technically just means smarter than a human—which is something that intuition says we should be very worried about.
There are arguments that AI doesn’t need to be smarter than human to worry about it, but I personally worry most about “real” AGI, as defined in that linked post and I think in Yudkowsky’s original usage: AI that can think about and learn about anything.
You could also say that ASI already exists, because AI is narrowly superhuman, but superintelligence does intuitively suggest smarter than human in every way.
My runners-up were parahuman AI and superhuman entities.
I don’t think it’s an issue of pure terminology. Rather, I expect the issue is expecting a single discrete point in time at which some specific AI is better than every human at every useful task. Possibly there will eventually be such a point in time, but I don’t see any reason to expect “AI is better than all humans at developing new EUV lithography techniques”, “AI is better than all humans at equipment repair in the field”, and “AI is better than all humans at proving mathematical theorems” to happen at similar times.
Put another way, is an instance of an LLM that has an affordance for “fine-tune itself on a given dataset” an ASI? Going by your rubric:
Can think about any topic, including topics outside of their training set: Yep, though it’s probably not very good at it
Can do self-directed, online learning: Yep, though this may cause it to perform worse on other tasks if it does too much of it
Alignment may shift as knowledge and beliefs shift w/ learning: To the extent that “alignment” is a meaningful thing to talk about with regards to only a model rather than a model plus its environment, yep
Their own beliefs and goals: Yes, at least for definitions of “beliefs” and “goals” such that humans have beliefs and goals
Alignment must be reflexively stable: ¯\_(ツ)_/¯ seems likely that some possible configuration is relatively stable
Alignment must be sufficient for contextual awareness and potential self-improvement: ¯\_(ツ)_/¯ even modern LLM chat interfaces like Claude are pretty contextually aware these days
Actions: Yep, LLMs can already perform actions if you give them affordances to do so (e.g. tools)
Agency is implied or trivial to add: ¯\_(ツ)_/¯, depends what you mean by “agency” but in the sense of “can break down large goals into subgoals somewhat reliably” I’d say yes
Still, I don’t think e.g. Claude Opus is “an ASI” in the sense that people who talk about timelines mean it, and I don’t think this is only because it doesn’t have any affordances for self-directed online learning.
Olli Järviniemi made something like this point in the post Near-mode thinking on AI (https://www.lesswrong.com/posts/ASLHfy92vCwduvBRZ/near-mode-thinking-on-ai). In particular, here are the most relevant quotes on this subject:
“But for the more important insight: The history of AI is littered with the skulls of people who claimed that some task is AI-complete, when in retrospect this has been obviously false. And while I would have definitely denied that getting IMO gold would be AI-complete, I was surprised by the narrowness of the system DeepMind used.”
“I think I was too much in the far-mode headspace of one needing Real Intelligence—namely, a foundation model stronger than current ones—to do well on the IMO, rather than thinking near-mode “okay, imagine DeepMind took a stab at the IMO; what kind of methods would they use, and how well would those work?””
“I also updated away from a “some tasks are AI-complete” type of view, towards “often the first system to do X will not be the first systems to do Y”.
I’ve come to realize that being “superhuman” at something is often much more mundane than I’ve thought. (Maybe focusing on full superintelligence—something better than humanity on practically any task of interest—has thrown me off.)”
Like:
“In chess, you can just look a bit more ahead, be a bit better at weighting factors, make a bit sharper tradeoffs, make just a bit fewer errors.
If I showed you a video of a robot that was superhuman at juggling, it probably wouldn’t look all that impressive to you (or me, despite being a juggler). It would just be a robot juggling a couple balls more than a human can, throwing a bit higher, moving a bit faster, with just a bit more accuracy.
The first language models to be superhuman at persuasion won’t rely on any wildly incomprehensible pathways that break the human user (c.f. List of Lethalities, items 18 and 20). They just choose their words a bit more carefully, leverage a bit more information about the user in a bit more useful way, have a bit more persuasive writing style, being a bit more subtle in their ways.
(Indeed, already GPT-4 is better than your average study participant in persuasiveness.)
You don’t need any fundamental breakthroughs in AI to reach superhuman programming skills. Language models just know a lot more stuff, are a lot faster and cheaper, are a lot more consistent, make fewer simple bugs, can keep track of more information at once.
(Indeed, current best models are already useful for programming.)
(Maybe these systems are subhuman or merely human-level in some aspects, but they can compensate for that by being a lot better on other dimensions.)”
“As a consequence, I now think that the first transformatively useful AIs could look behaviorally quite mundane.”
I agree with all of that. My definition isn’t crisp enough; doing crappy general thinking and learning isn’t good enough. It probably needs to be roughly human level or above at those things before it’s takeover-capable and therefore really dangerous.
I didn’t intend to add the alignment definitions to the definition of AGI.
I’d argue that LLMs actually can’t think about anything outside of their training set, and it’s just that everything humans have thought about so far is inside their training set. But I don’t think that discussion matters here.
I agree that Claude isn’t an ASI by that definition. Even if it did have longer-term goal-directed agency and self-directed online learning added, it would still be far subhuman in some important areas, arguably in the general reasoning that’s critical for complex novel tasks like taking over the world or the economy. ASI needs to mean superhuman in every important way. And of course “important” is vague.
I guess a more reasonable goal is working toward the minimum description length that gets across all of those considerations. And a big problem is that timeline predictions to important/dangerous AI are mixed in with theories about what will make it important/dangerous. One terminological move I’ve been trying is the word “competent” to invoke intuitions about getting useful (and therefore potentially dangerous) stuff done.
I think the unstated assumption (when timeline-predictors don’t otherwise specify) is “the time when there are no significant deniers”, or “the time when things are so clearly different that nobody (at least nobody the predictor respects) is using the past as any indication of the future on any relevant dimension”.
Some people may CLAIM it’s about the point of no return, after which changes can’t be undone or slowed in order to maintain anything near status quo or historical expectations. This is pretty difficult to work with, since it could happen DECADES before it’s obvious to most people.
That said, I’m not sure talking about timelines was EVER all that useful or concrete. There are too many unknowns, and too many anti-inductive elements (where humans or other agents change their behavior based on others’ decisions and their predictions of decisions, in a chaotic recursion). “short”, “long”, or “never” are good at giving a sense of someone’s thinking, but anything more granular is delusional.