I don’t think it’s an issue of pure terminology. Rather, I expect the issue is expecting there to be a single discrete point in time at which some specific AI is better than every human at every useful task. Possibly there will eventually be such a point in time, but I don’t see any reason to expect “AI is better than all humans at developing new EUV lithography techniques”, “AI is better than all humans at equipment repair in the field”, and “AI is better than all humans at proving mathematical theorems” to happen at similar times.
Put another way, is an instance of an LLM that has an affordance for “fine-tune itself on a given dataset” an ASI? Going by your rubric:
Can think about any topic, including topics outside of their training set: Yep, though it’s probably not very good at it
Can do self-directed, online learning: Yep, though this may cause it to perform worse on other tasks if it does too much of it
Alignment may shift as knowledge and beliefs shift w/ learning: To the extent that “alignment” is a meaningful thing to talk about with regards to only a model rather than a model plus its environment, yep
Their own beliefs and goals: Yes, at least for definitions of “beliefs” and “goals” such that humans have beliefs and goals
Alignment must be reflexively stable: ¯\_(ツ)_/¯ Seems likely that some possible configuration is relatively stable
Alignment must be sufficient for contextual awareness and potential self-improvement: ¯\_(ツ)_/¯ Even modern LLM chat interfaces like Claude are pretty contextually aware these days
Actions: Yep, LLMs can already perform actions if you give them affordances to do so (e.g. tools)
Agency is implied or trivial to add: ¯\_(ツ)_/¯ Depends what you mean by “agency”, but in the sense of “can break down large goals into subgoals somewhat reliably” I’d say yes
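To make the hypothetical above concrete: below is a minimal sketch of what a “fine-tune itself on a given dataset” affordance could look like when exposed to a model as a tool alongside its other tools. Everything here is invented for illustration (the finetune_self tool name, its schema, and the stub training job); no particular provider’s tool-use API is assumed.

```python
# Hypothetical sketch of a "fine-tune yourself on a given dataset" affordance
# exposed to a model as a tool. Names and schema are made up for illustration;
# no specific provider's tool-use API is assumed.
from dataclasses import dataclass
from uuid import uuid4


@dataclass
class ToolCall:
    """A tool invocation parsed out of the model's output."""
    name: str
    arguments: dict


# Tool schema the model would see alongside its other tools (search, code, ...).
SELF_FINETUNE_TOOL = {
    "name": "finetune_self",
    "description": (
        "Start a fine-tuning job on the currently running model using the "
        "given dataset, then load the resulting weights for future turns."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "dataset_path": {"type": "string"},
            "epochs": {"type": "integer", "default": 1},
        },
        "required": ["dataset_path"],
    },
}


def run_finetune_job(dataset_path: str, epochs: int) -> str:
    """Stand-in for whatever training infrastructure would actually run the job."""
    job_id = uuid4().hex[:8]
    print(f"[stub] queued fine-tune on {dataset_path} for {epochs} epoch(s)")
    return job_id


def handle_tool_call(call: ToolCall) -> str:
    """Dispatch a tool call emitted by the model."""
    if call.name == "finetune_self":
        job_id = run_finetune_job(
            call.arguments["dataset_path"],
            call.arguments.get("epochs", 1),
        )
        return f"Fine-tuning job {job_id} started; weights swap in when it finishes."
    raise ValueError(f"Unknown tool: {call.name}")


# Example: the model asks to fine-tune itself on a dataset it chose.
print(handle_tool_call(ToolCall("finetune_self", {"dataset_path": "notes/physics.jsonl"})))
```

The sketch is only meant to show that the affordance itself is mundane plumbing; whether invoking it actually makes the model better at anything (rather than worse at other tasks, per the caveat above) is a separate question.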
Still, I don’t think e.g. Claude Opus is “an ASI” in the sense that people who talk about timelines mean it, and I don’t think this is only because it doesn’t have any affordances for self-directed online learning.
Olli Järviniemi made something like this point in the post Near-mode thinking on AI (https://www.lesswrong.com/posts/ASLHfy92vCwduvBRZ/near-mode-thinking-on-ai). In particular, here are the most relevant quotes on this subject:
“But for the more important insight: The history of AI is littered with the skulls of people who claimed that some task is AI-complete, when in retrospect this has been obviously false. And while I would have definitely denied that getting IMO gold would be AI-complete, I was surprised by the narrowness of the system DeepMind used.”
“I think I was too much in the far-mode headspace of one needing Real Intelligence—namely, a foundation model stronger than current ones—to do well on the IMO, rather than thinking near-mode “okay, imagine DeepMind took a stab at the IMO; what kind of methods would they use, and how well would those work?”
“I also updated away from a “some tasks are AI-complete” type of view, towards “often the first system to do X will not be the first systems to do Y”.
I’ve come to realize that being “superhuman” at something is often much more mundane than I’ve thought. (Maybe focusing on full superintelligence—something better than humanity on practically any task of interest—has thrown me off.)”
Like:
“In chess, you can just look a bit more ahead, be a bit better at weighting factors, make a bit sharper tradeoffs, make just a bit fewer errors.
If I showed you a video of a robot that was superhuman at juggling, it probably wouldn’t look all that impressive to you (or me, despite being a juggler). It would just be a robot juggling a couple balls more than a human can, throwing a bit higher, moving a bit faster, with just a bit more accuracy.
The first language models to be superhuman at persuasion won’t rely on any wildly incomprehensible pathways that break the human user (c.f. List of Lethalities, items 18 and 20). They just choose their words a bit more carefully, leverage a bit more information about the user in a bit more useful way, have a bit more persuasive writing style, being a bit more subtle in their ways.
(Indeed, already GPT-4 is better than your average study participant in persuasiveness.)
You don’t need any fundamental breakthroughs in AI to reach superhuman programming skills. Language models just know a lot more stuff, are a lot faster and cheaper, are a lot more consistent, make fewer simple bugs, can keep track of more information at once.
(Indeed, current best models are already useful for programming.)
(Maybe these systems are subhuman or merely human-level in some aspects, but they can compensate for that by being a lot better on other dimensions.)”
“As a consequence, I now think that the first transformatively useful AIs could look behaviorally quite mundane.”
I agree with all of that. My definition isn’t crisp enough; doing crappy general thinking and learning isn’t good enough. It probably needs to be roughly human level or above at those things before it’s takeover-capable and therefore really dangerous.
I didn’t intend to add the alignment definitions to the definition of AGI.
I’d argue that LLMs actually can’t think about anything outside of their training set, and it’s just that everything humans have thought about so far is inside their training set. But I don’t think that discussion matters here.
I agree that Claude isn’t an ASI by that definition. Even if it had longer-term goal-directed agency and self-directed online learning added, it would still be far subhuman in some important areas, arguably including the general reasoning that’s critical for complex novel tasks like taking over the world or the economy. ASI needs to mean superhuman in every important way. And of course “important” is vague.
I guess a more reasonable goal is working toward the minimum description length that gets across all of those considerations. And a big problem is that timeline predictions for important/dangerous AI are mixed in with theories about what will make it important/dangerous. One terminological move I’ve been trying is the word “competent”, to invoke intuitions about getting useful (and therefore potentially dangerous) stuff done.