I’d like to know: what’s the least impressive task that you are very confident cannot be done safely and ‘non-agentically’ in the next two years?
That’s incredibly difficult to predict, because the minimal things only a general intelligence could do are things like “deriving a few novel abstractions and building on them”, but from the outside this would be indistinguishable from the AI recognizing a cached pattern it learned in-training and re-applying it, or merely interpolating between a few such patterns. The only way you could distinguish between the two is if you had a firm grasp of every pattern in the AI’s training data, and of what lies in the conceptual neighbourhood of those patterns, so that you could see whether it’s genuinely venturing far from its starting ontology.
Here’s a more precise operationalization, from my old reply to Rohin Shah:

1. Train an AI on all of humanity’s knowledge up to some point in time T1.
2. Assemble a list D of all scientific discoveries made in the interval (T1;T2], for some later time T2 > T1.
3. See if the AI can replicate these discoveries.
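To make the setup concrete, here’s a minimal sketch of that evaluation loop in Python. Treat every name in it as my own illustrative assumption rather than part of the proposal: the Discovery fields, the judge grader, and the premise that a model’s knowledge cutoff cleanly corresponds to T1. The genuinely hard part, telling a real re-derivation apart from a lucky recombination of pre-T1 material, is hidden inside the judge callable.

```python
# Hypothetical sketch of the (T1;T2] test described above; none of these
# names or structures come from the original comment.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Discovery:
    name: str    # label for reporting only
    year: int    # when the discovery entered the public literature
    prompt: str  # task statement that doesn't leak the answer
    rubric: str  # what a successful re-derivation must contain

def evaluate(
    model: Callable[[str], str],        # stand-in for an AI trained on data up to t1
    judge: Callable[[str, str], bool],  # stand-in grader: (answer, rubric) -> pass?
    discoveries: List[Discovery],
    t1: int,
    t2: int,
) -> float:
    """Return the fraction of discoveries dated in (t1, t2] that the model replicates."""
    in_window = [d for d in discoveries if t1 < d.year <= t2]
    if not in_window:
        return 0.0
    passed = sum(judge(model(d.prompt), d.rubric) for d in in_window)
    return passed / len(in_window)
```

Nothing in this sketch constrains how the judge works or how the interval is chosen; as argued below, that’s where all the difficulty hides.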
At face value, if the AI can do that, it should be considered able to “do science” and therefore AGI, right?
I would dispute that. If the period (T1;T2] is short enough, then it’s likely that most of the cognitive work needed to make the leap to any discovery in D is already present in the data up to T1. Making a discovery from that starting point doesn’t necessarily require developing new abstractions/doing science: it may be doable just by interpolating between a few already-known concepts. And here, an asymmetry between humans and e. g. SOTA LLMs becomes relevant:
No individual human knows everything that humanity as a whole knows. Imagine that making some discovery in D by interpolation required combining two very “distant” concepts, like a physics insight and advanced biology knowledge. It’s unlikely that there’d be a human with sufficient expertise in both, so a human would likely have to do it by actual-science (e. g., a biologist would re-derive the physics insight from first principles).
An LLM, however, has a bird’s-eye view of the entire human concept-space up to T1. It directly sees both the physics insight and the biology knowledge at once, so it can just do an interpolation between them, without doing truly-novel research.
Thus, the ability to produce marginal scientific insights may mean either that the AI can “do science”, or merely that the particular insight is a simple interpolation between already-known but distant concepts.
On the other hand, now imagine that the period (T1;T2] is very large, e. g. from 1940 to 2020. We’d then be asking our AI to make very significant discoveries, such that they surely can’t be done by simple interpolation, only by actually building chains of novel abstractions. But… well, most humans can’t do that either, right? Not all generally-intelligent entities are scientific geniuses. Thus, this is a challenge a “weak” AGI would not be able to meet, only a genius/superintelligent AGI — i. e., only an AGI that’s already an extinction threat.
In theory, there should be a choice of (T1;T2] that sits between the two extremes: a set of discoveries that can’t be made by interpolation, but also don’t require dangerous genius to make.
But how exactly are we supposed to figure out what the right interval is? (I suppose it may not be an unsolvable problem, and I’m open to ideas, but skeptical on priors.)
I can absolutely make strong predictions regarding what non-AGI AIs would be unable to do. But, due to the aforementioned problem, these predictions necessarily set a high bar, higher than the “minimal” capability. (Also, I expect an AI that can meet this high bar to also be the AI that quickly ends the world, so.)
Here’s my recent reply to Garrett, for example. tl;dr: non-GI AIs would not be widely known to be able to derive whole multi-layer novel mathematical frameworks if tasked with designing software products that require this. I’m a bit wary of reality somehow Goodharting on this prediction as well, but it seems robust enough, so I’m tentatively venturing it.
I currently think that’s about as well as you can do, regarding “minimal incapability predictions”.