Yeah, I think there’s a sharp-ish discontinuity at the point where we get to AGI. “General intelligence” is, as the name suggests, general — it implements some cognition that can efficiently derive novel heuristics for solving any problem/navigating arbitrary novel problem domains. And a system that can’t do that is, well, not an AGI.
Conceptually, the distinction between an AGI and a pre-AGI system feels similar to the distinction between a system that’s Turing-complete and one that isn’t:
Any Turing-complete system implements a set of rules that suffices to represent any mathematical structure/run any program. A system that’s just “slightly below” Turing-completeness, however, is dramatically more limited.
Similarly, an AGI has the complete set of cognitive features that make it truly universal — features it can use to bootstrap any other capability it needs, from scratch. By contrast, even a slightly “pre-AGI” system would be qualitatively inferior, not simply quantitatively so.
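(To gesture at the flavor of that kind of cliff with a toy example from computability, one rung lower on the same hierarchy: a finite-state pattern matcher can only track bracket nesting up to whatever depth is hard-coded into it, while a machine with a single unbounded counter handles any depth. The sketch below is just that toy illustration of the analogy, nothing about cognition itself.)

```python
# A toy illustration of a capability cliff between adjacent rungs of a formal
# hierarchy: Python's `re` has no recursive patterns, so any fixed pattern can
# only verify bracket nesting up to a hard-coded depth, while a single
# unbounded counter verifies nesting of any depth.

import re

# Finite-state approach: handles at most three levels of nesting; depth four
# and beyond always escapes it, no matter how the pattern is tweaked.
DEPTH_3 = re.compile(r"^\((?:[^()]|\((?:[^()]|\([^()]*\))*\))*\)$")

def balanced(s: str) -> bool:
    """Counter-based check: a single unbounded integer covers any depth."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

deep = "(" * 10 + ")" * 10
print(bool(DEPTH_3.match(deep)))  # False: the fixed-depth pattern gives up
print(balanced(deep))             # True: the counter doesn't care about depth
```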
There’s still some fuzziness around the edges, like whether significantly useful R&D capabilities show up only with post-AGI cognition, or to what extent being an AGI is a sufficient, and not merely a necessary, condition for an omnicidal intelligence explosion.
But I do think there’s a meaningful sense in which AGI-ness is a binary, not a continuum. (I’m also hopeful about nailing all of this down mathematically, instead of just vaguely gesturing at it like this.)
It seems like a human is universal in the sense that they can come up with new problem-solving strategies, evaluate them, adopt successful ones, etc. Most of those new problem-solving strategies are developed by long trial and error and by cultural imitation of successful strategies.
If a language model could do the same thing with chain of thought, would you say that it is an AGI? So would the existence of such systems, without an immediate intelligence explosion, falsify your view?
If such a system seemed intuitively universal but wasn’t exploding, what kind of observation would tell you that it isn’t universal after all, and therefore salvage your claim? Or maybe to put this more sharply: how did you decide that text-davinci-003, prompted to pursue an open-ended goal and given the ability to instantiate and delegate to new copies of itself, isn’t an AGI?
It seems to me that you will probably have an “AGI” in the sense you are gesturing at here well before the beginning of explosive R&D growth. I don’t really see the argument for why an intelligence explosion would follow quickly from that point. (Indeed, my view is that you could probably build an AGI in this sense out of text-davinci-003, though the result would be uneconomical.)
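To be concrete about the setup I have in mind there: a thin wrapper that hands the model an open-ended goal and lets it spawn copies of itself on subtasks, roughly along the lines of the sketch below. The `complete` placeholder and the prompt format are invented for illustration, not a claim about any particular existing implementation.

```python
# Rough sketch of a "copy-delegating" wrapper around a language model
# (e.g. text-davinci-003). `complete` and the prompt format are placeholders
# invented for illustration, not any particular real implementation.

from typing import List

def complete(prompt: str) -> str:
    """Placeholder for a single call to the underlying model."""
    raise NotImplementedError("wire this up to an actual model")

def run_agent(goal: str, depth: int = 0, max_depth: int = 3) -> str:
    """Pursue `goal` by either answering directly or delegating to copies."""
    plan = complete(
        f"Goal: {goal}\n"
        "Either write ANSWER: <your answer>, or write SUBTASKS: followed by "
        "one subtask per line to delegate to fresh copies of yourself."
    )
    if plan.startswith("ANSWER:") or depth >= max_depth:
        return plan
    subtasks: List[str] = [
        line.strip() for line in plan.splitlines()[1:] if line.strip()
    ]
    # Each subtask is handed to a new copy of the same agent loop.
    results = [run_agent(task, depth + 1, max_depth) for task in subtasks]
    return complete(
        f"Goal: {goal}\nSubtask results:\n" + "\n".join(results)
        + "\nCombine these into a final answer."
    )
```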
If such a system seemed intuitively universal but wasn’t exploding, what kind of observation would tell you that it isn’t universal after all, and therefore salvage your claim?
Given that we don’t understand how current LLMs work, or what the “space of problems” generally looks like, it’s difficult to come up with concrete tests that I’m confident I won’t goalpost-move on. A prospective one might be something like this:
If you invent or find a board game of similar complexity to chess that [the ML model] has never seen before and explain the rules using only text (and, if [the ML model] is multimodal, also images), [a pre-AGI model] will not be able to perform as well at the game as an average human who has never seen the game before and is learning it for the first time in the same way.
I.e., an AGI would be able to learn problem-solving in completely novel, “basically-off-distribution” domains. And if a system that has capabilities like this doesn’t explode (and it’s not deliberately trained to be myopic or something), that would falsify my view.
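Spelled out as a harness, that test might look something like the sketch below, where NovelGame, query_model, and the novice-human baseline are all hypothetical stand-ins rather than an existing benchmark:

```python
# Rough harness for the board-game test. Everything here (NovelGame,
# rules_text, query_model, the novice-human baseline) is hypothetical
# scaffolding for the idea, not an existing benchmark.

import random
from typing import List

class NovelGame:
    """Stand-in for a freshly invented board game of chess-like complexity."""
    def __init__(self, rules_text: str):
        self.rules_text = rules_text
    def legal_moves(self) -> List[str]: ...
    def apply(self, move: str) -> None: ...
    def finished(self) -> bool: ...
    def model_won(self) -> bool: ...

def query_model(prompt: str) -> str:
    """Placeholder for one call to the model under test."""
    raise NotImplementedError

def play_one_game(rules_text: str) -> bool:
    game = NovelGame(rules_text)
    while not game.finished():
        moves = game.legal_moves()
        reply = query_model(
            f"Rules of a game you have never seen before:\n{rules_text}\n"
            f"Legal moves: {moves}\nReply with exactly one move."
        ).strip()
        # Fall back to a random legal move if the reply isn't parseable.
        game.apply(reply if reply in moves else random.choice(moves))
    return game.model_won()

def model_win_rate(rules_text: str, n_games: int = 100) -> float:
    return sum(play_one_game(rules_text) for _ in range(n_games)) / n_games

# The actual test: compare model_win_rate(rules_text) against the measured
# win rate of novice humans who learned the game from the same rules text.
```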
But for me to be confident in putting weight on that test, we’d need to clarify some specific details about the “minimum level of complexity” of the new board game, and that it’s “different enough” from all known board games for the AI to be unable to just generalize from them… And given that it’s unclear in which directions it’s easy to generalize, I expect I wouldn’t be confident in any metric we’d be able to come up with.
I guess a sufficient condition for AGI would be “is able to invent a new scientific field out of whole cloth, with no human steering, as an instrumental goal towards solving some other task”. But that’s obviously an overly high bar.
As far as empirical tests for AGI-ness go, I’m hoping for interpretability-based ones instead. I.e., that we’re able to formalize what “general intelligence” means, then search for search in our models.
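(The nearest existing thing to the flavor of test I mean is probably activation probing. The toy sketch below fits a linear probe on synthetic data standing in for real activations; actual “searching for search” would need far more than this, it’s only meant to gesture at the shape of an interpretability-based test.)

```python
# Toy gesture at the shape of an interpretability-based test: fit a linear
# probe for a hypothesized property on (here, purely synthetic) activations
# and check whether it's linearly decodable. The "search direction" and the
# data are invented; nothing here is a real search-for-search procedure.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, n = 64, 2000

# Synthetic stand-in for residual-stream activations: samples labeled 1 have
# a made-up "search-like" direction added in, samples labeled 0 don't.
search_direction = rng.normal(size=d_model)
labels = rng.integers(0, 2, size=n)
activations = rng.normal(size=(n, d_model)) + np.outer(labels, search_direction)

probe = LogisticRegression(max_iter=1000).fit(activations[:1500], labels[:1500])
print("held-out probe accuracy:", probe.score(activations[1500:], labels[1500:]))
```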
As far as my epistemic position goes, I expect one of three scenarios here:
We develop powerful interpretability tools, and they directly show whether my claims about general intelligence hold.
We don’t develop powerful interpretability tools before an AGI explodes the way I fear and kills us all.
AI capabilities gradually improve until we get to scientific-field-inventing AGIs, and my mind changes only then.
In the scenarios where I’m wrong, I mostly don’t expect to ever encounter a black-boxy test of the sort you’re suggesting that I’d find convincing, before I encounter overwhelming evidence that makes convincing me a moot point.
(Which is not to say my position on this can’t be moved at all — I’m open to mechanistic arguments about why cognition doesn’t work the way I’m saying it does, etc. But I don’t expect to ever observe an LLM whose mix of capabilities and limitations makes me go “oh, I guess that meets my minimal AGI standard but it doesn’t explode, guess I was wrong on that”.)