Thane Ruthenis comments on Transcript of Sam Altman’s interview touching on AI safety

Thane Ruthenis 20 Jan 2023 18:23 UTC
6 points
5
A lot of slow takeoff, gradual capabilities ramp-up, multipolar AGI world type of thinking. Personally, I agree with him this sort of scenario seems both more desirable and more likely.
I think the operative word in “seems more likely” here is “seems”. It seems like a more sophisticated, more realistic, more modern and satisfyingly nuanced view, compared to “the very first AGI we train explodes like a nuclear bomb and unilaterally sets the atmosphere on fire, killing everyone instantly”. The latter seems like an old view, a boringly simplistic retrofuturistic plot. It feels like there’s a relationship between these two scenarios, and that the latter one is a rough first-order approximation someone lifted out of e. g. The Terminator to get people interested in the whole “AI apocalypse” idea at the onset of it all. Then we gained a better understanding, sketched out detailed possibilities that take into account how AI and AI research actually work in practice, and refined that rough scenario. As the result, we got that picture of a slower multipolar catastrophe.
A pleasingly complicated view! One that respectfully takes into account all of these complicated systems of society and stuff. It sure feels like how these things work in real life! “It’s not like the AI wakes up and decides to be evil,” perish the thought.
That seeming has very little to do with reality. The unilateral-explosion isn’t the old, outdated scenario — it’s simply a different scenarios that’s operating on a different model of how intelligence explosions proceed. And as far as its proponents are concerned, its arguments haven’t been overturned at all, and nothing about how DL works rules it out.
But it sure seems like the rough naive view that the Real Experts have grown out of a while ago; and that those who refuse to update simply haven’t done that growing-up, haven’t realized there’s a world outside their chosen field with all these Complicated Factors you need to take into account.
It makes it pretty hard to argue against. It’s so low-status.
… At least, that’s how that argument feels to me, on a social level.
(Edit: Uh, to be clear, I’m not saying that there’s no other reasons to buy the multipolar scenario except “it seems shiny”; that a reasonable person could not come to believe it for valid reasons. I think it’s incorrect, and that there are some properties that unfairly advantage it in the social context, but I’m not saying it’s totally illegitimate.)
- paulfchristiano 20 Jan 2023 18:59 UTC
  18 points
  5
  Parent
  the very first AGI we train explodes like a nuclear bomb and unilaterally sets the atmosphere on fire, killing everyone instantly
  To understand whether this is the kind of thing that could be true or false, it seems like you should say what “the very first AGI” means. What makes a system an AGI?
  I feel like this view is gradually looking less plausible as we build increasingly intelligent and general systems, and they persistently don’t explode (though only gradually because it’s unclear what the view means).
  It looks to me like what is going to happen is that AI systems will gradually get better at R&D. They can help with R&D by some unknown combination of complementing and replacing human labor, either way leading to acceleration. The key question is the timeline for that acceleration—how long between “AIs are good enough at R&D that their help increases the rate of progress (on average, over relevant domains) by more than doubling the speed of human labor would” and “dyson sphere.” I feel that plausible views range from 2 months to 20 years, based on quantitative questions of returns curves and complementarity between AI and humans and the importance of capital. I’d overall guess 2-8 years.
  - Thane Ruthenis 20 Jan 2023 19:26 UTC
    6 points
    1
    Parent
    Yeah, I think there’s a sharp-ish discontinuity at the point where we get to AGI. “General intelligence” is, to wit, general — it implements some cognition that can efficiently derive novel heuristics for solving any problem/navigating arbitrary novel problem domains. And a system that can’t do that is, well, not an AGI.
    Conceptually, the distinction between an AGI and a pre-AGI system feels similar to the distinction between a system that’s Turing-complete and one that isn’t:
    Any Turing-complete system implements a set of rules that suffices to represent any mathematical structure/run any program. A system that’s just “slightly below” Turing-completeness, however, is dramatically more limited.
    Similarly, an AGI has a complete set of some cognitive features that make it truly universal — features it can use to bootstrap any other capability it needs, from scratch. By contrast, even a slightly “pre-AGI” system would be qualitatively inferior, not simply quantitatively so.
    There’s still some fuzziness around the edges, like whether any significantly useful R&D capabilities only happen at post-AGI cognition, or to what extent being an AGI is the sufficient and not only the necessary condition for an omnicidal explosion.
    But I do think there’s a meaningful sense in which AGI-ness is a binary, not a continuum. (I’m also hopeful regarding nailing all of this down mathematically, instead of just vaguely gesturing at it like this.)
    - paulfchristiano 20 Jan 2023 19:58 UTC
      14 points
      6
      Parent
      It seems like a human is universal in the sense that they can think about new problem solving strategies, evaluate them, adopt successful ones, etc. Most of those new problem-solving strategies are developed by long trial and error and cultural imitation of successful strategies.
      If a language model could do the same thing with chain of thought, would you say that it is an AGI? So would the existence of such systems, without an immediate intelligence explosion, falsify your view?
      If such a system seemed intuitively universal but wasn’t exploding, what kind of observation would tell you that it isn’t universal after all, and therefore salvage your claim? Or maybe to put this more sharply: how did you decide that text-davinci-003, prompted to pursue an open-ended goal and given the ability to instantiate and delegate to new copies of itself, isn’t an AGI?
      It seems to me that you will probably have an “AGI” in the sense you are gesturing at here well before the beginning of explosive R&D growth. I don’t really see the argument for why an intelligence explosion would follow quickly from that point. (Indeed, my view is that you could probably build an AGI in this sense out of text-davinci-003, though the result would be uneconomical.)
      - Thane Ruthenis 20 Jan 2023 20:46 UTC
        10 points
        4
        Parent
        If such a system seemed intuitively universal but wasn’t exploding, what kind of observation would tell you that it isn’t universal after all, and therefore salvage your claim?
        Given that we don’t understand how current LLMs work, and how the “space of problems” generally looks like, it’s difficult to come up with concrete tests that I’m confident I won’t goalpost-move on. A prospective one might be something like this:
        If you invent or find a board game of similar complexity to chess that [the ML model] has never seen before and explain the rules using only text (and, if [the ML model] is multimodal, also images), [a pre-AGI model] will not be able to perform as well at the game as an average human who has never seen the game before and is learning it for the first time in the same way.
        I. e., an AGI would be able to learn problem-solving in completely novel, “basically-off-distribution” domains. And if a system that has capabilities like this doesn’t explode (and it’s not deliberately trained to be myopic or something), that would falsify my view.
        But for me to be confident in putting weight on that test, we’d need to clarify some specific details about the “minimum level of complexity” of the new board game, and that it’s “different enough” from all known board games for the AI to be unable to just generalize from them… And given that it’s unclear in which directions it’s easy to generalize, I expect I wouldn’t be confident in any metric we’d be able to come up with.
        I guess a sufficient condition for AGI would be “is able to invent a new scientific field out of whole cloth, with no human steering, as an instrumental goal towards solving some other task”. But that’s obviously an overly high bar.
        As far as empirical tests for AGI-ness go, I’m hoping for interpretability-based ones instead. I. e., that we’re able to formalize what “general intelligence” means, then search for search in our models.
        As far as my epistemic position, I expect three scenarios here:
        We develop powerful interpretability tools, and they directly show whether my claims about general intelligence hold.
        We don’t develop powerful interpretability tools before an AGI explodes like I fear and kills us all.
        AI capabilities gradually improve until we get to scientific-field-inventing AGIs, and my mind changes only then.
        In scenarios where I’m wrong, I mostly don’t expect to ever encounter any black-boxy test like you’re suggesting which seems convincing to me, before I encounter overwhelming evidence that makes convincing me a moot point.
        (Which is not to say my position on this can’t be moved at all — I’m open to mechanical arguments about why cognition doesn’t work how I’m saying it does, etc. But I don’t expect to ever observe an LLM whose mix of capabilities and incapabilities makes me go “oh, I guess that meets my minimal AGI standard but it doesn’t explode, guess I was wrong on that”.)
- Andy_McKenzie 20 Jan 2023 18:37 UTC
  2 points
  0
  Parent
  You’re right that the operative word in “seems more likely” is “seems”! I used the word “seems” because I find this whole topic really confusing and I have a lot of uncertainty.
  It sounds like there may be a concern that I am using the absurdity heuristic or something similar against the idea of fast take-off and associated AI apocalypse. Just to be clear, I most certainly do not buy absurdity heuristic arguments in this space, would not use them, and find them extremely annoying. We’ve never seen anything like AI before, so our intuition (which might suggest that the situation seems absurd) is liable to be very wrong.
  - Thane Ruthenis 20 Jan 2023 19:07 UTC
    4 points
    0
    Parent
    Oh, I think I should’ve made clearer that I wasn’t aiming that rant at you specifically. Just outlining my general impression of how the two views feel socially.