With pretty high confidence, you expect a sharp left turn to happen (in almost all trajectories)
This is to a large extent based on the belief that at some point “systems start to work really well in domains really far beyond the environments of their training”, which is roughly the same as “discovering a core of generality” and a few other formulations. These systems will be, in some meaningful sense, fundamentally different from e.g. Gato.
That’s right, though the phrasing “discovering a core of generality” here sounds sort of mystical and mysterious to me, which makes me wonder whether you can see the perspective from which this is a very obvious and normal belief. I get a similar vibe when people talk about a “secret sauce” and say they can’t understand why MIRI thinks there might be a secret sauce—treating generalizability as a sort of occult property.
The way I would phrase it is in very plain, concrete terms:
If a machine can multiply two-digit numbers together as well as four-digit numbers together, then it can probably multiply three-digit numbers together. The structure of these problems is similar enough that it’s easier to build a generalist that can handle ‘multiplication’ than to solve two-digit and four-digit multiplication using fundamentally different techniques.
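A minimal sketch of the point in Python (my own toy example, not anything from the multiplication analogy itself): a single general long-multiplication routine handles two-, three-, and four-digit inputs with the same loop, because nothing in it depends on digit count.

```python
def multiply(a: int, b: int) -> int:
    """Grade-school long multiplication: multiply a by each digit of b,
    shifted by its place value. Nothing here depends on how many digits
    the inputs have, so the same code covers the 2-, 3-, and 4-digit cases."""
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        total += a * int(digit) * 10 ** place
    return total

assert multiply(12, 34) == 408            # two-digit
assert multiply(123, 456) == 56088        # three-digit
assert multiply(1234, 5678) == 7006652    # four-digit
```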
Similarly, it’s easier to teach a human or AI how to navigate physical environments in general, than to teach them how to navigate all physical environments except parking garages. Parking garages aren’t different enough from other physical environments, and the techniques for modeling and navigating physical spaces work too well, when they work at all.
Similarly, it’s easier to build an AI that is an excellent physicist and has the potential to be a passable or great chemist and/or biologist, than to build an excellent physicist that just can’t do chemistry or biology, no matter how many chemistry experiments or chemistry textbooks it sees. The problems have too much overlap.
We can see that the latter is true just by reflecting on what kinds of mental operations go into generating hypotheses about ontologies/carvings of the world, generating hypotheses about the state of the world given some ontology, fitting hypotheses about different levels/scales into a single cohesive world-model, calculating value of information, strategically directing attention toward more fruitful directions of thought, coming up with experiments, thinking about possible experimental outcomes, noticing anomalies, deducing implications and logical relationships, coming up with new heuristics and trying them out, etc. These clearly overlap enormously across the relevant domains.
We can also observe that this is in fact what happened with humans. We have zero special-purpose brain machinery for any science, or indeed for science as a category; we just evolved to be able to model physical environments well, and this generalized to all sciences once it generalized to any.
For things to not go this way would be quite weird.
From your perspective, this is based on thinking deeply about the nature of such systems (note that this is mostly based on hypothetical systems, and on an analogy with evolution).
Doesn’t seem to pass my ITT. Like, it’s true in a sense that I’m ‘thinking about hypothetical systems’, because I only care about human cognition inasmuch as it seems likely to generalize to AGI cognition. But this still seems like it’s treating generality as a mysterious occult property, and not as something coextensive with all our observations of general intelligences.
My claim, roughly, is that this is only part of what’s going on, where the actual thing is: people start with a deep prior on “continuity in the space of intelligent systems”. Looking into a specific question about hypothetical systems, their search in argument space is guided by this prior, and they end up mostly sampling arguments supporting their prior. (This is not to say the arguments are wrong.)
Seems to me that my core intuition is about there being common structure shared between physics research, biology research, chemistry research, etc.; plus the simple observation that humans don’t have specialized evolved modules for chemistry vs physics vs biology. Discontinuity is an implication of those views, not a generator of those views.
Like, sure, if I had a really incredibly strong prior in favor of continuity, then maybe I would try really hard to do a mental search for reasons not to accept those prima facie sources of discontinuity. And since I don’t have a super strong prior like that, I guess you could call my absence of a super-continuity assumption a ‘discontinuity assumption’.
But it seems like a weird and unnatural way of trying to make sense of my reasoning: I don’t have an extremely strong prior that everything must be continuous, but I also don’t have an extremely strong prior that everything must be spherical, or that everything must be purple. I’m not arriving at any particular conclusions via a generator that keeps saying ‘not everything is spherical!’ or ‘not everything is purple!’; I’m not a non-sphere-ist or an anti-purple-ist; the deep secret heart and generator for all my views is not that I have a deep and abiding faith in “there exist non-spheres”. And putting me in a room with some weird person who does think everything is a sphere doesn’t change any of that.
You probably don’t agree with the above point, but notice the correlations:
You expect a sharp left turn due to discontinuity in the “architectures” dimension (which is the crux, according to you)
But you also expect jumps in the capabilities of individual systems (at least I think so)
Also, you expect the majority of hope to lie in “sharp right turn” histories (in contrast to smooth right turn histories)
I would say that there are two relevant sources of discontinuity here:
1. AGI is an invention, and inventions happen at particular times. This inherently involves a 0-to-1 transition when the system goes from ‘not working’ to ‘working’. Paul and I believe equally in discontinuities like this, though we may disagree about whether AGI has already been ‘invented’ (such that we just need to iterate and improve on it), vs. whether the invention lies in the future.
2. General intelligence is powerful and widely applicable. This is another category of discontinuity Paul believes can happen (e.g., washing machines are allowed to have capabilities that non-washing-machines lack; nukes are allowed to have capabilities that non-nukes lack), though Paul may be somewhat less impressed than me with general intelligence overall (resulting in a smaller gap/discontinuity). Separately, Paul’s beliefs in AGI development predictability, AI research efficiency, and ‘AGI is already solved’ (see 1, above) each serve to reduce the importance of this discontinuity.
‘AGI is an invention’ and ‘General intelligence is powerful’ aren’t weird enough beliefs, I think, to call for some special explanation like ‘Rob B thinks the world is very discontinuous’. Those are obvious first-pass beliefs to have about the domain, regardless of whether they shake out as correct on further analysis.
‘We need a pivotal act’ is a consequence of 1 and 2, not a separate discontinuity. If AGI is a sudden huge dangerous deal (because 1 and 2 are true), then we’ll need to act fast or we’ll die, and there are viable paths to quickly ending the acute risk period. The discontinuity in the one case implies the discontinuity in this new case. There’s no need for a further explanation.