Is there a specific thing you think LLMs won’t be able to do soon, such that you would make a substantial update toward shorter timelines if there was an LLM able to do it within 3 years from now?
Well, making it pass people’s “specific” bar seems frustrating, as I mentioned in the post, but: understand stuff deeply—such that it can find new analogies / instances of the thing, reshape its idea of the thing when given propositions about the thing taken as constraints, draw out relevant implications of new evidence for the ideas.
Like, someone’s going to show me an example of an LLM applying modus ponens, or making an analogy. And I’m not going to care, unless there’s more context; what I’m interested in is [that phenomenon which I understand at most pre-theoretically, certainly not explicitly, which I call “understanding”, and which has as one of its sense-experience emanations the behavior of making certain “relevant” applications of modus ponens, and as another sense-experience emanation the behavior of making analogies in previously unseen domains that bring over rich stuff from the metaphier].
Alright, to check if I understand, would these be the sorts of things that your model is surprised by?
1. An LLM solves a mathematical problem by introducing a novel definition which humans can interpret as a compelling and useful concept.
2. An LLM which can be introduced to a wide variety of new concepts not in its training data, and after a few examples and/or clarifying questions is able to correctly use the concept to reason about something.
3. An image diffusion model which is shown to have a detailed understanding of anatomy and 3D space, such that you can use it to transform a photo of a person into an image of the same person in a novel pose (not in its training data) and angle, with correct proportions and realistic joint angles for the person in the input photo.
Unfortunately, more context is needed.
I mean, I could just write a Python script that prints out a big list of definitions of the form
“A topological space where every subset with property P also has property Q”
with P and Q ranging over a big list of properties of subsets of topological spaces. I’d guess some of these will be novel and useful. I’d guess LLMs + some scripting could already take advantage of some of this. I wouldn’t be very impressed by that (though I think I would be pretty impressed by the LLM being able to actually tell the difference between valid proofs in reasonable generality). There are some versions of this I’d be impressed by, though. Like if an LLM had been the first to come up with one of the standard notions of curvature, or something, that would be pretty crazy.
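(For concreteness, a minimal sketch of the kind of script meant here; the property list and the exact wording of the template are just illustrative placeholders.)

```python
from itertools import permutations

# Illustrative placeholder list of properties a subset of a topological space can have.
# A real run would use a much longer, curated list.
PROPERTIES = [
    "open",
    "closed",
    "compact",
    "connected",
    "dense",
    "nowhere dense",
    "countable",
]

def candidate_definitions(properties):
    """Yield one candidate definition for every ordered pair of distinct properties,
    instantiating the template 'every subset with property P also has property Q'."""
    for p, q in permutations(properties, 2):
        yield f"A topological space in which every {p} subset is also {q}"

for definition in candidate_definitions(PROPERTIES):
    print(definition)
```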
I haven’t tried this, but I’d guess if you give an LLM two lists of things where list 1 is [things that are smaller than a microwave and also red] and list 2 is [things that are either bigger than a microwave, or not red], or something like that, it would (maybe with some prompt engineering to get it to reason things out?) pick up that “concept” and then use it, e.g. sorting a new item, or deducing from “X is in list 1” to “X is red”. That’s impressive (assuming it’s true), but not that impressive.
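(If someone wanted to actually run that check, a rough sketch of the setup might look like the following; `query_llm` is a hypothetical stand-in for whatever model API is available, and the example items and prompt wording are made up for illustration.)

```python
# Hypothetical stand-in for an actual LLM API call.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real model here")

# List 1: red and smaller than a microwave. List 2: bigger than a microwave, or not red.
LIST_1 = ["ripe cherry", "ladybug", "red Lego brick", "red apple"]
LIST_2 = ["fire truck", "blue coffee mug", "red barn", "green apple"]

def build_prompt(new_item: str) -> str:
    """Few-shot prompt: show the sorted examples, then ask about a held-out item."""
    examples = "\n".join(
        [f"{item} -> list 1" for item in LIST_1]
        + [f"{item} -> list 2" for item in LIST_2]
    )
    return (
        "Here are some items sorted into two lists:\n"
        f"{examples}\n\n"
        f"Reason step by step: which list does '{new_item}' belong to, and is it red?"
    )

# e.g. answer = query_llm(build_prompt("red thimble"))
```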
On the other hand, if it hasn’t been trained on a bunch of statements about angular momentum, and then it can—given some examples and time to think—correctly answer questions about angular momentum, that would be surprising and impressive. Maybe this could be experimentally tested, though I guess at great cost, by training an LLM on a dataset that’s been scrubbed of all mention of stuff related to angular momentum (disallowing math about angular momentum, but allowing math and discussion about momentum and about rotation), and then trying to prompt it so that it can correctly answer questions about angular momentum. Like, the point here is that angular momentum is a “new thing under the sun” in a way that “red and smaller than microwave” is not a new thing under the sun.
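(A very crude sketch of what the scrubbing step might look like; the keyword list is purely illustrative and obviously incomplete, and a real scrub would need careful manual review.)

```python
import re

# Purely illustrative blocklist; angular momentum shows up under many names and
# notations, so a real scrub would need a much broader net plus manual review.
ANGULAR_MOMENTUM_TERMS = re.compile(
    r"angular\s+momentum|moment\s+of\s+momentum",
    flags=re.IGNORECASE,
)

def scrub_corpus(documents):
    """Keep only documents that never mention angular momentum (by these crude rules)."""
    return [doc for doc in documents if not ANGULAR_MOMENTUM_TERMS.search(doc)]
```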
Until recently, I thought that the fact that LLMs are not strong and efficient online (or quasi-online, i.e., needing only a few examples) conceptual learners was a “big obstacle” for AGI or ASI. I no longer think so. Yes, humans evidently still have an edge here: humans can somehow relatively quickly and efficiently “surgeon” their world models to accommodate new concepts and use them efficiently in a far-ranging way. (Even though I suspect that we over-glorify this ability in humans, and it more realistically takes weeks or even months rather than hours for humans to fully integrate new conceptual frameworks into their thinking, they should still be able to do so without many external examples, which will be lacking if the concept is actually very new.)
I no longer think this handicaps LLMs much. New powerful concepts that permeate practical and strategic reasoning in the real world are rarely invented and spread through society slowly. Just being a skillful user of existing concepts that are amply described in books and elsewhere in the training corpus of LLMs should be enough for gaining the capacity for recursive self-improvement, and for quite far superhuman intelligence/strategy/agency more generally.
Then, imagine that superhuman LLM-based agents “won” and killed all humans. Even if they themselves didn’t (or couldn’t!) invent ML paradigms for efficient online concept learning, they could still sort of hack their way around it: experimenting with new concepts, running a lot of simulations with them, checking these simulations against reality (filtering out incoherent/bad concepts), re-training themselves on the results of these simulations, and then giving text labels to the features found in their own DNNs to mark the corresponding concepts.
I don’t think they’re skilled users of existing concepts. I’m not saying it’s an “obstacle”; I’m saying that this behavior pattern would be a significant indicator to me that the system has properties that make it scary.
Analogies: “Emergent Analogical Reasoning in Large Language Models”
Not what I mean by analogies.
I think the argument here basically implies that, in the next 3 years, language models will not produce any novel, useful concepts that get substantial adoption in any existing industry or research field (e.g. >10% of people in the field use it, or a widely cited paper), and that if one did, then the end would be nigh (or much nigher).
To be clear, you might get new concepts about language from language models if you nail some Chris Olah-style transparency work, but the language model itself will not output, in its text, concepts that aren’t about language.
I roughly agree. As I mentioned to Adele, I think you could get sort of lame edge cases where the LLM kinda helped find a new concept. The thing that would make me think the end is substantially nigher is if you get a model that’s making new concepts of comparable quality at a comparable rate to a human scientist in a domain in need of concepts.
Yeah that seems right. I’m not sure what you mean by “about language”. Sorta plausibly you could learn a little something new about some non-language domain that the LLM has seen a bunch of data about, if you got interpretability going pretty well. In other words, I would guess that LLMs already do lots of interesting compression in a different way than humans do it, and maybe you could extract some of that. My quasi-prediction would be that those concepts:
1. are created using way more data than humans use for many of their important concepts; and
2. are weirdly flat, and aren’t suitable out of the box for a big swath of the things that human concepts are suitable for.