But what about those improvements also running GPT-6?
Same reasoning: I expect that GPT-N will be omnicide-capable out of the box / with a minimal self-prompting wrapper, or not at all. For any given AI model, a marginally better wrapper isn’t going to push it over the threshold into transformative AI. Thus, if a new model is released, and the first dumb “let’s give it agency!” idea doesn’t work, we can probably relax about that specific model entirely. (This is mainly in opposition to your original claim that Auto-GPT can “fan the sparks of AGI in GPT-4 into a flame”.)
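(For concreteness, the kind of “minimal self-prompting wrapper” I have in mind is roughly the sketch below. Everything in it is illustrative: query_llm is a placeholder for whatever completion API the wrapper would call, and the prompt wording is made up, not any particular project’s.)

```python
# Illustrative sketch of the "first dumb 'let's give it agency!' idea":
# give the model a goal, feed its own output back in as context, repeat.

def query_llm(prompt: str) -> str:
    # Placeholder for a call to the underlying model.
    return "Thought: consider the goal. Action: none yet. DONE"

def self_prompting_loop(goal: str, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        context = "\n".join(history)
        prompt = (
            f"Goal: {goal}\n"
            f"Previous thoughts and actions:\n{context}\n"
            "Think step by step, then state your next action. Say DONE when finished."
        )
        step = query_llm(prompt)   # the model's natural-language "thought"
        history.append(step)       # fed back in as context on the next iteration
        if "DONE" in step:
            break
    return history

print(self_prompting_loop("summarize the last week of AI news"))
```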
On a larger scale, if progressively larger and more capable models based on a given architecture keep not taking off when put in a self-prompt wrapper, and they keep failing in the exact same way, that’s probably evidence that the entire architecture is safe. And I think GPT-4 is failing in the same way GPT-3 or GPT-2 would’ve.
Not to say that I am, at this point, utterly confident that GPT-N isn’t going to take off; I’m not. But inasmuch as Auto-GPT’s performance is evidence for or against that, I think it’s evidence against.
Because then you have an AI with goals stated in intuitive natural language
Yeah, that’s… part of the reason I don’t expect this to work. I don’t think any text output should be viewed as the LLM’s “thoughts”. Whatever thoughts it has happen inside forward passes, and I don’t think it natively maps them into the human-legible monologues in which the wider Auto-GPT “thinks”. I think there’s a fundamental disconnect between the two kinds of “cognition”, and the latter type is much weaker.
If GPT-N were AGI, it would’ve recognized the opportunity offered by the self-prompting wrapper, applied optimization from its end, figured out how to map its native thoughts into language-thoughts, and thereby made even the dumbest wrapper work. But it didn’t do that, and I don’t think any improvement to the wrapper is going to make it do that, because it fundamentally can’t even try to figure out how. The problem is on its end: in the frozen parameters, in the internal architecture it lacks. Its mental ontology doesn’t have the structures for even conceiving of performing this kind of operation, and it’s non-AGI, so it can’t invent the idea on the fly.
(Man, I’m going to eat so much crow when some insultingly dumb idea on the order of “let’s think step-by-step!” gets added to the Auto-GPT wrapper next month and it takes off hard and kills everyone.)
Interesting. Thanks for your thoughts. I think this difference of opinion shows me where I’m not fully explaining my thinking, and some differences between human thinking and LLM “thinking”. In humans, the serial nature of linking thoughts together is absolutely vital to our intelligence. But LLMs already have a lot of seriality within the production of each single utterance.
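(Rough, made-up numbers to illustrate the sort of serial depth I mean — these are not any real model’s specs:)

```python
# Back-of-the-envelope serial depth of producing one utterance, with assumed numbers.
layers_per_forward_pass = 96    # assumed transformer depth: serial steps per token
tokens_per_utterance = 200      # assumed length of one Auto-GPT-style "thought"

serial_steps = layers_per_forward_pass * tokens_per_utterance
print(serial_steps)  # 19200 serial steps inside a single utterance,
                     # before the outer wrapper loop adds any iterations on top
```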
I think I need to write another post that goes much further into my reasoning here to work this out. Thanks for the conversation.
Glad it was productive!
I think this difference of opinion shows me where I’m not fully explaining my thinking
I perceive a lot of inferential distance on my end as well. My model here is informed by a number of background conclusions that I’m fairly confident in, but which haven’t actually propagated into the set of commonly-assumed background assumptions.
I have found this conversation very interesting, and would love a quick summary or writeup of the background conclusions you are referring to. I have my own thoughts about the feasibility of massive agency gains from Auto-GPT-like wrappers, but would be interested to hear yours.
I may make a post about it soon. I’ll respond to this comment with a link or a summary later on.
Here’s the future post I was referring to!
I saw it. I really like it. Despite my relative enthusiasm for LMCA alignment, I think the points you raise there mean it’s still quite a challenge to get it right enough to survive.
I’ll try to give you a substantive response on that post today.