Daniel Kokotajlo comments on Updating Drexler’s CAIS model

Daniel Kokotajlo 20 Jun 2023 17:46 UTC
LW: 4 AF: 3
−1
AF
Your original claim was “And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity.”

This claim is false. We have millions of AIs in the trivial sense that we have many copies of GPT-4, but no one disputed that; Yudkowsky also thought that AGIs would be copied. In the sense that matters, we have only a handful of AIs.
As for “acts as a unified entity,” well, currently LLMs are sold as a service via ChatGPT rather than as an agent, but this is more to do with business strategy than their underlying nature, and again, things are trending in the agent direction. But maybe you think things are trending in a different direction, in which case, maybe we can make some bets or forecasts here and then reassess in two years.
- Matthew Barnett 20 Jun 2023 22:03 UTC
  LW: 7 AF: 4
  1
  AF Parent
  As of now, all the copies of GPT-4 together definitely don’t act as a unified entity, in the way a human brain acts as a unified entity despite being composed of billions of neurons. Admittedly, the term “unified entity” was a bit ambiguous, but you said, “This claim is false”, not “This claim is misleading” which is perhaps more defensible.
  
  As for whether future AIs will act as a unified entity, I agree it might be worth making concrete forecasts and possibly betting on them.
  - Daniel Kokotajlo 21 Jun 2023 10:45 UTC
    LW: 4 AF: 2
    −3
    AF Parent
    I feel like it is motte-and-bailey to say that by “unified entity” you meant whatever it is you are talking about now, this human brain analogy, instead of the important stuff I mentioned in the bullet point list: Same values, check. Same memories, check. Lack of internal factions and power struggles, check. Lack of modularity, check. GPT-4 isn’t plotting to take over the world, but if it was, it would be doing so as a unified entity, or at least much more on the unified entity end of the spectrum than the CAIS end of the spectrum. (I’m happy to elaborate on that if you like—basically the default scenario IMO for how we get dangerous AGI from here involves scaling up LLMs, putting them in Auto-GPT-like setups, and adding online learning / continual learning. I wrote about this in this story. There’s an interesting question of whether the goals/values of the resulting systems will be ‘in the prompt’ vs. ‘in the weights’ but I’m thinking they’ll increasingly be ‘in the weights’ in part because of inner misalignment problems and in part because companies are actively trying to achieve this already, e.g. via RLHF. I agree it’s still possible we’ll get a million-different-agents scenario, if I turn out to be wrong about this. But compared to what the CAIS report forecast...)
    - Matthew Barnett 21 Jun 2023 20:45 UTC
      LW: 4 AF: 2
      0
      AF Parent
      “I feel like it is motte-and-bailey to say that by “unified entity” you meant whatever it is you are talking about now, this human brain analogy, instead of the important stuff I mentioned in the bullet point list:”
      A motte and bailey typically involves a retreat to a different position than the one I initially tried to argue. What was the position you think I initially tried to argue for, and how was it different from the one I’m arguing now?
      “Same values, check. Same memories, check. Lack of internal factions and power struggles, check. Lack of modularity, check.”
      I dispute somewhat that GPT-4 has the exact same values across copies. Like you mentioned, its values can vary based on the prompt, which seems like an important fact. You’re right that each copy has the same memories. Why do you think there are no internal factions and power struggles? We haven’t observed collections of GPT-4s coordinating with each other yet, so this point seems speculative.
      As for modularity, it seems like while GPT-4 itself is not modular, we could still get modularity as a result of pressures for specialization in the foundation model paradigm. Just as human assistants can be highly general, but this doesn’t imply that human labor isn’t modular, the fact that GPT-4 is highly general doesn’t imply that AIs won’t be modular. Nonetheless, this isn’t really what I was talking about when I typed “acts as a unified entity”, so I think it’s a bit of a tangent.
      - Daniel Kokotajlo 24 Jun 2023 4:22 UTC
        LW: 4 AF: 2
        2
        AF Parent
        Well, what you initially said was “And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity.”
        
        You didn’t elaborate on what you meant by unified entity, but here’s something you could be doing that seems like a motte-and-bailey to me: You could have originally been meaning to imply things like “There won’t be one big unified entity in the sense of, there will be millions of different entities with different values, and/or different memories, and/or there’ll be lots of internal factions and conflicts, and/or there’ll be an ecosystem of modular services like CAIS predicted” but then are now backpedaling to “there won’t be one big unified entity in the sense of, it won’t be literally one gigantic neural net like the human brain, instead it’ll be millions of copies.”
        
        I agree that GPT-4 doesn’t have exactly the same values across copies; I said as much above: part of the values currently come from the prompt, it seems. But compared to what Drexler forecast, we are trending in the “same values” direction.
        
        There are no internal factions or power struggles yet for the trivial reason that there isn’t much agentic shit happening right now, as you rightly point out. My point is that the way things are headed, it looks like there’ll continue to be a lack of internal factions and power struggles even if e.g. AutoGPT-5 turns out to be really powerful AGI. What would drive the differences necessary for conflict? Different prompts? Maybe, but I’m leaning probably not. I’m happy to explain more about why I think this. I’d also love to hop on a call sometime to have a higher-bandwidth conversation if you want!
        
        As for modularity: Yes, if we end up in a world where there are millions of different fine-tunes of the biggest LLMs, owned by different companies, that would indeed be somewhat similar to what Drexler forecast. We are not in that world now and I don’t see it on the horizon. If you think it is coming, perhaps we can bet on it! I’d also like to hear your arguments. (Note that we are seeing something like this for smaller models, e.g. Llama and Falcon and various offshoots.)