Daniel, so you’re assuming STEM+ comes with a strong instrumental convergence drive? That is, any machine that is STEM+ by definition aces some benchmark of “STEM” tasks better than humans, by enough of a margin to say it’s not chance.
And so you’re also assuming that you can’t achieve that without constructing the AI in such a way that it exhibits power-seeking behavior and is always running and trying to accomplish some goal? And, obviously, disempowering humans is an intermediate step along the route to achieving that goal, and it’s convergent independent of the goal.
Or you think humans will do that because such a machine is more useful to humans? (Is it? It’s always trying to increase its own power; won’t that get expensive and worrisome to humans? GPT-nT editions are what humans want, right, and those models likely have slightly worse generality for much lower inference costs.)
I’m claiming that in practice STEM+ machines will either already be power-seeking agents or will quickly lead to the creation of such.
I agree there are possible futures in which we all coordinate to build AGI in a different paradigm, a more tool-like and/or interpretable one. But we don’t seem headed for those futures.
Summary: your model likely isn’t factoring in a large number of STEM+-capable systems being developed around the same time period, which has happened many times in the history of past innovations, and people’s natural preference for more reliable tools. I think you are also neglecting how slowly governments, militaries, and industry update anything, which would act to keep important equipment out of the hands of the most advanced and least reliable AI models. Finally, I think you are imagining training and benchmark tasks very different from today’s, where power seeking is part of the task environment and is required for a competitive score. (Concrete example: “beat Minecraft”.)
So, just to break down your claims a bit:
Claim 1: STEM+ machines will inherently seek power.
Do you have any more information on why you believe this? Do any current models seek power? Can you explain how you think the training environment rewards power seeking? I am thinking of some huge benchmark that humans are endlessly adding fresh tasks to; how are you imagining it working? Does the model get rewarded globally for an increased score? Did humans not include a term for efficiency in the reward function?
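For concreteness, here is the kind of setup I have in mind, as a minimal sketch in Python where the task list, the scores, and the efficiency penalty are all hypothetical stand-ins, not any lab’s actual objective:

```python
# Minimal sketch of the benchmark setup I'm imagining. Everything here is a
# hypothetical stand-in (task list, scores, efficiency penalty), not any
# lab's actual training objective.

def episode_reward(task_score: float, compute_cost: float,
                   efficiency_weight: float = 0.1) -> float:
    """Reward for one task: benchmark score minus a penalty for resources used.

    With efficiency_weight = 0 the model is rewarded only for raising the
    score, no matter how much compute or outside power it grabs; with a
    nonzero weight, grabbing extra resources actively hurts its reward.
    """
    return task_score - efficiency_weight * compute_cost

# Humans keep appending fresh tasks; the model is scored per task, inside the
# task environment, rather than globally for acquiring resources.
fresh_tasks = ["prove_theorem", "design_assay", "debug_driver"]
total = sum(episode_reward(task_score=1.0, compute_cost=0.4) for _ in fresh_tasks)
print(total)  # ~2.88 with the illustrative numbers above
```

If something like that efficiency term is present, grabbing extra compute or power lowers the score; if it’s absent, I can see your worry more easily.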
Claim 2: Humans who want tasks done by AI, presumably using STEM+ machines, will do things like enable session-to-session memory or give the machine a credit card and a budget. Humans will select power-seeking agents to do their tasks over myopic agents that just do the task efficiently.
If this is true, why does everyone keep trying to optimize AI? Smaller models trade off everything for benchmark performance. Am I wrong to think a smaller model has likely lost generality, and with it the capacity for power seeking, to fit within a weight budget? That a 7B model fundamentally has less room for unwanted behavior?
Claim 3: The majority of the atoms in the world that end up in the hands of AI will belong to power-seeking models. So this means things like the plot of Terminator 3, where the military decides to give their Skynet AI model control over their computer networks, instead of, say, sending soldiers to unplug and reset all the computers infected with viruses.
A world where humans don’t prefer power-seeking models would be one where most of the atoms belong to myopic models, not from coordination but from self-interest.
How do you explain the fact that military equipment doesn’t work this way now? A lot of it uses private, dedicated networks and older, well-tested technology.
It seems like all 3 claims need to be true for power-seeking AI to be able to endanger the world; do you agree with that? I tried to break your claim down into sub-claims; if you think there is a different breakdown, let me know.
I’m not claiming that. Current examples of power-seeking agents include ChaosGPT and more generally most versions of AutoGPT that are given ambitious goals and lots of autonomy.
I do endorse this. I agree that smaller, more optimized, and more specialized models are less generally intelligent and therefore less dangerous. I don’t think the fact that lots of people are trying to optimize AI seriously undermines my claims.
Not sure what you are getting at here. I think the majority of atoms will belong to power-seeking AIs eventually, but I am not making any claims about what the military will decide to do.
Would you agree ChaosGPT has a framework where it has a long-running goal and humans have provided it the resources to run to achieve that goal? The assigned goal itself leads to power seeking; you wouldn’t expect such behavior to happen spontaneously with all goals. For example, “make me the most money possible” and “get me the most money this trading day via this trading interface” are enormously different. Do you think a STEM+ model will seek power if given the latter goal?
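To make that contrast concrete, here is a toy sketch (purely illustrative Python; not ChaosGPT’s or AutoGPT’s actual code, and the agents are stubs) of the two framings:

```python
# Toy contrast between the two goal framings above. Purely illustrative --
# the "trading" is a random-number stub, not a real interface.
import random

def myopic_trader(steps_in_day: int = 100) -> float:
    """'Get me the most money this trading day via this trading interface.'
    Hard stop at end of day, one channel of action: nothing outside the task
    is instrumentally useful, so there is no obvious push toward power-seeking."""
    pnl = 0.0
    for _ in range(steps_in_day):
        pnl += random.gauss(0.0, 1.0)  # stand-in for one trade through the interface
    return pnl  # done; the agent has no reason to keep running

def open_ended_agent(goal: str = "make me the most money possible",
                     demo_steps: int = 5) -> dict:
    """No stopping condition in the goal itself, so money, accounts, compute,
    and influence all look like useful sub-goals -- the power-seeking pattern.
    (demo_steps only exists so this illustration halts.)"""
    resources = {"money": 0.0, "accounts": 0}
    for _ in range(demo_steps):      # a real open-ended loop would be `while True`
        resources["money"] += 1.0    # stand-in for whatever action best serves the goal
        resources["accounts"] += 1   # ...including acquiring more resources
    return resources

print(myopic_trader())
print(open_ended_agent())
```

The bounded version terminates and only acts through one interface; the open-ended one never has a reason to stop accumulating resources.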
Like, is our problem actually the model scheming against us, or is the issue that some humans will misuse models and the models will do their assigned tasks well?
It undermines your claims if there exist multiple models, A and At, where the t variant costs 1/10 as much to run and performs almost as well on the STEM+ benchmark. You are essentially claiming that either humans won’t prefer the sparsest model that does the job, or fairly well-optimized models will still seek power, or maybe compute will be so cheap humans just don’t care (like Eliezer’s short story where toasters are sentient). I think I agree with you in principle that bad outcomes could happen; the disagreement is whether economic forces, etc., will prevent them.
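To put rough numbers on that economic pressure (hypothetical figures, purely for illustration, not real benchmark scores or prices):

```python
# Hypothetical numbers to illustrate why buyers would pick the cheaper model.
score_A,  cost_A  = 100.0, 10.0   # big general model A: STEM+ score, $ per task
score_At, cost_At = 95.0,  1.0    # distilled "t" variant: ~5% worse, 10x cheaper

print(score_A / cost_A)    # 10.0 benchmark points per dollar
print(score_At / cost_At)  # 95.0 points per dollar -- the sparser model wins on economics
```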
I am saying that for the outcome “the majority of the atoms belong to power seekers” to happen, either the military stupidly gives weapons to power-seeking machines (like in T3), or a weaker but smarter network of power-seeking machines is able to defeat the military. For the latter claim you quickly end up in arguments over things like the near-term feasibility of MNT (molecular nanotechnology), since there has to be some way for a badly out-resourced AI to win. “I don’t know how it does it, but it’s smarter than us” then hits the issue of “why didn’t the military see through the plan using their own AI?”.