Daniel Kokotajlo comments on A framework for thinking about AI power-seeking

Daniel Kokotajlo 30 Jul 2024 5:00 UTC
It sounds like you are objecting to Premise 2: “Some of these AIs will be so capable that they will be able to take over the world very easily, with a very high probability of success, via a very wide variety of methods.”

Note that you were the one who introduced the “violent” qualifier; the OP just talks about the broader notion of takeover.
- Matthew Barnett 30 Jul 2024 7:28 UTC
  I don’t think I’m objecting to that premise. A takeover can be both possible and easy without being rational. In my comment, I focused on whether the expected costs of attempting a takeover are greater than the benefits, not on whether the AI will be able to execute a takeover with a high probability.
  
  Or, put another way, one can imagine an AI calculating that the benefit to taking over the world is negative one paperclip on net (when factoring in the expected costs and benefits of such an action), and thus decide not to do it.
  
  Separately, I focused on “violent” or “unlawful” takeovers because I think that’s straightforwardly what most people mean when they discuss world takeover plots, and I wanted to be clearer about what I’m objecting to by making my language explicit.
  
  To the extent you’re worried about a lawful and peaceful AI takeover in which we voluntarily hand control to AIs over time, I concede that my comment does not address this concern.
  - Daniel Kokotajlo 30 Jul 2024 14:12 UTC
    The expected costs you describe seem like they would fall under the “very easily” and “very high probability of success” clauses of Premise 2. E.g. you talk about the costs paid for takeover, and the risk of failure. You talk about how there won’t be one AI that controls everything, presumably because that makes it harder and less likely for takeover to succeed.
    
    I think people are and should be concerned about more than just violent or unlawful takeovers. Exhibit A: Persuasion/propaganda. AIs craft a new ideology that’s as virulent as communism and Christianity combined, and it basically results in submission to and worship of the AIs, to the point where humans voluntarily accept starvation to feed the growing robot economy. Exhibit B: For example, suppose the AIs make self-replicating robot factories and bribe some politicians to make said factories’ heat pollution legal. Then they self-replicate across the ocean floor and boil the oceans (they are fusion-powered), killing all humans as a side-effect, except for those they bribed, who are given special protection. These are extreme examples, but there are many less extreme examples that people should be afraid of as well. (Also, as these examples show, “lawful and peaceful” =/= “voluntary”.)
    
    That said, I’m curious what your p(misaligned-AIs-take-over-the-world-within-my-lifetime) is, including gradual nonviolent peaceful takeovers. And what is your p(misaligned-AIs-take-over-the-world-within-my-lifetime|corporations achieving AGI by 2027 and doing only basically what they are currently doing to try to align them)?
    - Matthew Barnett 30 Jul 2024 15:03 UTC
      I still think I was making a different point. For more clarity and some elaboration: I previously argued in a shortform post that the expected costs of a violent takeover can exceed the benefits even if the costs are small. The reason is that, at the same time as taking over the entire world becomes easier, the benefits of doing so can also get lower, relative to compromise. Quoting from my post,
      
      The central argument here would be premised on a model of rational agency, in which an agent tries to maximize benefits minus costs, subject to constraints. The agent would be faced with a choice: (1) Attempt to take over the world, and steal everyone’s stuff, or (2) Work within a system of compromise, trade, and law, and get very rich within that system, in order to e.g. buy lots of paperclips. The question of whether (1) is a better choice than (2) is not simply a question of whether taking over the world is “easy” or whether it could be done by the agent. Instead it is a question of whether the benefits of (1) outweigh the costs, relative to choice (2).
      
      In my comment in this thread, I meant to highlight the costs and constraints on an AI’s behavior in order to explain how these relative cost-benefits do not necessarily favor takeover. This is logically distinct from arguing that the cost alone of takeover would be high.
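      The comparison between options (1) and (2) in the quoted passage can be made concrete with a toy calculation. This is a hypothetical illustration, not something from the original post: the function names, measuring payoffs in paperclips, and every number are assumptions chosen only to show the shape of the argument. Here a takeover succeeds 99% of the time at a small direct cost, yet still comes out one paperclip worse than compromise, because trade within the existing system already captures almost all of the prize.

      ```python
      from fractions import Fraction

      # Hypothetical toy model: payoffs are in "paperclips" and all numbers are made up.

      def takeover_value(p_success, prize, loss_on_failure, cost):
          """Expected payoff of option (1): attempt a takeover."""
          return p_success * prize - (1 - p_success) * loss_on_failure - cost

      def net_benefit_of_takeover(takeover, compromise):
          """Benefit of option (1) relative to option (2): trade within the system."""
          return takeover - compromise

      # Takeover is "easy": a 99% chance of success and a small direct cost...
      takeover = takeover_value(Fraction(99, 100), prize=1000, loss_on_failure=1000, cost=20)
      # ...but compromise and trade already capture almost all of the prize.
      compromise = 961

      net = net_benefit_of_takeover(takeover, compromise)
      print(net)  # -1
      print("take over" if net > 0 else "compromise")  # compromise
      ```

      The point of the sketch is that the decision turns on the difference between the two options, not on how likely or cheap the takeover is in isolation.
      
      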
      
      I think people are and should be concerned about more than just violent or unlawful takeovers. Exhibit A: Persuasion/propaganda.
      
      Unfortunately I think it’s simply very difficult to reliably distinguish between genuine good-faith persuasion and propaganda over speculative future scenarios. Your example is on the extreme end of what’s possible in my view, and most realistic scenarios will likely instead be somewhere in-between, with substantial moral ambiguity. To avoid making vague or sweeping assertions about this topic, I prefer being clear about the type of takeover that I think is most worrisome. Likewise:
      
      B: For example, suppose the AIs make self-replicating robot factories and bribe some politicians to make said factories’ heat pollution legal. Then they self-replicate across the ocean floor and boil the oceans (they are fusion-powered), killing all humans as a side-effect, except for those they bribed who are given special protection.
      
      I would consider this act both violent and unlawful, unless we’re assuming that bribery is widely recognized as legal, and that boiling the oceans did not involve any violence (e.g., no one tried to stop the AIs from doing this, and there was no conflict). I certainly feel this is the type of scenario that I intended to argue against in my original comment, or at least it is very close.
      - Daniel Kokotajlo 31 Jul 2024 13:19 UTC
        It seems to me that both you and Joe are thinking about this very similarly—you are modelling the AIs as akin to rational agents that consider the costs and benefits of their various possible actions and maximize-subject-to-constraints. Surely there must be a way to translate between your framework and his.
        
        As for the examples… so do you agree then? Violent or unlawful takeovers are not the only kinds people can and should be worried about? (If you think bribery is illegal, which it probably is, modify my example so that they use a lobbying method which isn’t illegal. The point is, they find some unethical but totally legal way to boil the oceans.) As for violence… we don’t consider other kinds of pollution to be violent, e.g. that done by coal companies that are (slowly) melting the icecaps and causing floods etc., so I say we shouldn’t consider this to be violent either.
        
        I’m still curious to hear what your p(misaligned-AIs-take-over-the-world-within-my-lifetime) is, including gradual nonviolent peaceful takeovers. And what is your p(misaligned-AIs-take-over-the-world-within-my-lifetime|corporations achieving AGI by 2027 and doing only basically what they are currently doing to try to align them)?

        Unfortunately I think it’s simply very difficult to reliably distinguish between genuine good-faith persuasion and propaganda over speculative future scenarios. Your example is on the extreme end of what’s possible in my view, and most realistic scenarios will likely instead be somewhere in-between, with substantial moral ambiguity.

        I’m not sure what this paragraph is doing; I said myself they were extreme examples. What does your first sentence mean?