Just block all AI projects, and preserve your ability to keep doing so in the future by maintaining military supremacy.
That to me is a very very non-central case of “take over the world”, if it is one at all.
This is about what people would think when they hear that description. I could be wrong, but I expect a "the plan is to take over the world" summary would lead people to expect "replace governments" levels of interference, not "coerce/trade to ensure this specific policy", and there's a really big difference between the two.
I think this whole debate is missing the point I was trying to make. My claim was that it's often useful to classify actions which tend to lead to you having a lot of power as "structural power-seeking", regardless of your motivations for those actions, because it's very hard to credibly signal that you're accumulating power for the right reasons, and so the defense mechanisms will apply to you either way.
In this case MIRI was trying to accumulate a lot of power, and claiming that they were aiming to use it in the “right way” (do a pivotal act) rather than the “wrong way” (replacing governments). But my point above is that this sort of claim is largely irrelevant to defense mechanisms against power-seeking.
(Now, in this case, MIRI was pursuing a type of power that was too weird to trigger many defense mechanisms, though it did trigger some “this is a cult” defense mechanisms. But the point cross-applies to other types of power that they, and others in AI safety, are pursuing.)
I don’t super buy this. I don’t think MIRI was trying to accumulate a lot of power. In my model of the world, they were trying to design a blueprint for some institution or project that would mostly have highly conditional power, power that they would not personally wield.
In the metaphor of classical governance, I think what MIRI was doing was much more "design a blueprint for a governance agency" than "put themselves in charge of a governance agency". Designing a blueprint is not a particularly power-seeking move, especially if you expect other people to implement it.
I got your point and think it’s valid, and I don’t object to calling MIRI structurally power-seeking to the extent they wanted to execute a pivotal act themselves (Habryka claims they weren’t planning to; I’m not knowledgeable on that front).
I still think it’s important to push back against a false claim that someone had the goal of taking over the world.