If takeoff is slow-ish, a pivotal act (preventing more AGIs from being developed) will be difficult.
If no pivotal act is performed, RSI-capable AGI proliferates. This creates an n-way non-iterated Prisoner’s Dilemma where the first to attack wins.
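To make that claimed dynamic concrete, here is a minimal sketch of a one-shot n-player game in which a lone attacker captures everything. Every number in it (N, PRIZE, SHARED) is an illustrative assumption, not a figure from this thread.

```python
# Minimal sketch of the claimed n-way, one-shot "first to attack wins" dynamic.
# All numbers here are illustrative assumptions, not figures from the discussion.
import random

N = 5            # number of AGI-wielding actors (assumed)
PRIZE = 100.0    # value of unilaterally controlling the future (assumed)
SHARED = 40.0    # per-actor value if everyone refrains from attacking (assumed)

def payoffs(actions):
    """actions: a list of 'attack' or 'wait', one entry per actor.
    If anyone attacks, a single attacker takes the whole prize and everyone
    else gets nothing; if nobody attacks, all actors get the shared payoff."""
    attackers = [i for i, a in enumerate(actions) if a == "attack"]
    if not attackers:
        return [SHARED] * len(actions)
    winner = random.choice(attackers)  # stand-in for "whoever moved first"
    return [PRIZE if i == winner else 0.0 for i in range(len(actions))]

# Unilateral defection: if everyone else waits, attacking yields PRIZE (100)
# instead of SHARED (40), so "attack" is the tempting deviation.
print(payoffs(["attack"] + ["wait"] * (N - 1)))

# Mutual defection: if everyone attacks, each actor's expected payoff is only
# PRIZE / N (20), worse than the 40 they'd all get by waiting -- the standard
# Prisoner's Dilemma ranking, just non-iterated and with n players.
print(payoffs(["attack"] * N))
```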
These two points seem to be in direct conflict. The sorts of capabilities and winner-take-all dynamics which would make “the first to attack wins” true are exactly the sorts which would also make a pivotal act tractable.
Or, to put it differently: the first “attack” (though it might not look very “attack”-like) is the pivotal act; if the first attack wins, that means the pivotal act worked, and therefore wasn’t that difficult. Conversely, if a pivotal act is too hard, then even if an AI attacks first and wins, it has no ability to prevent new AIs from being built and displacing it; if it did have that ability, then the attack would be a pivotal act.
Yes; except that even a successful pivotal act can still be quite difficult.
You could reframe the concern: pivotal acts in a slow takeoff are prone to be bloody and dangerous. And because they are, and because humans are likely to retain control, a pivotal act may be put off until it would be even bloodier, like a nuclear conflict or sending the sun nova.
Worse yet, the “pivotal act” may be performed by the worst (human) actor, not the best.
Just to elaborate a little:
You are right that the same capabilities enable a pivotal act. My concern is that they won’t be used for one (where a “pivotal act” is, by definition, a good act).
Having thought about it some more, I think the biggest problem in the multipolar, human-controlled RSI-capable AGI scenario is that it tends to be the worst actor that defects first and controls the future.
More ethical humans will tend to be more hesitant to commit, or risk, mass destruction to achieve their ends, so they’ll tend to hold off on aggressive moves that could win.
“Hide and create a superbrain and a robot army” is not the first thing a good person tells their AGI to do, let alone inducing nuclear strikes that increase one’s odds of winning at great cost. Someone with more selfish designs on the future may have much less trouble issuing those orders.