We’ll say that a state is in fact reachable if a group of humans could in principle take actions with actuators—hands, vocal cords, etc.—that could realize that state.
The main issue here is that groups of humans may in principle be capable of a great many things, but there’s a vast chasm between “in principle” and “in practice”. A superintelligence worthy of the name would likely be able to come up with plans that we wouldn’t in practice even be able to check exhaustively, which is exactly the sort of issue we want alignment for.
This is not a problem for my argument. I am merely showing that any state reachable by humans must also be reachable by AIs. It is fine if AIs can reach more states.
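To make the inclusion explicit (the notation here is an illustrative sketch, not part of the original exchange): write $A_H$ for the set of basic actions a group of humans can execute with their actuators, $A_{AI}$ for the set available to the AI, and $R(X)$ for the states reachable from the current state $s_0$ by some finite sequence of actions drawn from $X$. Then

$$A_H \subseteq A_{AI} \;\Longrightarrow\; R(A_H) \subseteq R(A_{AI}),$$

i.e. every humanly reachable state is AI-reachable, with no claim that the converse holds.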
Hmm, right. You need only assume that there are coherent, reachable, desirable outcomes. I’m doubtful that such an assumption holds, but most people probably aren’t.
Why?
Because humans have incoherent preferences, and it’s unclear whether a universal resolution procedure is achievable. I like how Richard Ngo put it: “there’s no canonical way to scale me up”.
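As a minimal illustration of the kind of incoherence meant here (my example, not Ngo’s): a person whose pairwise preferences form a cycle,

$$A \succ B, \qquad B \succ C, \qquad C \succ A,$$

cannot be represented by any utility function $u$, since that would require $u(A) > u(B) > u(C) > u(A)$; and aggregating many such people only makes the choice of resolution procedure less canonical, not more.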
This isn’t really a problem with alignment, so there’s no need to address it here. Alignment means the transmission of a preference ordering to an action sequence. Lacking a coherent preference ordering over states of the universe (or histories, for that matter) is not an alignment problem.
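On that reading (sketched here with notation of my own), alignment asks for an action sequence that is optimal under whatever preference ordering is supplied: given a utility function $u$ representing the ordering and a map $\tau$ from action sequences to resulting states or histories, the target is

$$a^{*}_{1:n} \in \operatorname*{arg\,max}_{a_{1:n}} \, u\bigl(\tau(a_{1:n})\bigr),$$

and whether a suitable $u$ exists at all is a separate question from whether the arg max gets found and executed.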
I’d rather put it that resolving that problem is a prerequisite for the notion of “alignment problem” to be meaningful in the first place. It’s not technically a contradiction to have an “aligned” superintelligence that does nothing, but clearly nobody would in practice be satisfied with that.
You can have an alignment problem without humans, e.g. the two strawberries problem.