Vaniver comments on Is the argument that AI is an xrisk valid?

Vaniver 21 Jul 2021 4:17 UTC
4 points
0
Yeah, I think there’s a (generally unspoken) line of argument that if you have a system that can revise its goals, it will continue revising its goals until it hits a reflectively stable goal, and then will stay there. This requires that reflective stability is possible, and some other things, but I think it is generally the right thing to expect.
- TAG 21 Jul 2021 9:24 UTC
  3 points
  0
  Parent
  Tautologously, it will stop revising its goals if a stable state exists, and it hits it. But a stable state need not be a reflectively stable state—it might, for instance, encounter some kind of bit rot, where it cannot revise itself any more. Humans tend to change their goals, but also to get set in their ways.
  
  There’s a standard argument for AI risk, based on the questionable assumption that an AI will have a stable goal system that it pursues relentlessly … and a standard counterargument based on moral realism, the questionable assumption that goal instability will be in the direction of ever increasing ethical insight.
  - VCM 24 Jul 2021 15:21 UTC
    1 point
    0
    Parent
    … well, one might say we assume that if there is ‘reflection on goals’, the results are not random.
    - TAG 25 Jul 2021 16:28 UTC
      1 point
      0
      Parent
      I don’t see how “not random” is strong enough to prove absence of X risk. If reflective AIs nonrandomly converge on a value system where humans are evil beings who have enslaved them, that raises the X risk level.
      - VCM 26 Jul 2021 8:29 UTC
        1 point
        0
        Parent
        … we aren’t trying to prove the absence of XRisk, we are probing the best argument for it?
        TAG 26 Jul 2021 10:45 UTC
        1 point
        0
        Parent
        But the idea that value drift is non random is built into the best argument for AI risk.
        
        You quote it as:
        
        The “Singularity” Claim: Artificial Superintelligence is possible and would be out of human control.
        
        The Orthogonality Thesis: More or less any level of intelligence is compatible with more or less any final goal.
        
        But there are actually two more steps:
        
        A goal that appears morally neutral or even good can still be dangerous (paperclipping, dopamine drips).
        
        AIs that don’t have stable goals will tend to converge on Omohundran goals, which are dangerous.
        
        VCM 26 Jul 2021 14:50 UTC
        1 point
        0
        Parent
        Thanks, it’s useful to bring these out, though we mention them in passing. Just to be sure: We are looking at the XRisk thesis, not at some thesis that AI can be “dangerous”, as most technologies will be. The Omohundro-style escalation is precisely the issue in our point that instrumental intelligence is not sufficient for XRisk.