Hum… If we got the combined CEV of two people, one of whom thought violence was ennobling and one who thought it was degrading, would you expect either or both of:
a) their combined CEV would be the same as if we had started with two people both indifferent to violence
b) their combined CEV would be biased in a particular direction that we can know ahead of time
The idea is that their extrapolated volitions would plausibly not contain such conflicts, though it’s not yet clear whether we can know ahead of time what the combined result would be. Nor is it clear whether their combined CEV would be the same as the combined CEV of two people indifferent to violence.
So, to my ears, it sounds like we don’t have much of an idea at all where the CEV would end up—which means that it most likely ends up somewhere bad, since most random places are bad.
Well, if it captures the key parts of what you want, you can know it will turn out fine even if you’re extremely ignorant about what exactly the result will be.
Yes, as the Spartans answered Alexander the Great’s father when he said, “You are advised to submit without further delay, for if I bring my army into your land, I will destroy your farms, slay your people, and raze your city”:
“If”.
Yup. So, perhaps, focus on that “if.”
Shouldn’t we be able to rule out at least some classes of scenarios? For instance, paperclip maximization seems like an unlikely CEV output.
Most likely we can rule out most scenarios that all humans agree are bad. So better than clippy, probably.
But we really need a better model of what CEV does! Then we can start to talk sensibly about it.
“which means that it most likely ends up somewhere bad, since most random places are bad.”
I don’t think that follows, at all. CEV isn’t a random walk; it will at the very least end up within some subset of human values. Maybe you meant something different here by the word ‘bad’?
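To make concrete why the earlier question about the two opposed people is hard to settle without a better model of what CEV actually does, here is a minimal toy sketch in Python. It is not the CEV procedure (there is no formal specification to implement here); the outcomes, utility functions, and both aggregation rules are invented purely for illustration. The only point is that whether two opposed people aggregate to the same result as two indifferent people depends entirely on the aggregation rule.

```python
# Toy illustration only -- NOT the CEV algorithm. Outcomes, utilities, and
# the two aggregation rules below are made up for the example.

OUTCOMES = [0, 1, 2]  # hypothetical "amount of violence" in the world


def u_ennobled(v):
    return v          # person who finds violence ennobling: more is better


def u_degraded(v):
    return 2 - v      # person who finds violence degrading: less is better


def u_indifferent(v):
    return 0          # person with no preference either way


def best_by_average(agents):
    """Rule 1: pick the outcome maximising average utility across agents."""
    return max(OUTCOMES, key=lambda v: sum(u(v) for u in agents) / len(agents))


def best_by_maximin(agents):
    """Rule 2: pick the outcome maximising the worst-off agent's utility."""
    return max(OUTCOMES, key=lambda v: min(u(v) for u in agents))


opposed = [u_ennobled, u_degraded]
indifferent = [u_indifferent, u_indifferent]

# Under averaging, the opposed pair is indistinguishable from joint
# indifference: every outcome ties, and max() returns the first tied outcome
# in both cases.
print(best_by_average(opposed), best_by_average(indifferent))   # 0 0

# Under maximin, the opposed pair yields a definite compromise (v = 1),
# while the indifferent pair still ties across all outcomes.
print(best_by_maximin(opposed), best_by_maximin(indifferent))   # 1 0
```

Under averaging, the opposed pair looks exactly like two indifferent people (answer a); under maximin, it produces a predictable compromise that the indifferent pair does not (closer to answer b). So neither answer can be ruled in or out until we know which kind of rule, if either, CEV resembles.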