FAWS comments on Why safe Oracle AI is easier than safe general AI, in a nutshell

FAWS 3 Dec 2011 12:42 UTC
5 points
The problem is that an Oracle AI (even assuming it were perfectly safe) does not actually do much to prevent an UFAI taking over later, and if you use it to help FAI along Hitler and Gandhi will still disagree. (An actual functioning FAI based on Hitler’s CEV would be preferable to the status quo, depressingly enough)
- JoshuaZ 3 Dec 2011 15:02 UTC
  2 points
  Parent
  
  (An actual functioning FAI based on Hitler’s CEV would be preferable to the status quo, depressingly enough)
  
  Can you expand on this logic? This isn’t obvious to me.
  - FAWS 3 Dec 2011 15:32 UTC
    5 points
    Parent
    I don’t have a strong insight into the psychology of Hitler and consider it possible that the CEV process would filter out the insanity and have mostly the same result as the CEV of pretty much anyone else.
    
    Even if not a universe filled with happy “Aryans” working on “perfecting” themselves would be a lot better than a universe filled with paper clips (or a dead universe), and from a consequentialist point of view genocide isn’t worse than being reprocessed into paper clips (this is assuming Hitler wouldn’t want to create an astronomic number of “untermenschen” just to make them suffer).
    
    On aggregate outcomes worse than a Hitler CEV AGI (eventual extinction from non-AI causes, UFAI, alien AGI with values even more distasteful than Hitler’s) seem quite a bit more likely than better outcomes (FAI, AI somehow never happening and humanity reaching a good outcome anyway, alien AGI with values less distasteful than Hitler’s).
    - wedrifid 3 Dec 2011 17:56 UTC
      6 points
      Parent
      (Yes, CEV is most likely better than nothing but...)
      
      I don’t have a strong insight into the psychology of Hitler and consider it possible that the CEV process would filter out the insanity and have mostly the same result as the CEV of pretty much anyone else.
      
      This is way, way, off. CEV isn’t a magic tool that makes people have preferences that we consider ‘sane’. People really do have drastically different preferences. Value is fragile.
      - FAWS 3 Dec 2011 18:31 UTC
        3 points
        Parent
        Well, to the extent apparent insanity is based on (and not merely justified by) factually wrong beliefs CEV should extract saner seeming preferences, and similar for apparent insanity resulting from inconsistency. I have no strong opinion on what the result in this particular case would be.
        wedrifid 3 Dec 2011 18:52 UTC
        −1 points
        Parent
        The important part was this:
        
        and have mostly the same result as the CEV of pretty much anyone else.
        
        No. No, no, no!
      - Tyrrell_McAllister 3 Dec 2011 19:28 UTC
        2 points
        Parent
        
        This is way, way, off. CEV isn’t a magic tool that makes people have preferences that we consider ‘sane’.
        
        FAWS didn’t say that CEV would filter out what-we-consider-to-be Hitler’s insanity. After all, we may be largely insane, too. I take FAWS to be suggesting that CEV would filter out Hitler’s actual insanity, possibly leaving something essentially the same as what CEV gets after it filters out my insanity.
        
        People really do have drastically different preferences.
        
        People express different preferences, but it is not obvious that their CEV-ified preferences would be so different. (I’m inclined to expect that they would be, but it’s not obvious.)
        wedrifid 4 Dec 2011 5:15 UTC
        0 points
        Parent
        
        After all, we may be largely insane, too. I take FAWS to be suggesting that CEV would filter out Hitler’s actual insanity, possibly leaving something essentially the same as what CEV gets after it filters out my insanity.
        
        Possibly. And possibly CEV<Mortimer Q. Snodgrass> is a universe tiled with stabbing victims! There seems to be some irresistible temptation to assume that extrapolating the volition of individuals will lead to convergence. This is a useful social stance to have and it is mostly harmless belief in practical terms for nearly everyone. Yet for anyone who is considering actual outcomes of agents executing coherent extrapolated volitions it is dangerous.
        
        People express different preferences, but it is not obvious that their CEV-ified preferences would be so different.
        
        We are considering individuals of entirely different upbringing and culture, from (quite possibly) a different genetic pool, with clearly different drives and desires and who by their very selection have an entirely different instinctive relationship with power and control. Sure, there are going to be similarities; relative to mindspace in general extrapolated humans will be comparatively similar. We can expect most models of such extrapolated humans to each have a node for sexiness even if the details of that node vary rather significantly. Yet assuming similarities too far beyond that requires altogether too much mind projection.
        Stuart_Armstrong 4 Dec 2011 10:41 UTC
        −3 points
        Parent
        If CEV and CEV end up the same, then the difference between me and hitler (such as whether we should kill jews) is not relevant to the CEV output, which makes me very worried about its content.
      - [deleted] 3 Dec 2011 18:16 UTC
        1 point
        Parent
        
        This is way, way, off. CEV isn’t a magic tool that makes people have preferences that we consider ‘sane’. People really do have drastically different preferences. Value is fragile.
        
        I wholeheartedly agree. It boggles my mind that people think they can predict what CEV would want, let alone CEV.
        Vladimir_Nesov 3 Dec 2011 18:45 UTC
        5 points
        Parent
        What distinguishes Hitler from other people in the arguments about the goodness of CEV’s output?
        
        Something must be known to decide that CEV is better than random noise, and the relevant distinctions between different people are the distinctions you can use to come to different conclusions about quality of CEV’s output. What you don’t know isn’t useful to discern the right answer, only what you do know can be used, even if almost nothing is known.