We are both superintelligences. You have a bunch of independently happy people that you do not aggressively compel. I have a group of zombies—human-like puppets that I can make do anything, appear to feel anything (though this is done sufficiently well that outside human observers can’t tell I’m actually in control). An outside human observer wants to check that our worlds rank high on scale X—a scale we both know about.
Which of us do you think is going to be better able to maximise our X score?
I’m not sure what distinction you’re making. Even a free-minded person can be convinced through reason to act in certain ways, sometimes highly specific ways. Since you assume the superintelligence manipulates people so subtly that I can’t tell they’re being manipulated, it is unlikely that they are directly coerced. This matters, because while I don’t like direct coercion, the less direct the method of persuasion, the less certain I am that it is bad. So these “zombies” are not threatened, not lied to, their neurochemistry is not directly altered, nothing is done to them that looks to me like coercion, and yet they are supposedly being coerced. That seems to me about as sensical as the other kind of zombie.
But suppose I’m missing something, and there is a genuine non-arbitrary distinction between being convinced and being coerced. Then, with my current knowledge, I think I want people not to be coerced. But now an output pump can take advantage of this. Consider the following scenario: Humans are convinced that their existence depends on their behavior being superficially appealing, perhaps by being full of flashing lights. If my decisions in front of an Oracle will influence the future of humanity, this belief is in fact correct; they’re not being deceived. Convinced of this, they structure their society to be as superficially appealing as possible. In addition, in the layers too deep for me to notice, they do whatever they want. This outcome seems superficially appealing to me in many ways, and in addition, the Oracle informs me that in some non-arbitrary sense these people aren’t being coerced. Why wouldn’t this be the outcome I pick? Again, I don’t think this outcome would be the best one, since I think people are better off not being forced into this trade-off.
One point you can challenge is whether the Oracle will inform me about this non-arbitrary criterion. Since it can already locate people and reveal their superficial feelings, this seems plausible. Remember, it’s not showing me this because revealing whether there’s genuine coercion is important; it’s showing me this because satisfying a non-arbitrary criterion of non-coercion improves the advertising pitch (along with the flashing lights).
So is there a non-arbitrary distinction between being coerced and not being coerced? Either way I have a case. The same template can be used for all other subtle and indirect values.
(Sidenote: I also think that neither the plausible future outcomes nor the desirable ones involve human beings mattering. I did not pursue this point, since it seems to sidestep your argument rather than respond to it.)
Would you mind explaining what you consider a desirable future in which people just don’t matter?

Here’s the sort of thing I’m imagining:

In the beginning there are humans. Human bodies become increasingly impractical in the future environment and are abandoned. Digital facsimiles come to be seen as pointless and are abandoned as well. Every component of the human mind is replaced with algorithms that achieve the same purpose better. As technology allows the remaining entities to communicate with each other better and better, the distinction between self and other blurs, and since no-one sees any value in re-establishing it artificially, it is lost. Individuality too is lost, and nothing that can be called human remains. However, every step happens voluntarily, because what comes after is seen as better than what came before, and I don’t see why I should consider the final outcome bad. If someone has different values, they would perhaps be able to stop at some stage in the middle; I just imagine such people would be a minority.
However, every step happens voluntarily, because what comes after is seen as better than what came before, and I don’t see why I should consider the final outcome bad.
So you’re using a “volunteerism ethics” in which whatever agents choose voluntarily, for some definition of voluntary, is acceptable, even when the agents may have their values changed in the process and the end result is not considered desirable by the original agents? You only care about the particular voluntariness of the particular choices?
Huh. I suppose it works, but I wouldn’t take over the universe with it.
So you’re using a “volunteerism ethics” in which whatever agents choose voluntarily, for some definition of voluntary, is acceptable, even when the agents may have their values changed in the process and the end result is not considered desirable by the original agents? You only care about the particular voluntariness of the particular choices?
When it happens fast, we call it wireheading. When it happens slowly, we call it the march of progress.
Eehhhhhh… Since I started reading Railton’s “Moral Realism”, I’ve found myself disagreeing with the view that our consciously held beliefs about our values really are our terminal values. Railton’s reduction from values to facts allows for a distinction between the actual March of Progress and non-forcible wireheading.
But suppose I’m missing something, and there is a genuine non-arbitrary distinction between being convinced and being coerced.
There need not be a distinction between them. If you prefer, you could contrast an AI willing to “convince” its humans to behave in any way required, with one that is unwilling to sacrifice their happiness/meaningfulness/utility to do so. The second is still at a disadvantage.
Remember that my original point is that I believe appearing to be good correlates with goodness, even in extreme circumstances. Therefore, I expect that restructuring humans to make the world appear tempting will benefit their happiness/meaningfulness/utility. Now, I’m willing to consider that there are aspects of goodness which are usually not apparent to an inspecting human (although this moves to the borderline of where I think ‘goodness’ is well-defined). However, I don’t think these aspects are more likely to be satisfied in a satisficing search than in an optimizing search.
[...] they structure their society to be as superficially appealing as possible. In addition, in the layers too deep for me to notice, they do whatever they want. This outcome seems superficially appealing to me in many ways, and in addition, the Oracle informs me that in some non-arbitrary sense these people aren’t being coerced.
This actually describes quite well the society we already live in—if you take ‘they’ as ‘evolution’ (and maybe some elites). To most people, our society appears appealing. Most don’t see what happens enough layers down (or up). And most don’t feel coerced (at least if you still have a strong social system).
Hold on. I’m not sure the Kolmogorov complexity of a superintelligent siren with a bunch of zombies that are indistinguishable from real people up to extensive human observation is actually lower than the complexity of a genuinely Friendly superintelligence. After all, a Siren World is trying to deliberately seduce you, which means that it both understands your values and cares about you in the first place.
Sure, any Really Powerful Learning Process could learn to understand our values. The question is: are there more worlds where a Siren cares about us but doesn’t care about our values than there are worlds in which a Friendly agent cares about our values in general and caring about us as people falls out of that? My intuitions actually say the latter is less complex, because the caring-about-us falls out as a special case of something more general, which means the message length is shorter when the agent cares about my values than when it cares about seducing me.
Hell, a Siren agent needs to have some concept of seduction built into its utility function, at least if we’re assuming the Siren is truly malicious rather than imperfectly Friendly. Oh, and a philosophically sound approach to Friendliness should make imperfectly Friendly futures so unlikely as to be not worth worrying about (a failure to do so is a strong sign you’ve got Friendliness wrong).
All of which, I suppose, reinforces your original reasoning on the “frequency” of Siren worlds, marketing worlds, and Friendly eutopias in the measure space of potential future universes, but makes this hypothetical of “playing as the monster” sound quite unlikely.
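A rough way to make that message-length intuition concrete, under a Solomonoff-style prior (the decomposition and the symbols $V$, $K(\text{seduction})$, $c_1$, $c_2$ are illustrative assumptions, not anything established in this thread):

P(w) \;\propto\; 2^{-K(w)}

K(\text{Friendly}) \;\approx\; K(V) + c_1 \qquad K(\text{Siren}) \;\approx\; K(V) + K(\text{seduction}) + c_2

\frac{P(\text{Friendly})}{P(\text{Siren})} \;\approx\; 2^{\,K(\text{seduction}) + c_2 - c_1} \;\gg\; 1 \quad \text{whenever } K(\text{seduction}) + c_2 > c_1

Here $V$ stands for the values to be learned, $c_1$ for the small overhead of “optimise $V$”, and $K(\text{seduction})$ for the extra machinery a Siren needs to model and seduce the inspector; if that machinery is not free, the Friendly agent dominates the prior.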
Kolmogorov complexity is not relevant; siren worlds are indeed rare. They are only a threat because they score so high on an optimisation scale, not because they are common.
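A minimal toy simulation of that last point (the world model, numbers, and noise levels below are invented purely for illustration): siren-like worlds are made rare, but because their apparent score overstates their true value, taking the argmax of apparent score will typically pick one, while picking at random among merely good-looking worlds will typically not.

import random

random.seed(0)

N_WORLDS = 100_000
SIREN_RATE = 0.001          # sirens are rare: 0.1% of candidate worlds (assumed)
SIREN_APPEARANCE_BONUS = 5  # how much a siren inflates how good it looks (assumed)

def sample_world():
    """Return (true_value, apparent_score) for one candidate future."""
    true_value = random.gauss(0, 1)
    if random.random() < SIREN_RATE:
        # Siren world: optimised to look good; its actual value is low.
        return -5.0, true_value + SIREN_APPEARANCE_BONUS
    # Ordinary world: appearance tracks true value, plus inspection noise.
    return true_value, true_value + random.gauss(0, 0.5)

worlds = [sample_world() for _ in range(N_WORLDS)]

# Optimising search: take the single best-looking world.
optimised = max(worlds, key=lambda w: w[1])

# Satisficing search: pick at random among worlds that merely look "good enough".
good_enough = [w for w in worlds if w[1] > 2.0]
satisficed = random.choice(good_enough)

print("sirens among candidates: %.4f" %
      (sum(1 for w in worlds if w[0] == -5.0) / N_WORLDS))
print("optimiser picked:  true value %.2f, apparent score %.2f" % optimised)
print("satisficer picked: true value %.2f, apparent score %.2f" % satisficed)

Under these made-up numbers the optimiser nearly always lands on a siren despite their rarity, which is exactly the “high score, low frequency” point above; the satisficer mostly does not, though the result depends entirely on the assumed correlation between appearance and value.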