humans’ built-in safety mechanisms (such as moral and philosophical reflection)
I feel the opposite way: reflection is divergent and unsafe, while low-level instincts are safety checks. For example, “my grand theory of society says we must kill a class of people to usher in a golden age, but my gut says killing so many people is wrong.”
Thanks, I think I’m getting better. The exercises I was using to keep my condition under control stopped working as well as they used to, but I may have figured out a new routine that works better.
I feel the opposite way: reflection is divergent and unsafe, while low-level instincts are safety checks.
The problem as I see it is that your gut instincts were trained on a very narrow set of data, and therefore can’t be trusted when you move outside of that narrow range. For example, suppose some time in the future you’re playing a game on a very powerful computer and you discover that the game has accidentally evolved a species of seemingly sentient artificial life forms. Would you trust your gut to answer the following questions?
Is it ok to shut down the game?
Is it ok to shut down the game if you first save its state?
Is it ok to shut down the game if you first save its state but don’t plan to ever resume it?
What if you can’t save the game but you can recreate the creatures using a pseudorandom seed (a toy sketch of seeded recreation follows this list)? Is it ok to shut down the game in that case? What if you don’t plan to actually recreate the creatures?
What if the same creatures are recreated in every run of the game and there are plenty of other copies of the game running in the galaxy (including some that you know will keep running forever)? Is it ok to shut down your copy then?
What if there are two copies of the game running on the same computer with identical creatures in them? Is it ok to shut down one of the copies?
What moral obligations do you have towards these creatures in general? For example are you obligated to prevent their suffering or to provide them with happier lives than they would “naturally” have?
(Note that you can’t just “play it safe” and answer “no” to shutting down the game, because if the correct answer is “yes” and you don’t shut down the game, you could end up wasting a lot of resources that could be used to create more value in the universe.)
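To make the pseudorandom-seed question concrete, here is a minimal toy sketch (my own illustration; evolve_creatures and its toy “genomes” are hypothetical stand-ins, not anything from the scenario above) of why recreating the creatures from the same seed would give you exactly the same creatures rather than merely similar ones:

```python
import random

def evolve_creatures(seed, steps=1000):
    """Toy stand-in for the game's evolution: a deterministic
    function of the seed that returns the creatures' 'genomes'."""
    rng = random.Random(seed)              # all "randomness" comes from the seed
    genomes = [rng.getrandbits(64) for _ in range(10)]
    for _ in range(steps):                 # deterministic "evolution" steps
        i = rng.randrange(len(genomes))
        genomes[i] ^= 1 << rng.randrange(64)
    return genomes

run_a = evolve_creatures(seed=42)
run_b = evolve_creatures(seed=42)          # a later "recreation" from the same seed
assert run_a == run_b                      # bit-identical, not just similar
```

Whether bit-identical recreatability changes the moral picture is of course exactly what the question is asking; the sketch only shows that “recreate using a pseudorandom seed” can mean exact recreation rather than approximation.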
I do agree that reflection can go wrong pretty often. But I don’t see what we can do about that except to try to figure out how to do it better (including how to use AI to help us do it better).
Scott Aaronson’s answer to some of those would be based on the points he advances in The Ghost in the Quantum Turing Machine, where the ultimate crime is the irreversible removal of uniqueness and diversity. My take based on that approach is:
1. Not OK to just shut down the game.
2, 3. OK if you save it first, as long as there is a chance of resuming it eventually.
4. Not clear about the pseudorandom seed; it depends on how uniqueness is calculated.
5, 6. Identical copies are expendable if resources are needed for maintaining uniqueness, though probably not just for fun.
7. Would need an additional moral framework to answer that; no obligations based on the above model.
It looks like you’re talking about this from pages 28-29 of https://www.scottaaronson.com/papers/giqtm3.pdf:

I’m against any irreversible destruction of knowledge, thoughts, perspectives, adaptations, or ideas, except possibly by their owner. Such destruction is worse the more valuable the thing destroyed, the longer it took to create, and the harder it is to replace.

[...]

Deleting the last copy of an em in existence should be prosecuted as murder, not because doing so snuffs out some inner light of consciousness (who is anyone else to know?), but rather because it deprives the rest of society of a unique, irreplaceable store of knowledge and experiences, precisely as murdering a human would.
I don’t think this can be right. Suppose the universe consists of just two computers that are totally isolated from each other except that one computer can send a signal to the other to blow it up. There is one (different) virtual creature living in each computer, and one of them can press a virtual button to blow up the other computer and also get a small reward if he does so. Then according to this theory, he should think, “I should press this button, because deleting the other creature doesn’t deprive me of any unique, irreplaceable store of knowledge and experiences, because I never had any access to it in the first place.”
ETA: Or consider the above setup except that creature A does have read-only access to creature B’s knowledge and experiences, and creature B is suffering terribly with no end in sight. According to this theory, creature A should think, “I shouldn’t press this button, because what’s important about B is that he provides me with a unique, irreplaceable store of knowledge and experiences. Whether or not creature B is suffering doesn’t matter.”
I didn’t claim that the quoted passage universalizes as a version of negative utilitarianism in all imaginable cases, just that it makes sense intuitively in a variety of real-life situations, as well as in many cases not usually considered, like the ones you mentioned, or the case of reversible destruction that Scott talks about, or human cloning, or…
And we can see that in your constructed setup the rationale for preserving variety (“it deprives the rest of society of a unique, irreplaceable store of knowledge and experiences”) no longer holds.
I don’t think it makes sense intuitively in the cases I mentioned, because intuitively I think we probably should consider the conscious experiences the creatures are having (whether those experiences are positive or negative, or whether the creatures are conscious at all), and Scott’s theory seems to be saying that we shouldn’t consider that. So I think the correct answer to my question 1 is probably something like “yes, but only if the creatures will have more negative than positive experiences over the rest of their lives (and their value to society as a store of knowledge/experience does not make up for that)” instead of the “no” given by Scott’s theory. And 3 might be “no, if overall the creatures will have more positive experiences, because by shutting down you’d be depriving them of those experiences”. Of course I’m really unsure about all of this, but I don’t see how we can confidently conclude that the answer to 3 is “yes”.
Hope you get better soon!
Hmm, if you ask Scott directly, odds are, he will reply to you :)
It occurs to me that we should actually view System 1 and System 2 as safety checks for each other. See this comment for further discussion.