Roko comments on A Master-Slave Model of Human Preferences

Roko Dec 30, 2009, 2:03 PM
5 points
Yeah, I mean this discussion is—rather amusingly—rather reminiscient of my first encounter with the CEV problem 2.5 years ago.

Comment by Ricky Loynd Jun 23, 2007 7:39 am Here’s my attempt to summarize a common point that Roko and I are trying to make. The underlying motivation for extrapolating volition sounds reasonable, but it depends critically on the AI’s ability to distinguish between goals and beliefs, between preferences and expectations, so that it can model human goals and preferences while substituting its own correct beliefs and expectations. But when you start dissecting most human goals and preferences, you find they contain deeper layers of belief and expectation. If you keep stripping those away, you eventually reach raw biological drives which are not a human belief or expectation. (Though even they are beliefs and expectations of evolution, but let’s ignore that for the moment.) Once you strip away human beliefs and expectations, nothing remains but biological drives, which even the animals have. Yes, an animal, by virtue of its biological drives and ability to act, is more than a predicting rock, but that doesn’t address the issue at hand. Why is it a tragedy when a loved one dies? Is it because the world no longer contains their particular genetic weighting of biological drives? Of course not. After all, they may have left an identical twin to carry forward the very same genetic combination. But it’s not the biology that matters to us. We grieve because what really made that person a person is now gone, and that’s all in the brain; the shared experiences, their beliefs whether correct or mistaken or indeterminate, their hopes and dreams, all those things that separate humans from animals, and indeed, that separate one human from most other humans. All that the brain absorbs and becomes throughout the course of a life, we call the soul, and we see it as our very humanity, that big, messy probability distribution describing our accumulated beliefs and expectations about ourselves, the universe, and our place in it. So if the AI models a human while substituting its own beliefs and anticipations of future experiences, then the AI has discarded all that we value in each other. UNLESS you draw a line somewhere, and crisply define which human beliefs get replaced and which ones don’t. Constructing toy examples where such a line is possible to imagine does not mean that the distinction can be made in any general way, but CEV absolutely requires that there be a concrete distinction.
- Roko Dec 30, 2009, 2:18 PM
  5 points
  Parent
  
  Constructing toy examples where such a line is possible to imagine does not mean that the distinction can be made in any general way, but CEV absolutely requires that there be a concrete distinction.
  
  Basically, CEV works to the extent that there exists a belief/desire separation in a given person. In the thread on the SIAI blog, I posted certain cases where human goals are founded on false beliefs or logically inconsistent thinking, sometimes in complex ways. What is left of the time cube guy once you subtract off his false beliefs and delusions? Not much, probably. The guy is effectively not salvageable, because his identity and values are probably so badly tangled up with the false beliefs that there is no principled way to untangle them, no unique way of extrapolating him that should be considered “correct”.
  - Vladimir_Nesov Jan 1, 2010, 4:44 PM
    2 points
    Parent
    
    What is left of the time cube guy once you subtract off his false beliefs and delusions? Not much, probably.
    
    Beware: you are making a common sense-based prediction about what would be the output of a process that you don’t even have the right concepts for specifying! (See my reply to your other comment.)
    - Roko Jan 1, 2010, 9:08 PM
      0 points
      Parent
      
      common sense-based prediction
      
      It is true that I should sprinkle copious amounts of uncertainty on this prediction.
- SilasBarta Jan 10, 2010, 4:10 PM
  2 points
  Parent
  Wow. Too bad I missed this when it was first posted. It’s what I wish I’d said when justifying my reply to Wei_Dai’s attempted belief/values dichotomy here and here.
  - Roko Jan 10, 2010, 6:09 PM
    0 points
    Parent
    I don’t fully agree with Ricky here, but I think he makes a half-good point.
    
    The ungood part of his comment—and mine—is that you can only do your best. If certain people’s minds are too messed up to actually extract values from, then they are just not salvageable. My mind definitely has values that are belief-independent, though perhaps not all of what I think of as “my values” have this nice property, so ultimately they might be garbage.
    - SilasBarta Jan 10, 2010, 8:25 PM
      0 points
      Parent
      Indeed. Most of the FAI’s job could consist of saying, “Okay, there’s soooooo much I have to disentangle and correct before I can even begin to propose solutions. Sit down and let’s talk.”