This seems to assume that change in human values over time is mostly “progress” rather than drift. Do we have any evidence for that, except saying that our modern values are “good” according to themselves, so whatever historical process led to them must have been “progress”?
Changes in human values seem to have generally involved expanding the subset of people with moral worth, especially post-Enlightenment. This suggests to me that value change isn’t random drift, but it’s only weak evidence that the changes reflect some inevitable fact of human nature.
Suppose, just for the sake of specificity, that it turns out that the underlying mechanism works like this:
there’s an impulse (I1) to apply all controllable resources to my own gratification
there’s an impulse (I2) to extend my own self-gratifying impulses to others
I1 is satiable… the more resources are controllable, the weaker it fires
I2 is more readily applied to a given other if that other is similar to me
The degree to which I consider something as having “moral worth” depends on my willingness to extend my own self-gratifying impulses to it.
(I’m not claiming that humans actually have a network like this, I just find it’s easier to think about this stuff with a concrete example.)
Given that network, we’d expect humans to “expand the subset of people with moral worth” as available resources increase. That would demonstrably not be random drift: it would be predictably correlated with available resources, and we could manipulate people’s intuitions about moral worth by manipulating their perceptions of available resources. And it would demonstrably reflect a fact about human nature… increasingly refined neuroanatomical analyses would identify the neural substrates that implement that network and observe them firing in various situations.
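Just to make that concrete, here is a minimal sketch of the toy network in Python. Every functional form, parameter, similarity score, and threshold in it is an arbitrary assumption I’m making for illustration, not a claim about actual psychology; the only point is that a fixed mechanism like this mechanically yields an expanding moral circle as resources grow.

```python
# A toy sketch of the hypothetical I1/I2 network above. Every functional
# form, parameter, and threshold here is an arbitrary assumption made for
# illustration, not a claim about actual human psychology.

def i1_strength(resources: float) -> float:
    """I1 (apply resources to my own gratification) is satiable:
    the more resources are controllable, the weaker it fires."""
    return 1.0 / (1.0 + resources)

def i2_strength(similarity: float) -> float:
    """I2 (extend my self-gratifying impulses to others) fires more
    readily for others who are more similar to me (similarity in [0, 1])."""
    return similarity

def willingness_to_extend(resources: float, similarity: float) -> float:
    """Willingness to extend my own gratification to a given other:
    I2's pull, discounted by however much I1 still claims the resources."""
    return i2_strength(similarity) * (1.0 - i1_strength(resources))

def moral_circle_size(resources: float, others: list[float],
                      threshold: float = 0.25) -> int:
    """How many others (described by similarity scores) cross the
    'moral worth' threshold at a given level of available resources."""
    return sum(1 for s in others if willingness_to_extend(resources, s) > threshold)

others = [k / 10 for k in range(1, 11)]  # ten others of varying similarity
for r in (0.5, 2.0, 10.0, 100.0):
    print(f"resources={r:6.1f}  moral circle covers "
          f"{moral_circle_size(r, others)} of {len(others)} others")
```

With these made-up numbers, the circle expands as resources grow and then saturates: the “expanding subset of people with moral worth” pattern, produced by a fixed mechanism rather than by drift.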
(“Inevitable”? No fact about human nature is inevitable; a properly-placed lesion could presumably disrupt such a network. I assume what’s meant here is that it isn’t contingent on early environment, or some such thing.)
But it’s not clear to me what demonstrating those things buys us.
It certainly doesn’t seem clear to me that I should therefore endorse or repudiate anything in particular, or that I should prefer on this basis that a superintelligence optimize for anything in particular.
OTOH, a great deal of the discussion on LW on this topic seems to suggest, and often seems to take for granted, that I should prefer that a superintelligence optimize for some value V if and only if it turns out that human brains instantiate V. Which I’m not convinced of.
After a month or so of idly considering the question I haven’t yet decided whether I’m misunderstanding, or disagreeing with, the local consensus.
There have been other changes as well, which don’t fit this generalization. For instance, we now treat the people who do have moral worth much better, in many ways.
Also, there have historically been major regressions along the “percentage of society having moral worth” scale. E.g., Roman Republican society gave women, and all Roman citizens, more rights than did the post-Roman Christian world that followed.
Finally, “not random drift” isn’t the same as “moving towards a single global goal”. A dynamical map with fractal attractors isn’t random, either.
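For a concrete example of that last point, here is a quick sketch of my own: the Hénon map is a completely deterministic system whose orbits settle onto a fractal (“strange”) attractor rather than onto any single point. Nothing in it is random, yet there is no one state it is heading towards.

```python
# Quick illustration: the Hénon map (a, b = 1.4, 0.3) is fully
# deterministic, yet its orbits settle onto a fractal attractor
# instead of converging to any single "goal" state.

def henon_step(x: float, y: float, a: float = 1.4, b: float = 0.3) -> tuple[float, float]:
    """One step of the classic Hénon map."""
    return 1.0 - a * x * x + y, b * x

def orbit(x0: float, y0: float, steps: int) -> list[tuple[float, float]]:
    """Iterate the map from a starting point; same start, same orbit."""
    points, x, y = [], x0, y0
    for _ in range(steps):
        x, y = henon_step(x, y)
        points.append((x, y))
    return points

# Deterministic: identical starting points give identical trajectories.
assert orbit(0.1, 0.1, 200) == orbit(0.1, 0.1, 200)

# But not goal-directed: even after many iterations the orbit keeps
# wandering over the attractor rather than settling on a fixed point.
print(orbit(0.1, 0.1, 10_000)[-3:])
```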
Agreed on all points.
Are you sure this isn’t the Texas sharpshooter fallacy?
That is to say, values are complicated enough that if they drifted in a random direction, there would still exist a simple-sounding way to describe the direction of drift (neglecting all the other possible axes of change), and that abstraction would of course sound like an appealing general principle to those who hold the current endpoint values.
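Here is a toy version of what I mean, with entirely arbitrary numbers: let “values” take a pure random walk in a 50-dimensional space, then describe the outcome after the fact. Some axis will always have moved a lot, and naming it yields a tidy-sounding story about the direction values have been moving, even though the process was pure noise.

```python
# Toy version of the worry, with arbitrary numbers: a pure random walk
# in a high-dimensional "value space" still admits a tidy post-hoc
# story about the direction in which values drifted.

import random

random.seed(0)
DIMS, STEPS = 50, 1000

# Pure drift: every axis gets independent noise at each step.
values = [0.0] * DIMS
for _ in range(STEPS):
    for d in range(DIMS):
        values[d] += random.gauss(0.0, 1.0)

# Post-hoc narrative: single out the axis that happened to move most and
# call that "the direction of progress", neglecting the other 49 axes.
best = max(range(DIMS), key=lambda d: abs(values[d]))
trend = "up" if values[best] > 0 else "down"
print(f"Story: values have moved {trend} along axis {best} "
      f"(net change {values[best]:+.1f})")
```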