This seems to assume that change in human values over time is mostly “progress” rather than drift. Do we have any evidence for that, except saying that our modern values are “good” according to themselves, so whatever historical process led to them must have been “progress”?
Changes in human values seem to have generally involved expanding the subset of people with moral worth, especially post-Enlightenment. This suggests to me that value change isn’t random drift, but it’s only weak evidence that the changes reflect some inevitable fact of human nature.
Suppose, just for the sake of specificity, that it turns out that the underlying mechanism works like this:
there’s an impulse (I1) to apply all controllable resources to my own gratification
there’s an impulse (I2) to extend my own self-gratifying impulses to others
I1 is satiable… the more resources are controllable, the weaker it fires
I2 is more readily applied to a given other if that other is similar to me
The degree to which I consider something as having “moral worth” depends on my willingness to extend my own self-gratifying impulses to it.
(I’m not claiming that humans actually have a network like this, I just find it’s easier to think about this stuff with a concrete example.)
Given that network, we’d expect humans to “expand the subset of people with moral worth” as available resources increase. That would demonstrably not be random drift: it would be predictably correlated with available resources, and we could manipulate people’s intuitions about moral worth by manipulating their perceptions of available resources. And it would demonstrably reflect a fact about human nature… increasingly refined neuroanatomical analyses would identify the neural substrates that implement that network and observe them firing in various situations.
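Just to make that concrete, here is a minimal sketch of the toy network in Python. Every functional form, parameter, similarity score, and threshold in it is an arbitrary assumption I’m making for illustration, not a claim about actual psychology; the only point is that a fixed mechanism like this mechanically yields an expanding moral circle as resources grow.

```python
# A toy sketch of the hypothetical I1/I2 network above. Every functional
# form, parameter, and threshold here is an arbitrary assumption made for
# illustration, not a claim about actual human psychology.

def i1_strength(resources: float) -> float:
    """I1 (apply resources to my own gratification) is satiable:
    the more resources are controllable, the weaker it fires."""
    return 1.0 / (1.0 + resources)

def i2_strength(similarity: float) -> float:
    """I2 (extend my self-gratifying impulses to others) fires more
    readily for others who are more similar to me (similarity in [0, 1])."""
    return similarity

def willingness_to_extend(resources: float, similarity: float) -> float:
    """Willingness to extend my own gratification to a given other:
    I2's pull, discounted by however much I1 still claims the resources."""
    return i2_strength(similarity) * (1.0 - i1_strength(resources))

def moral_circle_size(resources: float, others: list[float],
                      threshold: float = 0.25) -> int:
    """How many others (described by similarity scores) cross the
    'moral worth' threshold at a given level of available resources."""
    return sum(1 for s in others if willingness_to_extend(resources, s) > threshold)

others = [k / 10 for k in range(1, 11)]  # ten others of varying similarity
for r in (0.5, 2.0, 10.0, 100.0):
    print(f"resources={r:6.1f}  moral circle covers "
          f"{moral_circle_size(r, others)} of {len(others)} others")
```

With these made-up numbers, the circle expands as resources grow and then saturates: the “expanding subset of people with moral worth” pattern, produced by a fixed mechanism rather than by drift.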
(“Inevitable”? No fact about human nature is inevitable; a properly-placed lesion could presumably disrupt such a network. I assume what’s meant here is that it isn’t contingent on early environment, or some such thing.)
But it’s not clear to me what demonstrating those things buys us.
It certainly doesn’t seem clear to me that I should therefore endorse or repudiate anything in particular, or that I should prefer on this basis that a superintelligence optimize for anything in particular.
OTOH, a great deal of the discussion on LW on this topic seems to suggest, and often seems to take for granted, that I should prefer that a superintelligence optimize for some value V if and only if it turns out that human brains instantiate V. Which I’m not convinced of.
After a month or so of idly considering the question I haven’t yet decided whether I’m misunderstanding, or disagreeing with, the local consensus.
There have been other changes as well, which don’t fit this generalization. For instance, we now treat the people who do have moral worth much better, in many ways.
Also, there have historically been major regressions along the “percentage of society having moral worth” scale. E.g., Roman Republican society gave women, and all Roman citizens, more rights than did the post-Roman Christian world that followed.
Finally, “not random drift” isn’t the same as “moving towards a single global goal”. A dynamical map with fractal attractors isn’t random, either.
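For a concrete example of that last point, here is a quick sketch of my own: the Hénon map is a completely deterministic system whose orbits settle onto a fractal (“strange”) attractor rather than onto any single point. Nothing in it is random, yet there is no one state it is heading towards.

```python
# Quick illustration: the Hénon map (a, b = 1.4, 0.3) is fully
# deterministic, yet its orbits settle onto a fractal attractor
# instead of converging to any single "goal" state.

def henon_step(x: float, y: float, a: float = 1.4, b: float = 0.3) -> tuple[float, float]:
    """One step of the classic Hénon map."""
    return 1.0 - a * x * x + y, b * x

def orbit(x0: float, y0: float, steps: int) -> list[tuple[float, float]]:
    """Iterate the map from a starting point; same start, same orbit."""
    points, x, y = [], x0, y0
    for _ in range(steps):
        x, y = henon_step(x, y)
        points.append((x, y))
    return points

# Deterministic: identical starting points give identical trajectories.
assert orbit(0.1, 0.1, 200) == orbit(0.1, 0.1, 200)

# But not goal-directed: even after many iterations the orbit keeps
# wandering over the attractor rather than settling on a fixed point.
print(orbit(0.1, 0.1, 10_000)[-3:])
```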
Agreed on all points.
Are you sure this isn’t the Texas sharpshooter fallacy?
That is to say, values are complicated enough that if they drifted in a random direction, there would still exist a simple-sounding way to describe the direction of drift (neglecting all the other possible axes of change), and that abstraction would of course sound like an appealing general principle to those who hold the current endpoint values.
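Here is a toy version of what I mean, with entirely arbitrary numbers: let “values” take a pure random walk in a 50-dimensional space, then describe the outcome after the fact. Some axis will always have moved a lot, and naming it yields a tidy-sounding story about the direction values have been moving, even though the process was pure noise.

```python
# Toy version of the worry, with arbitrary numbers: a pure random walk
# in a high-dimensional "value space" still admits a tidy post-hoc
# story about the direction in which values drifted.

import random

random.seed(0)
DIMS, STEPS = 50, 1000

# Pure drift: every axis gets independent noise at each step.
values = [0.0] * DIMS
for _ in range(STEPS):
    for d in range(DIMS):
        values[d] += random.gauss(0.0, 1.0)

# Post-hoc narrative: single out the axis that happened to move most and
# call that "the direction of progress", neglecting the other 49 axes.
best = max(range(DIMS), key=lambda d: abs(values[d]))
trend = "up" if values[best] > 0 else "down"
print(f"Story: values have moved {trend} along axis {best} "
      f"(net change {values[best]:+.1f})")
```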