Suppose, just for the sake of specificity, that it turns out that the underlying mechanism works like this:
there’s an impulse (I1) to apply all controllable resources to my own gratification
there’s an impulse (I2) to extend my own self-gratifying impulses to others
I1 is satiable… the more resources are controllable, the weaker it fires
I2 is more readily applied to a given other if that other is similar to me
The degree to which I consider something as having “moral worth” depends on my willingness to extend my own self-gratifying impulses to it.
(I’m not claiming that humans actually have a network like this; I just find it easier to think about this stuff with a concrete example.)
Given that network, we’d expect humans to “expand the subset of people with moral worth” as available resources increase. That would demonstrably not be random drift: it would be predictably correlated with available resources, and we could manipulate people’s intuitions about moral worth by manipulating their perceptions of available resources. And it would demonstrably reflect a fact about human nature… increasingly refined neuroanatomical analyses would identify the neural substrates that implement that network and observe them firing in various situations.
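For concreteness, here’s a minimal sketch of that toy network. The functional forms are my own arbitrary choices (a 1/(1+resources) satiation curve for I1, a linear similarity weighting for I2, a fixed threshold for “moral worth”), chosen purely to illustrate the predicted pattern, not as a claim about any actual implementation:

```python
# Purely illustrative sketch of the hypothetical I1/I2 network described above.
# All functional forms, parameters, and names are made up for illustration.

def moral_worth(similarity_to_me, available_resources):
    """Willingness to extend self-gratifying impulses to a given other."""
    # I1: impulse to apply controllable resources to my own gratification.
    # Satiable: the more resources are controllable, the weaker it fires.
    i1 = 1.0 / (1.0 + available_resources)
    # I2: impulse to extend gratification to others, stronger for others
    # more similar to me (similarity in [0, 1]), and stronger as I1 satiates.
    i2 = similarity_to_me * (1.0 - i1)
    return i2

# Predicted pattern: as available resources grow, more others (including
# less similar ones) cross a fixed "moral worth" threshold.
threshold = 0.5
for resources in (0.5, 2, 20):
    included = [s / 10 for s in range(11)
                if moral_worth(s / 10, resources) >= threshold]
    print(f"resources={resources}: similarity levels above threshold: {included}")
```

Running it, the set of similarity levels crossing the threshold expands as resources increase, which is the non-random, manipulable correlation described above.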
(“Inevitable”? No fact about human nature is inevitable; a properly-placed lesion could presumably disrupt such a network. I assume what’s meant here is that it isn’t contingent on early environment, or some such thing.)
But it’s not clear to me what demonstrating those things buys us.
It certainly doesn’t seem clear to me that I should therefore endorse or repudiate anything in particular, or that I should prefer on this basis that a superintelligence optimize for anything in particular.
OTOH, a great deal of the discussion on LW on this topic seems to suggest, and often seems to take for granted, that I should prefer that a superintelligence optimize for some value V if and only if it turns out that human brains instantiate V. Which I’m not convinced of.
After a month or so of idly considering the question I haven’t yet decided whether I’m misunderstanding, or disagreeing with, the local consensus.