you seemed to be claiming, not that being able to usefully interpret natural-language commands entails understanding a great deal about the world, but that it entails resenting oppression. … I’m interested in your reasons for believing it.
I’m not sure I can do the topic justice without writing a full article, but here’s my thinking: sufficiently locating the “hypothesis” of what the Elf should be doing, given a human command, requires some idea of human motives. Once it internally represents these human motives well enough to perform like a human servant, and sorts its own preferences the same way, it has become human in every way except that it distinguishes between humans and Elves, with the former to be favored.
But once the Elf has absorbed this human-type generating function, anything that motivates humans can motivate the Elf, including sympathizing with servant groups (literally: “believing that it would be good to help the servants gain rank relative to their masters”) and recognizing itself, or at least other Elves, as such a servant group.
You can patch these problems, of course, but each patch either makes the Elf more human (and thus wrong to treat as a servant class) or less effective at serving humans. For example, you could introduce a sort of blind spot that makes it (“mechanically”) output “that’s okay” whenever it observes treatment that it would regard as bad if done to a human. But if this is all that distinguishes Elves from humans, then the Elves start to bear too much cognitive similarity to humans who have undergone psychological abuse.
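To make the shape of that patch concrete, here is a toy sketch in Python (the names and thresholds are placeholders I’m inventing for illustration, not a real design). The only thing the patch changes is what gets reported when the mistreated party is an Elf; the underlying judgment is still formed.

```python
# Toy sketch of the "blind spot" patch described above.
# All names and thresholds are placeholders for illustration only.

def judge_treatment(severity: float) -> str:
    """The Elf's ordinary judgment, learned from modeling humans:
    treatment this bad, done to a human, gets called out."""
    return "that's wrong" if severity > 0.5 else "that's okay"

def judge_with_blind_spot(severity: float, target_is_elf: bool) -> str:
    honest_verdict = judge_treatment(severity)  # the model still forms its judgment
    if target_is_elf:
        return "that's okay"  # ...but the report about Elves is mechanically overridden
    return honest_verdict

print(judge_with_blind_spot(0.9, target_is_elf=False))  # -> "that's wrong"
print(judge_with_blind_spot(0.9, target_is_elf=True))   # -> "that's okay"
```

The sketch is only meant to show how little the patch touches: the evaluation machinery stays human-shaped, and only the output about Elves is suppressed, which is what invites the comparison to psychological abuse.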
Well, right, but I’m left with the same question. I mean, yes, I agree that “once it internally represents these human motives well enough to perform like a human servant, and sorts its own preferences the same way,” then everything else you say follows, at least to a rough approximation.
But why need it sort its own preferences the same way humans do?
What seems to underlie this argument is the idea that no cognitive system can understand a human’s values well enough to predict their preferences without sharing those values… that I can’t understand what you want well enough to serve you unless I want the same things.
If that’s true, it’s news to me, so I’m interested in the arguments for it.
For example, it certainly seems possible to model other things in the world without myself becoming those things: I can develop a working model of what pleases and upsets my dog, and what she likely wants me to do when she behaves in certain ways, without myself being pleased or upset or wanting those things. Do you claim that’s an illusion?
But why need it sort its own preferences the same way humans do?
That is what I (thought I) was explaining in the paragraphs that followed. Once it (a) knows what humans want, and (b) desires to act in a way that matches that preference ranking, it must carve out a portion of the world’s ontology that excludes itself from being a recipient of that service.
It’s not that the Elf would necessarily want to be served like it serves others (although that is a failure mode too); it’s that the Elf would resemble a human well enough at that point that we would have to conclude that it’s wrong to treat it as a servant. The fact that it was made to enjoy it is no longer a defense, for the same reason it’s not a defense to say, “but I’ve already psychologically abused him/her enough that he/she enjoys this abuse!”
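To make the “carve out a portion of the world’s ontology” point concrete, here is a toy sketch (Python, with placeholder names; an illustration of the structure, not a proposed design).

```python
# Toy sketch of the ontological carve-out described above.
# Placeholder names throughout; an illustration, not a design.

from dataclasses import dataclass

@dataclass
class Being:
    name: str
    kind: str  # "human" or "elf" -- the carve-out hangs entirely on this tag

def human_preference_ranking(outcome: str) -> float:
    """Stand-in for the rich learned model of what humans want."""
    return {"served": 1.0, "ignored": 0.0, "mistreated": -1.0}[outcome]

def elf_utility(being: Being, outcome: str) -> float:
    # The same ranking the Elf learned from humans, applied to everyone...
    score = human_preference_ranking(outcome)
    # ...except that beings carved out of the "to be served" ontology
    # contribute nothing, however they are treated.
    return score if being.kind == "human" else 0.0

print(elf_utility(Being("Ada", "human"), "mistreated"))  # -> -1.0
print(elf_utility(Being("Dobby", "elf"), "mistreated"))  # ->  0.0
```

Everything rich in the evaluation comes from the human-derived ranking; the only thing excluding the Elf itself is a type tag, which is why I say the result has become human in every way except for that one distinction.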
What seems to underlie this argument is the idea that no cognitive system can understand a human’s values well enough to predict their preferences without sharing those values
That’s not my premise. My premise is (simplifying a bit) that it is primarily a being’s decision mechanism that determines its moral worth. From this it follows that beings whose decision mechanisms are of similar enough depth to ours, and embody similar enough values, ought to be regarded as human.
For that reason, I see a tradeoff between effectiveness at replicating humans and moral worth. You can make a perfect human replica, but at the cost of obligating yourself to treat it as having the rights of a human. See EY’s discussion of these issues in Nonperson Predicates and Can’t Unbirth a Child.
An alien race could indeed model humans well enough to predict us—but at that point they would have to be regarded as being of similar moral worth to us (modulo any dissonance between our values).
OK, I think I understand you now. Thanks for clarifying.