To clarify, the claim is not “all agents should prefer wireheading” or “humans should have wireheading-compatible values”, but “if an agent has this set of values and this decision algorithm, then it should wirehead”, with humans being such an agent. The wireheading argument does not propose that humans change their values, but that wireheading actually is a good fulfillment of their existing values (despite seeming objections). That’s as much a factual claim as evolution is.
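To make the conditional concrete, here is a toy sketch (my own illustration, with invented action names and numbers; it assumes the decision algorithm is plain expected-utility maximization and that the values in question are defined over the agent’s own experienced reward):

```python
# Toy model: an expected-utility maximizer whose terminal values are
# defined over its own experienced reward. All numbers are made up.

def expected_utility(outcomes):
    """Sum of probability-weighted utilities over the possible outcomes."""
    return sum(p * u for p, u in outcomes)

# Each action maps to (probability, experienced-reward utility) pairs.
actions = {
    # Wireheading: near-certain maximal experienced reward.
    "wirehead": [(0.99, 100.0), (0.01, 0.0)],
    # Pursuing goals out in the world: uncertain, moderate reward.
    "pursue_external_goals": [(0.5, 60.0), (0.5, 20.0)],
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # -> "wirehead": these values plus this algorithm pick it
```

Under those assumptions, the near-certain maximal reward from wireheading dominates any uncertain external pursuit; the dispute is then over whether the assumed values and algorithm actually describe humans.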
The reason I don’t easily expect rational disagreement is that I expect a) all humans to have the same decision algorithm, and b) terminal values to be simple and essentially hard-coded.
b) might be false, but then I don’t see a realistic mechanism for how such values could have arisen in the first place. What’s the evolutionary advantage of an agent that has highly volatile terminal values and can easily be hijacked, or relies on fairly advanced circuitry to even do value calculations?
> What’s the evolutionary advantage of an agent that has highly volatile terminal values and can easily be hijacked,
Humans seem to act as general meme hosts. It seems fairly easy for a human to be hijacked by a meme in a way that decreases their genetic inclusive fitness. Presumably this kind of design at least had an evolutionary advantage in our EEA (environment of evolutionary adaptedness), or we wouldn’t be this way.
> or relies on fairly advanced circuitry to even do value calculations?
If you can host arbitrary memes, then “external referent consequentialism” doesn’t really need any extra circuitry. You just have to be convinced that it’s something you ought to do.
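To make that concrete, here is a toy sketch (again my own illustration; all states, numbers, and names are invented). The decision circuitry is identical for both agents; only the hosted utility function, the “meme”, differs:

```python
# Toy model: the same decision "circuitry" (argmax over expected
# utility) hosting two different utility functions. Only the hosted
# value function, i.e. the meme, changes; the algorithm is untouched.

def decide(actions, utility):
    """Shared circuitry: pick the action with the highest expected utility."""
    return max(
        actions,
        key=lambda a: sum(p * utility(s) for p, s in actions[a]),
    )

# Outcomes as (probability, state) pairs; "reward" is what the agent
# experiences, "world" is the state of the external referent.
actions = {
    "wirehead":     [(1.0, {"reward": 100, "world": "neglected"})],
    "act_in_world": [(1.0, {"reward": 40,  "world": "improved"})],
}

def hedonic_meme(state):
    # Terminal value over the agent's own experienced reward.
    return state["reward"]

def external_meme(state):
    # Terminal value over the state of the external referent.
    return 100 if state["world"] == "improved" else 0

print(decide(actions, hedonic_meme))   # -> wirehead
print(decide(actions, external_meme))  # -> act_in_world
```

Swapping the meme flips the choice without touching the circuitry, which is the sense in which external referent consequentialism comes for free once arbitrary memes can be hosted.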