We call an agent wireheaded if it systematically exploits some discrepancy between its true utility, calculated w.r.t. reality, and its substitute utility, calculated w.r.t. its model of reality.
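To make the discrepancy concrete, here is a toy sketch (my own construction, not from the original definition) of an agent whose substitute utility is computed from its model of the world rather than from the world itself. Because the two can diverge, an action that edits only the model can beat an action that genuinely improves reality:

```python
def true_utility(world):
    # Utility w.r.t. reality: how many apples actually exist.
    return world["apples"]

def substitute_utility(model):
    # Utility w.r.t. the agent's model of reality.
    return model["apples"]

def act(world, model, action):
    """Apply an action to copies of the world and the agent's model."""
    world, model = dict(world), dict(model)
    if action == "grow_apple":      # genuinely improves reality
        world["apples"] += 1
        model["apples"] += 1        # model faithfully tracks reality
    elif action == "edit_model":    # wireheading: changes only the model
        model["apples"] += 100
    return world, model

world = {"apples": 0}
model = {"apples": 0}

# A substitute-utility maximizer ranks actions by their *modeled* payoff,
# so it exploits the model/reality discrepancy.
candidates = ["grow_apple", "edit_model"]
best = max(candidates,
           key=lambda a: substitute_utility(act(world, model, a)[1]))
print(best)  # -> edit_model
```

The wireheading action wins under the substitute utility even though it leaves the true utility at zero, which is exactly the "systematic exploitation of a discrepancy" the definition points at.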
I wonder if this definition would classify certain moral theories as "wireheading." For instance, a consequentialist could argue that deontological ethics is a form of wireheading in which people mistake certain useful rules of thumb for generating good consequences (e.g., "don't kill," "don't lie") for the very essence of morality, and try to maximize adherence to those rules instead of maximizing good consequences. This sounds a lot like maximizing the discrepancy between true and substitute utility.
Certain simplified consequentialist rules may also be vulnerable to wireheading. For instance, utilitarian theories of ethics tend to model helping others and doing good as "scoring points," and award more points for helping those who are least well off. Certain thinkers (notably Robin Hanson) have argued that this means the most efficient way to "score points" is to create tons and tons of impoverished people and then help them, rather than simply trying to improve existing people's lives. It seems to me that this line of thought is "cheating": it exploits a loophole in the way utilitarianism models doing good, rather than actually doing good. Does this mean it is a form of wireheading?