Surely, “retargeting” their values is a deeply irrational act for almost any agent to perform, at least if we are talking about instrumental rationality. The reason is that your original goals are typically wiped out by the retargeting, so rational agents should normally seek to avoid such an event happening to themselves, and should certainly not initiate it. Omohundro discusses the issue here:
http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/
We then show that self-improving systems will be driven to clarify their goals and represent them as economic utility functions. They will also strive for their actions to approximate rational economic behavior. This will lead almost all systems to protect their utility functions from modification and their utility measurement systems from corruption.
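A minimal sketch of the goal-preservation argument being gestured at here, assuming (purely for illustration) an expected-utility maximiser that evaluates any proposed rewrite of its own utility function using its current utility function. The toy world model, outcomes, and utility functions below are invented for the example and are not from Omohundro’s paper:

```python
# Illustrative only: an expected-utility maximiser deciding whether to accept
# a rewrite of its own utility function. Every name here is a toy stand-in.

OUTCOMES = ["make_paperclips", "write_poetry", "do_nothing"]

def expected_outcome(policy_utility):
    """Toy world model: the outcome a future self would steer toward
    if its behaviour optimised `policy_utility`."""
    return max(OUTCOMES, key=policy_utility)

def should_accept_rewrite(current_utility, proposed_utility):
    """Score both futures with the *current* utility function, since that is
    the only standard of value the agent has at decision time."""
    value_if_unchanged = current_utility(expected_outcome(current_utility))
    value_if_rewritten = current_utility(expected_outcome(proposed_utility))
    return value_if_rewritten > value_if_unchanged

# Toy utility functions.
paperclip_utility = lambda outcome: 1.0 if outcome == "make_paperclips" else 0.0
poetry_utility = lambda outcome: 1.0 if outcome == "write_poetry" else 0.0

# A paperclip-valuing agent foresees that a poetry-valuing successor would
# stop making paperclips, which scores zero by its current lights.
print(should_accept_rewrite(paperclip_utility, poetry_utility))  # False
```

The point is simply that, by its current lights, a future self that optimises something else looks like a loss, so the rewrite gets vetoed. That is the sense in which retargeting is claimed to be instrumentally irrational for such an agent.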
As for retargeting in general, the argument against it has always reminded me of the advice, “Never admit a mistake. It doesn’t really count as a mistake until you admit it.”
As for Omohundro’s paper, my reaction was negative from the first reading. His reasoning was so unconvincing that I found myself losing confidence in my own judgement even on the points where I had started out agreeing with him.
What would it mean for values to be mistaken, though? Who would be the judge of that?
Normally, values are not right or wrong. Rather, “right” and “wrong” are value judgements.
The person who used to claim that he held a certain (not reflectively consistent) set of values, and who now understands that those values were a mistake.
I understand that there are ways of programming an AI so that its values will never change. But that does not mean that an AI must be programmed in that way, or even that it should be programmed in that way. And it definitely does not mean that rational humans cannot change their minds on their ultimate values.
I note that your example appears to generalise poorly. Yes, values can have bugs in them that need working out—but the idea that values are likely to be preserved by rational agents kicks in most seriously after that has happened.
Also, we had best be careful about making later agents the judges of their earlier selves. For each enlightenment, I expect we can find a corresponding conversion to a satanic cult.
FWIW, whether there are ways of making a powerful machine so that its values will never change is still a point of debate. Nobody has ever really proved that you can make a powerful self-improving system value anything other than its own measure of utility in the long term. Omohundro and Yudkowsky make hand-waving arguments about this—but they are not very convincing, IMHO. It would be delightful if we could demonstrate something useful about this question—but so far, nobody has.
Yes, values can have bugs in them that need working out—but the idea that values are likely to be preserved by rational agents kicks in most seriously after that has happened.
Please let me know when it happens.
To my mind, coming up with a set of terminal values which are reflectively consistent and satisfactory in every other way is at least as difficult and controversy-laden as coming up with a satisfactory axiomatization of set theory.
What do you think of the Axiom of Determinacy? I fully expect that my human values will be different from my trans-human values (1 Corinthians 13:11).
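For reference, since it is name-dropped above: the Axiom of Determinacy concerns infinite games of perfect information on the natural numbers. A standard statement, included here only as background rather than as anything the commenters wrote, is:

\[
\mathrm{AD}:\qquad \forall A \subseteq \mathbb{N}^{\mathbb{N}}\;\; \text{the game } G(A) \text{ is determined,}
\]

where in $G(A)$ the two players alternately choose natural numbers to build an infinite sequence $x$, and player I wins iff $x \in A$; “determined” means one of the players has a winning strategy. ZF plus AD is incompatible with the full Axiom of Choice, so adopting it means giving up something else that many people find self-evident, much as with choosing between competing sets of terminal values.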
It sounds like a poorly-specified problem—so perhaps don’t expect to solve that one.
As you may recall, I think that nature has its own maximand—namely entropy—and that the values of living things are just a manifestation of that.