I note that your example appears to generalise poorly. Yes, values can have bugs in them that need working out—but the idea that values are likely to be preserved by rational agents kicks in most seriously after that has happened.
Also, we had best be careful about making later agents the judges of their earlier selves. For each enlightenment, I expect we can find a corresponding conversion to a satanic cult.
FWIW, whether there are ways of making a powerful machine so that its values will never change is still a point of debate. Nobody has ever really proved that you can make a powerful self-improving system value anything other than its own measure of utility in the long term. Omohundro and Yudkowsky make hand-waving arguments about this—but they are not very convincing, IMHO. It would be delightful if we could demonstrate something useful about this question—but so far, nobody has.
Please let me know when it happens.
To my mind, coming up with a set of terminal values which are reflectively consistent and satisfactory in every other way is at least as difficult and controversy-laden as coming up with a satisfactory axiomatization of set theory.
What do you think of the Axiom of Determinacy? I fully expect that my human values will be different from my trans-human values. 1 Corinthians 13:11
It sounds like a poorly-specified problem—so perhaps don’t expect to solve that one.
As you may recall, I think that nature has its own maximand—namely entropy—and that the values of living things are just a manifestation of that.