I’m an advocate of this approach in general for a number of reasons, and it’s typically how I explain the idea of FAI to people without sounding like a prophet of the end times. Most of the reasons I like value learning concern what happens before a superintelligence arrives, or what happens if one never does.
I am strongly of the opinion that real-world testing and application of theoretical results often expose completely unanticipated flaws, and the value-learning problem seems to be one where partial or incomplete solutions are still tremendously useful. This means progress on value learning is likely to attract significant attention and resources, and that proposed solutions will consequently be tested more thoroughly in the real world.
Some of the potential advantages:
Resources: There is already a strong market incentive for understanding human preferences, visible in the form of various recommendation engines. The ability to discern human values, even partially, translates well into any number of potentially useful applications. Signs of success in this kind of research will almost certainly attract substantial additional investment in the problem, which is less obviously true of some of the other research directions.
Raising the sanity waterline: Machines aren’t seen as competitors for social status, and it’s typically easier to stomach correction from a machine than from another person. The ability to share preferences with a machine and get feedback on the values those preferences imply could be an invaluable tool for introspection. It’s possible this would lead people to become more rational, or even more moral.
Translation: Humans have never really tried to translate their values into a form comprehensible to a non-human before. Value learning gives us practice discovering and explaining our values in precise terms. To my mind, this is preferable to the alternative of relying on a non-human actor to correctly guess human morality. One of my own values is that humans should have a role in shaping the future, and I’d feel much more comfortable if we contributed in a meaningful way to the estimate of human values held by any future superintelligence.
Relative difficulty: The human values problem is hard, but discovering human values from raw data is probably much harder than learning and representing values that humans have already worked to articulate. Learning quantum mechanics is hard, but discovering the laws of quantum mechanics was much, much more difficult. If we can compress the human values problem enough for a solution to fit into a seed AI, the chances of AI friendliness increase dramatically.
I haven’t taken the time here to consider in detail how the approaches outlined in your post interact with these advantages, but I may try to revisit them when I have the opportunity.