When In Rome
Thank you for posting this, Geoffrey. I myself have recently been considering posting the question, “Aligned with which values, exactly?”
TL;DR—Could an AI be trained to deduce a default set and system of human values by reviewing all human constitutions, laws, policies and regulations in the manner of AlphaGo?
I come at this from a very different angle than you do. I am not an academic; rather, I am retired after a thirty-year career in IT systems management at the national and provincial levels in Canada.
Aside from my career, my lifelong personal interest has been, well, let’s call it “Human Nature”. So long before I had any interest in AI, I was reading about anthropology, archeology, philosophy, psychology, history and so on, though during the last decade I focused mostly on human values. Schwartz and all that. In a very unacademic way, I came to the conclusion that human values seem to explain everything with regard to what individual people feel, think, say and do, and the same goes for groups.
Now that I’m retired, I write hard science fiction novellas and short stories about social robots. I don’t write hoping for publication, but rather to explore issues of human nature, both social (e.g. justice) and personal (e.g. purpose). Writing about how and why social robots might function, and with the theory of convergent evolution in mind, I came to the conclusion that social robots would have to have an operating system based on values.
From my reading up to that point, I had gained the impression that the study of human values was largely considered a pseudoscience (my apologies if you feel otherwise). Given my view of the foundational importance of values, I found this attitude, and the accompanying lack of hard scientific research into values, frustrating.
However, as I did the research into artificial intelligence that was necessary to write my stories, I realized that my sense of the importance of values was about to be vindicated. The opening paragraph of one of my chapters is as follows…
During the great expansionist period of the Republic, it was not the fashion to pursue an interest in philosophy. There was much practical work to be done. Science, administration, law and engineering were well-regarded careers. The questions of philosophy popular with young people were understandable and tolerated, but were expected to be put aside upon entering adulthood.
All that changed with the advent of artificial intelligence.
As I continued to explore the issues of an AI values-based operating system, the enormity of the problem became clear. It is expressed as follows in another chapter…
Until the advent of artificial intelligence, the study of human values had not been taken seriously. Values had been spoken of for millennia; however, scientifically, no one actually knew what they were, whether they had any physical basis, or how they worked as a system. Yet it seemed that humans based most if not all of their decisions on values, and a great deal of the brain’s development between the ages of five and twenty-five had to do with values. When AI researchers began to investigate the process by which humans made decisions based on values, they found that some values seemed to be genetically based but they could not determine in what way, that some were learned yet could be inherited, and that the entire genetic, epigenetic and extra-genetic system of values interacted in a manner that was a complete mystery.
They slowly realized they faced one of the greatest challenges in scientific history.
I’ve come to the conclusion that values are too complex a system to be understood by our current sciences. In this regard, I believe we are about where the ancient Greeks were regarding the structure of matter, or where genetics was around the time of Gregor Mendel.
Expert systems, and even our most advanced mathematics, are going to be neither sufficient nor suitable approaches to solving the problem. Something new will be required. I reviewed Stuart Russell’s approach, which I interpret as “learning by example”, and felt it glossed over some significant issues; for example, children learn many things from their parents, not all of them good.
So, in answer to your question, “AI alignment with humans… but with which humans?”, might I suggest another approach? Could an AI be trained to deduce a default set and system of human values by reviewing all human constitutions, laws, policies and regulations, in the manner of AlphaGo? In every culture and region, constitutions, laws, policies and regulations represent our best attempts to formalize and institutionalize human values based on our ideas of ethics and justice.
I do appreciate the issue of values conflict that you raise. The Nazis passed some laws. But that’s where the AI, and the system it develops, come in. Perhaps we don’t currently have an AI that is up to the task, but it appears we are getting there.
This approach, it seems, would solve three problems: 1) the problem of “which humans” (because it includes source material from all cultures etc.), 2) the problem of “which values”, for the same reason, and 3) your examples of the contextual problem of “which values apply in which situations”, with the approach of “When in Rome, do as the Romans do”.
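To make the idea a little more concrete, here is a toy sketch in Python. Everything in it is hypothetical: it assumes some upstream process has already reduced each jurisdiction’s laws to (act, sanction severity) pairs, and it simply aggregates those severities into a rough cross-cultural ordering, with a per-jurisdiction lookup for the “When in Rome” case. A real system would of course have to learn all of this from the raw legal texts.

```python
# Toy sketch of the proposal above, NOT a real training pipeline.
# Assumption (hypothetical): an upstream NLP step has reduced each
# jurisdiction's statutes to (jurisdiction, act, severity) triples,
# where severity is a normalized sanction strength in [0, 1].
# Idea: penalties encode how strongly a society disvalues an act, so
# aggregating them across jurisdictions gives a rough "default" value
# ordering, while per-jurisdiction views support "When in Rome".

from collections import defaultdict
from statistics import mean

# Hand-made stand-ins for parsed statutes from three jurisdictions.
STATUTES = [
    ("A", "murder", 1.00), ("A", "assault", 0.55), ("A", "theft", 0.30),
    ("B", "murder", 0.95), ("B", "assault", 0.60), ("B", "theft", 0.25),
    ("C", "murder", 0.98), ("C", "assault", 0.50), ("C", "theft", 0.40),
]

def default_value_ordering(statutes):
    """Aggregate sanction severities across all jurisdictions into one
    cross-cultural ordering of how strongly each act is disvalued."""
    by_act = defaultdict(list)
    for _, act, severity in statutes:
        by_act[act].append(severity)
    return sorted(by_act, key=lambda act: mean(by_act[act]), reverse=True)

def local_value_ordering(statutes, jurisdiction):
    """'When in Rome': the same ordering restricted to one jurisdiction."""
    local = [(act, sev) for j, act, sev in statutes if j == jurisdiction]
    return [act for act, _ in sorted(local, key=lambda p: p[1], reverse=True)]

print(default_value_ordering(STATUTES))        # ['murder', 'assault', 'theft']
print(local_value_ordering(STATUTES, "C"))     # local override for jurisdiction C
```

The point is only the shape of the computation: a default ordering derived from everyone’s laws, with local orderings available as context demands.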
Netcentrica—thanks for this thoughtful comment.
I agree that the behavioral sciences, social sciences, and humanities need more serious (quantitative) research on values; there is some in fields such as political psychology, social psychology, cultural anthropology, comparative religion, etc., but often such research is a bit pseudo-scientific and judgmental, biased by the personal/political views of the researchers.
However, all these fields seem to agree that there are often much deeper and more pervasive differences in values across people and groups than we typically realize, given our cultural bubbles, assortative socializing, and tendency to stick within our tribe.
On the other hand, empirical research (e.g. in the evolutionary psychology of crime) suggests that in some domains, humans have a fairly strong consensus about certain values; e.g. most people in most cultures agree that murder is worse than assault, assault is worse than theft, and theft is worse than voluntary trade.
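To put a number on what “fairly strong consensus” could look like, here is a toy computation of Kendall’s coefficient of concordance (W) over severity rankings from several cultures. The rankings below are invented for illustration, not taken from any actual study.

```python
# Kendall's W measures agreement among rankings:
# 1.0 = perfect agreement, 0.0 = no agreement at all.

ACTS = ["murder", "assault", "theft", "voluntary trade"]

# Each row: one hypothetical culture's severity ranking (1 = most wrong).
RANKINGS = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [1, 2, 4, 3],
]

def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m rankings of n items."""
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]  # rank sums R_i
    mean_total = sum(totals) / n
    s = sum((t - mean_total) ** 2 for t in totals)            # spread of rank sums
    return 12 * s / (m ** 2 * (n ** 3 - n))

print(f"Kendall's W = {kendalls_w(RANKINGS):.2f}")  # 0.83 -> strong consensus
```

Even with a couple of cultures swapping adjacent offenses, W stays near 1, which is roughly the pattern the cross-cultural crime literature suggests for these core prohibitions.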
It’s an intriguing possibility that AIs might be able to ‘read off’ some general consensus values from the kinds of constitutions, laws, policies, and regulations that have been developed in complex societies over centuries of political debate and discussion. As a traditionalist who tends to respect most things that are ‘Lindy’, that have proven their value across many generations, this has some personal appeal to me. However, many AI researchers are under 40, rather anti-traditionalist, and unlikely to see historical traditions as good guides to current consensus values among humans. So I don’t know how much buy-in such a proposal would get—although I think it’s worth pursuing!
Put another way, any attempt to find consensus human values that have not already been explicitly incorporated into human political, cultural, economic, and family traditions should probably be treated with great suspicion—and may reflect some deep misalignment with most of humanity’s values.