law provides a relatively nuanced picture of the values we should give to AI. A simpler answer to the question “what should the AI’s values be?” would be “aligned with the person who’s using it,” known as intent alignment. Intent alignment is an important problem in its own right, but solving it does not solve the whole problem. Law also compares favorably to proposals like Coherent Extrapolated Volition, which attempt to reinvent morality from scratch in order to define an AI’s goals.
The law-informed AI framework sees intent alignment as (1.) something that private law methods can help with, and (2.) something whose solution does not solve society-AI alignment, and in some ways probably exacerbates it if we do not also tackle externalities concurrently.
One way of describing the deployment of an AI system is that some human principal, P, employs an AI to accomplish a goal, G, specified by P. If we view G as a “contract,” methods for creating and implementing legal contracts – which govern billions of relationships every day – can inform how we align AI with P. Contracts memorialize a shared understanding between parties regarding value-action-state tuples. It is not possible to create a complete contingent contract between the AI and P because the AI’s training process can never cover every action-state pair (on which P may have a value judgment) that the AI will encounter in the wild once deployed. Although it is also practically impossible to create complete contracts between humans, contracts still serve as incredibly useful, customizable commitment devices to clarify and advance shared goals (Dylan Hadfield-Menell & Gillian Hadfield, Incomplete Contracting and AI Alignment).
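To make this incompleteness concrete, here is a minimal sketch in Python of a contract as a partial mapping from action-state pairs to P’s value judgments. The class, the example actions and states, and the values are all hypothetical illustrations, not part of any existing system.

```python
# Toy illustration (hypothetical names throughout) of an incomplete "contract"
# between a principal P and an AI: a partial mapping from (action, state)
# pairs to P's value judgments, with no entry for pairs P never anticipated.

from typing import Optional


class IncompleteContract:
    def __init__(self) -> None:
        # Only the action-state pairs P anticipated when specifying the goal G.
        self._terms: dict[tuple[str, str], float] = {}

    def specify(self, action: str, state: str, value: float) -> None:
        """Record P's value judgment for an anticipated action-state pair."""
        self._terms[(action, state)] = value

    def evaluate(self, action: str, state: str) -> Optional[float]:
        """Return P's value if specified; None marks a gap in the contract."""
        return self._terms.get((action, state))


contract = IncompleteContract()
contract.specify("ship_order", "inventory_available", 1.0)
contract.specify("ship_order", "payment_failed", -1.0)

# Deployment surfaces an action-state pair P never anticipated: a gap.
print(contract.evaluate("ship_order", "address_flagged_as_fraud"))  # -> None
```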
We believe this works mainly because the law has developed mechanisms to facilitate commitment and sustained alignment amid ambiguity. Gaps within contracts – action-state pairs without a value – are often filled by invoking frequently employed standards (e.g., “material” and “reasonable”). These standards could serve as modular (pre-trained model) building blocks across AI systems. Rather than viewing contracts from the perspective of a traditional participant, e.g., a counterparty or judge, an AI could view contracts (and their creation, implementation, evolution, and enforcement) as guides (supplying model inductive biases and data) to navigating webs of inter-agent obligations.
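As a rough sketch of how such a standard could fill gaps, the snippet below falls back to a stub `reasonableness_standard` wherever the explicit contract is silent. In practice that stub would be a pre-trained model of how the standard is applied; every name and interface here is a hypothetical assumption, not an existing API.

```python
# Sketch of filling contractual gaps with a shared legal standard. The explicit
# "contract" covers only anticipated action-state pairs; reasonableness_standard
# is a stub standing in for a pre-trained model of how courts apply a standard.
# All names and interfaces here are hypothetical.

from typing import Callable, Optional

ActionState = tuple[str, str]


def make_evaluator(
    explicit_terms: dict[ActionState, float],
    standard: Callable[[str, str], float],
) -> Callable[[str, str], float]:
    """Use an explicit contract term if one exists; otherwise defer to the standard."""
    def evaluate(action: str, state: str) -> float:
        term: Optional[float] = explicit_terms.get((action, state))
        return term if term is not None else standard(action, state)
    return evaluate


def reasonableness_standard(action: str, state: str) -> float:
    # A real system might query a model fine-tuned on case law applying
    # "reasonableness"; a neutral constant keeps the sketch self-contained.
    return 0.0


evaluate = make_evaluator(
    {
        ("ship_order", "inventory_available"): 1.0,
        ("ship_order", "payment_failed"): -1.0,
    },
    reasonableness_standard,
)

print(evaluate("ship_order", "payment_failed"))            # explicit term: -1.0
print(evaluate("ship_order", "address_flagged_as_fraud"))  # gap -> standard: 0.0
```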
If (1.) works to increase the intent alignment of one AI system to one human (or a small group of humans), we will have a more useful and locally reliable system. But this likely decreases the expected global reliability and safety of the system as it interacts with the broader world, e.g., by increasing the risk of the system maximizing the welfare of a small group of powerful people. There are many more objectives (outside of individual or group goals) and many more humans that should be considered. As AI advances, we need to simultaneously address the human/intent-alignment and society-AI alignment problems. Some humans would “contract” with an AI (e.g., by providing instructions to the AI, or through the AI learning the humans’ preferences/intents) to harm others. Further, humans have (often inconsistent and time-varying) preferences about the behavior of other humans (especially behaviors with negative externalities) and about states of the world more broadly.

Moving beyond the problem of intent alignment with a single human, aligning AI with society is considerably more difficult, but it is necessary because AI deployment has broad effects. Much of technical AI alignment research is still focused on the solipsistic “single-single” problem of a single human and a single AI. The pluralistic dilemmas stemming from “single-multi” (a single human and multiple AIs), and especially “multi-single” (multiple humans and a single AI) and “multi-multi” situations, are critical (Andrew Critch & David Krueger, AI Research Considerations for Human Existential Safety). When attempting to align multiple humans with one or more AI systems, we need overlapping and sustained endorsements of AI behaviors, but there is no consensus social choice mechanism for aggregating preferences and values across humans or across time. Eliciting and synthesizing human values systematically is an unsolved problem that philosophers and economists have labored on for millennia. Hence the need for public law here.
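To see why aggregation is so hard, consider a toy example (the classic Condorcet cycle, not drawn from this post): with three principals ranking three hypothetical candidate AI behaviors, pairwise majority voting produces a cycle, so there is no well-defined “group preference” to align to.

```python
# Toy illustration (a standard Condorcet cycle, not an example from this post)
# of why aggregating preferences across multiple human principals is hard:
# pairwise majority voting over three hypothetical AI behaviors is cyclic.

from itertools import permutations

# Each principal's ranking over candidate AI behaviors, best first.
rankings = {
    "P1": ["comply", "optimize", "defer"],
    "P2": ["optimize", "defer", "comply"],
    "P3": ["defer", "comply", "optimize"],
}


def majority_prefers(a: str, b: str) -> bool:
    """True if a strict majority of principals ranks behavior a above behavior b."""
    votes = sum(1 for ranking in rankings.values() if ranking.index(a) < ranking.index(b))
    return votes > len(rankings) / 2


for a, b in permutations(["comply", "optimize", "defer"], 2):
    if majority_prefers(a, b):
        print(f"majority prefers {a} over {b}")
# Prints: comply over optimize, optimize over defer, defer over comply.
# That is a cycle, so "the group's preferred behavior" is not defined by majority vote alone.
```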
We can see that, on its face, intent alignment does not entail law-following. A key crux of this sequence, to be defended in subsequent posts, is that this gap between intent alignment and law-following is:
1. Bad in expectation for the long-term future.
2. Easier to bridge than the gap between intent alignment and deeper alignment with moral truth.
Relatedly, Cullen O’Keefe has a very useful discussion of distinctions between intent alignment and law-following AI here: https://forum.effectivealtruism.org/s/3pyRzRQmcJNvHzf6J/p/9RZodyypnWEtErFRM
I have also just posted a related LessWrong post here: https://www.lesswrong.com/posts/Rn4wn3oqfinAsqBSf/intent-alignment-should-not-be-the-goal-for-agi-x-risk