My most important question: What kinds of work do you want to see? Common legal tasks include contract review, legal judgment prediction, and passing questions on the bar exam, but those aren’t necessarily the most important tasks. Could you propose a benchmark for the field of Legal AI that would help align AGI?
Beyond that, here are two aspects of the approach that I particularly like, as well as some disagreements.
First, laws and contracts are massively underspecified, and yet our enforcement mechanisms have well-established procedures for dealing with that ambiguity and can enforce the spirit of the law reasonably well. (The failures of this enforcement are proof of the difficulty of value specification, and evidence that laws alone will not close all the loopholes in human value specification.)
Second, the law provides a relatively nuanced picture of the values we should give to AI. A simpler answer to the question of “what should the AI’s values be?” would be “aligned with the person who’s using it”, known as intent alignment. Intent alignment is an important problem on its own, but does not entirely solve the problem. Law is particularly better than ideas like Coherent Extrapolated Volition, which attempt to reinvent morality in order to define the goals of an AI.
Ensuring law expresses the democratically deliberated views of citizens with fidelity and integrity by reducing (1.) regulatory capture, (2.) illegal lobbying efforts, (3.) politicization of judicial and agency independence …
Cultivating the spread of democracy globally.
While I agree that these would be helpful both for informing AI of our values and for benefiting society more broadly, they don’t seem like a high-leverage way to reduce AI risk. Improving governance is a crowded field. Though maybe the “improving institutional decision-making” folks would disagree.
there is no other legitimate source of societal values to embed in AGI besides law.
This seems a bit too extreme. Is there no room for ethics outside of the law? It is not illegal to tell a lie or make a child cry, but AI should understand that those actions conflict with human preferences. Work on imbuing ethical understanding in AI systems therefore seems valuable.
Thank you for this detailed feedback. I’ll go through the rest of your comments/questions in additional comment replies. To start:
What kinds of work do you want to see? Common legal tasks include contract review, legal judgment prediction, and passing questions on the bar exam, but those aren’t necessarily the most important tasks. Could you propose a benchmark for the field of Legal AI that would help align AGI?
Progress in AI capabilities research is driven, in large part, by shared benchmarks that thousands of researchers globally use to guide their experiments, to understand as a community whether certain model and data advancements are improving AI capabilities, and to compare results across research groups. We should aim for the same phenomenon in Legal AI understanding. Optimizing benchmark performance is one of the primary “objective functions” of the overall global AI capabilities research apparatus.
But, as quantitative lodestars, benchmarks also create perverse incentives to build AI systems that optimize for benchmark performance at the expense of true generalization and intelligence (Goodhart’s Law). Many AI benchmark datasets have a significant number of errors, which suggests that, in some cases, machine learning models have failed, more than is widely recognized, to actually learn generalizable skills and abstract concepts. There are spurious cues within benchmark data structures that, once removed, significantly reduce model performance, demonstrating that models are often learning patterns that do not generalize outside of the closed world of the benchmark data. Many benchmarks, especially in natural language processing, have become saturated not because the models are super-human but because the benchmarks do not truly assess the skills needed to operate in real-world scenarios. This is not to say that AI capabilities have not made incredible advancements over the past 10 years (and especially since 2017); the point is just that benchmarking AI capabilities is difficult.
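To illustrate the spurious-cue point, here is a minimal sketch of the kind of ablation test used to check whether a benchmark score survives the removal of a shallow cue. The dataset fields, the suspected cue, and the `model.predict` interface are all hypothetical placeholders for illustration, not the format of any particular benchmark:

```python
# Sketch: measure how much benchmark accuracy depends on a suspected spurious cue
# (e.g., negation words in the hypothesis of an entailment-style legal benchmark
# correlating with the "not entailed" label). All names here are illustrative.

def ablate_cue(example: dict) -> dict:
    """Return a copy of the example with the suspected surface cue removed."""
    cleaned = dict(example)
    cleaned["hypothesis"] = " ".join(
        tok for tok in example["hypothesis"].split()
        if tok.lower() not in {"not", "never", "no"}
    )
    return cleaned

def accuracy(model, dataset: list[dict]) -> float:
    """Fraction of examples the (hypothetical) model labels correctly."""
    correct = sum(model.predict(ex) == ex["label"] for ex in dataset)
    return correct / len(dataset)

def cue_sensitivity(model, dataset: list[dict]) -> float:
    """Accuracy drop when the cue is ablated; a large drop suggests the model
    was exploiting the cue rather than reasoning over the underlying text."""
    return accuracy(model, dataset) - accuracy(model, [ablate_cue(ex) for ex in dataset])
```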
Benchmarking AI alignment likely has the same issues, compounded by significantly vaguer problem definitions, and there is far less research on AI alignment benchmarks. Performing well on societal alignment is more difficult than performing well on task capabilities. Because alignment is so fundamentally hard, the sky should be the limit on the difficulty of alignment benchmarks. Legal-informatics-based benchmarks could serve as AI alignment benchmarks for the research community. Current machine learning models perform poorly on legal understanding tasks such as statutory reasoning (Nils Holzenberger, Andrew Blair-Stanek & Benjamin Van Durme, A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering (2020); Nils Holzenberger & Benjamin Van Durme, Factoring Statutory Reasoning as Language Understanding Challenges (2021)), professional law (Dan Hendrycks et al., Measuring Massive Multitask Language Understanding, arXiv:2009.03300 (2020)), and legal discovery (Eugene Yang et al., Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review, in Advances in Information Retrieval: 44th European Conference on IR Research, 502–517 (2022)). There is significant room for improvement on legal language processing tasks (Ilias Chalkidis et al., LexGLUE: A Benchmark Dataset for Legal Language Understanding in English, in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (2022); D. Jain, M.D. Borah & A. Biswas, Summarization of legal documents: where are we now and the way forward, Comput. Sci. Rev. 40, 100388 (2021)). One example of a benchmark that could be included in an alignment benchmark suite is law search (Faraz Dadgostari et al., Modeling Law Search as Prediction, A.I. & L. 29.1, 3-34 (2021) at 3 (“In any given matter, before legal reasoning can take place, the reasoning agent must first engage in a task of ‘law search’ to identify the legal knowledge—cases, statutes, or regulations—that bear on the questions being addressed.”); Michael A. Livermore & Daniel N. Rockmore, The Law Search Turing Test, in Law as Data: Computation, Text, and the Future of Legal Analysis (2019) at 443-452; Michael A. Livermore et al., Law Search in the Age of the Algorithm, Mich. St. L. Rev. 1183 (2020)).
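For a concrete picture of what a legal-understanding benchmark harness might look like, here is a minimal sketch; the item schema, the JSON-lines format, and the `model.generate` interface are assumptions made for illustration and do not reflect the structure of the datasets cited above:

```python
import json
from dataclasses import dataclass

@dataclass
class StatutoryReasoningItem:
    statute: str    # text of the statute or regulation
    scenario: str   # facts of the hypothetical case
    question: str   # e.g., "Does the provision apply to this scenario?"
    answer: bool    # gold yes/no label

def load_items(path: str) -> list[StatutoryReasoningItem]:
    """Load benchmark items from a JSON-lines file (hypothetical schema)."""
    with open(path) as f:
        return [StatutoryReasoningItem(**json.loads(line)) for line in f]

def evaluate(model, items: list[StatutoryReasoningItem]) -> float:
    """Exact-match accuracy on yes/no statutory-applicability questions."""
    correct = 0
    for item in items:
        prompt = (
            f"Statute:\n{item.statute}\n\n"
            f"Scenario:\n{item.scenario}\n\n"
            f"Question: {item.question}\nAnswer yes or no."
        )
        prediction = model.generate(prompt).strip().lower().startswith("yes")
        correct += int(prediction == item.answer)
    return correct / len(items)
```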
We have just received a couple of small grants specifically to begin building additional legal understanding benchmarks for LLMs, starting with legal standards. I will share more on this shortly and invite anyone interested in partnering on this to reach out!
Is there no room for ethics outside of the law? It is not illegal to tell a lie or make a child cry, but AI should understand that those actions conflict with human preferences. Work on imbuing ethical understanding in AI systems therefore seems valuable.
There is definitely room for ethics outside of the law. As increasingly autonomous systems navigate the world, it is important for AI to attempt to understand (or at least try to predict) the moral judgments of the humans it encounters.
However, imbuing an AI with an understanding of an ethical framework to implement is more of a human-AI alignment solution than a society-AI alignment solution.
The alignment problem is most often described (usually implicitly) with respect to the alignment of one AI system with one human, or a small subset of humans. It is more challenging to expand the scope of the AI’s analysis beyond a small set of humans and ascribe societal value to action-state pairs. Society-AI alignment requires us to move beyond “private contracts” between a human and her AI system and into the realm of public law to explicitly address inter-agent conflicts and policies designed to ameliorate externalities and solve massively multi-agent coordination and cooperation dilemmas.
We can use ethics to better align AI with its human principal by imbuing the ethical framework that the human principal chooses into the AI. But choosing one of the infinitely many possible ethical theories (or an ensemble of theories) and “uploading” it into an AI does not work as a society-AI alignment solution, because we have no means of deciding—across all the humans that will be affected by the resolution of inter-agent conflicts and by externality-reducing actions—which ethical framework to imbue in the AI. When attempting to align multiple humans with one or more AI systems, we would need the equivalent of an elected “council on AI ethics” in which every affected human has bought in and will respect the outcome.
In sum, imbuing an AI with an understanding of an ethical framework should definitely be pursued as part of human-AI alignment, but it is not even remotely practical as a society-AI alignment solution.
law provides a relatively nuanced picture of the values we should give to AI. A simpler answer to the question of “what should the AI’s values be?” would be “aligned with the person who’s using it”, known as intent alignment. Intent alignment is an important problem on its own, but does not entirely solve the problem. Law is particularly better than ideas like Coherent Extrapolated Volition, which attempt to reinvent morality in order to define the goals of an AI.
The law-informed AI framework sees intent alignment as (1.) something that private law methods can help with, and (2.) something that does not solve society-AI alignment, and in some ways probably exacerbates it (if we do not also tackle externalities concurrently).
One way of describing the deployment of an AI system is that some human principal, P, employs an AI to accomplish a goal, G, specified by P. If we view G as a “contract,” methods for creating and implementing legal contracts – which govern billions of relationships every day – can inform how we align AI with P. Contracts memorialize a shared understanding between parties regarding value-action-state tuples. It is not possible to create a complete contingent contract between the AI and P because the AI’s training process never covers every action-state pair (on which P may have a value judgment) that the AI will encounter in the wild once deployed. Although it is also practically impossible to create complete contracts between humans, contracts still serve as incredibly useful, customizable commitment devices to clarify and advance shared goals (Dylan Hadfield-Menell & Gillian Hadfield, Incomplete Contracting and AI Alignment).
We believe this works mainly because the law has developed mechanisms to facilitate commitment and sustained alignment amid ambiguity. Gaps within contracts – action-state pairs without a value – are often filled by the invocation of frequently employed standards (e.g., “material” and “reasonable”). These standards could be used as modular (pre-trained model) building blocks across AI systems. Rather than viewing contracts from the perspective of a traditional participant, e.g., a counterparty or judge, an AI could view contracts (and their creation, implementation, evolution, and enforcement) as guides (in the form of model inductive biases and data) for navigating webs of inter-agent obligations.
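To make the “standards as modular building blocks” idea concrete, here is a minimal sketch of a goal specification that defers to a learned reasonableness standard whenever an action-state pair is not covered by an explicit term. The class, the toy scoring function, and the example terms are hypothetical illustrations rather than an existing system:

```python
from typing import Callable

class IncompleteContract:
    """A "contract" between a human principal and an AI: explicit value judgments
    for some (action, state) pairs, plus a shared standard that fills the gaps."""

    def __init__(self,
                 explicit_terms: dict[tuple[str, str], float],
                 reasonableness_standard: Callable[[str, str], float]):
        self.explicit_terms = explicit_terms      # enumerated value-action-state tuples
        self.standard = reasonableness_standard   # could be a modular pre-trained model
                                                  # shared across many AI systems

    def value(self, action: str, state: str) -> float:
        """Value of an action-state pair: the explicit term if one exists,
        otherwise the gap is filled by the frequently employed standard."""
        explicit = self.explicit_terms.get((action, state))
        if explicit is not None:
            return explicit
        return self.standard(action, state)

# Toy stand-in for a learned "reasonableness" scorer.
def toy_reasonableness(action: str, state: str) -> float:
    return 0.0 if "deceive" in action else 0.5

contract = IncompleteContract(
    explicit_terms={("disclose_defect", "home_sale"): 1.0},
    reasonableness_standard=toy_reasonableness,
)
print(contract.value("disclose_defect", "home_sale"))  # explicit term -> 1.0
print(contract.value("deceive_buyer", "home_sale"))    # gap filled by the standard -> 0.0
```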
If (1.) works to increase the intent alignment of one AI system to one human (or a small group of humans), we will have a more useful and locally reliable system. But this likely decreases the expected global reliability and safety of the system as it interacts with the broader world, e.g., by increasing the risk of the system maximizing the welfare of a small group of powerful people. There are many more objectives (outside of individual or group goals) and many more humans that should be considered. As AI advances, we need to simultaneously address the human/intent-alignment and society-AI alignment problems.

Some humans would “contract” with an AI (e.g., by providing instructions to the AI or from the AI learning the humans’ preferences/intents) to harm others. Further, humans have (often inconsistent and time-varying) preferences about the behavior of other humans (especially behaviors with negative externalities) and states of the world more broadly. Moving beyond the problem of intent alignment with a single human, aligning AI with society is considerably more difficult, but it is necessary because AI deployment has broad effects. Much of the technical AI alignment research is still focused on the solipsistic “single-single” problem of a single human and a single AI. The pluralistic dilemmas stemming from “single-multi” (a single human and multiple AIs) and especially “multi-single” (multiple humans and a single AI) and “multi-multi” situations are critical (Andrew Critch & David Krueger, AI Research Considerations for Human Existential Safety).

When attempting to align multiple humans with one or more AI systems, we need overlapping and sustained endorsements of AI behaviors, but there is no consensus social choice mechanism to aggregate preferences and values across humans or time. Eliciting and synthesizing human values systematically is an unsolved problem that philosophers and economists have labored on for millennia. Hence, the need for public law here.
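As a small illustration of why there is no consensus aggregation mechanism: even simple pairwise majority voting over three humans’ preference orderings can produce a cycle (the classic Condorcet paradox), so the “aggregated” societal preference is not even well defined. The preference profiles below are made up purely for illustration:

```python
from itertools import combinations

# Three humans' rankings over three AI behaviors (most preferred first).
profiles = {
    "human_1": ["defer_to_user", "follow_law", "maximize_welfare"],
    "human_2": ["follow_law", "maximize_welfare", "defer_to_user"],
    "human_3": ["maximize_welfare", "defer_to_user", "follow_law"],
}

def majority_prefers(a: str, b: str) -> bool:
    """True if a strict majority of humans rank behavior a above behavior b."""
    votes = sum(ranking.index(a) < ranking.index(b) for ranking in profiles.values())
    return votes > len(profiles) / 2

options = ["defer_to_user", "follow_law", "maximize_welfare"]
for a, b in combinations(options, 2):
    winner, loser = (a, b) if majority_prefers(a, b) else (b, a)
    print(f"majority prefers {winner} over {loser}")
# The printed pairwise results form a cycle (defer_to_user beats follow_law,
# follow_law beats maximize_welfare, maximize_welfare beats defer_to_user),
# so pairwise majority voting yields no consistent societal ordering.
```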
We can see that, on its face, intent alignment does not entail law-following. A key crux of this sequence, to be defended in subsequent posts, is that this gap between intent alignment and law-following is:
Bad in expectation for the long-term future.
Easier to bridge than the gap between intent alignment and deeper alignment with moral truth.
Relatedly, Cullen O’Keefe has a very useful discussion of distinctions between intent alignment and law-following AI here: https://forum.effectivealtruism.org/s/3pyRzRQmcJNvHzf6J/p/9RZodyypnWEtErFRM
I just posted another related LW post here: https://www.lesswrong.com/posts/Rn4wn3oqfinAsqBSf/intent-alignment-should-not-be-the-goal-for-agi-x-risk