A Case for AI Safety via Law
This post makes the case more widely available and open to comment. A paper with the above title is currently languishing in arXiv limbo but is available in Google Docs. I was surprised to see that my only previous postings to LessWrong were made on this very subject in 2010, as comments on the thread “Why not just write failsafe rules into the superintelligent machine?”
Unfortunately (IMO) this approach to AI alignment doesn’t seem to have gained much traction in the past 13 years. The paper does cite, however, 15 precedents that show there is some support. (Anthropic’s Constitutional AI and John Nay’s Law Informs Code are two good recent examples.)
The following reproduces the Summary of Argument and Conclusion sections from the paper.
The claim being argued is “Effective legal systems are the best way to address AI safety.”
4 Summary of Argument
4.1 Law is the standard, time-tested, best practice for maintaining order in societies of intelligent agents.
Law has been the primary way of maintaining functional, cohesive societies for thousands of years. It is how humans establish, communicate, and understand what actions are required, permissible, and prohibited in social spheres. Substantial experience exists in drafting, enacting, enforcing, litigating, and maintaining rules in contexts that include public law, private contracts, and the many others noted in this brief. Law will naturally apply to new species of intelligent systems and facilitate safety and value alignment for all.
4.2 Law is scrutable to humans and other intelligent agents.
Unlike AI safety proposals where rules are learned via examples and encoded in artificial (or biological) neural networks, laws are intended to be understood by humans and machines. Although laws can be quite complex, such codified rules are significantly more scrutable than rules learned through induction. The transparent (white box) nature of law provides a critical advantage over opaque (black box) neural network alternatives.
4.3 Law reflects consensus values.
Democratically developed law is intimately linked and essentially equivalent to consensus ethics. Both are human inventions intended to facilitate the wellbeing of individuals and the collective. They represent shared values culturally determined through rational consideration and negotiation. They reflect the wisdom of crowds accumulated over time—not preferences that vary from person to person and are often based on emotion, irrational ideologies, confusion, or psychopathy. Ethical values provide the virtue core of legal systems and reflect the “spirit of the law.” Consequentialist shells surround such cores and specify the “letter of the law.” This relationship between law and ethics makes law a natural solution for human-AI value alignment. A minority of AIs and people, however powerful, cannot game laws to achieve selfish ends.
4.4 Legal systems are responsive to changes in the environment and changes in moral values.
By utilizing legal mechanisms to consolidate values and update them over time, human and AI values can remain aligned indefinitely as values, technologies, and environmental conditions change. Thus law provides a practical implementation of Yudkowsky’s (2004) Coherent Extrapolated Volition by allowing values to evolve that are wise, aspirational, convergent, coherent, suitably extrapolated, and properly interpreted.
4.5 Legal systems restrict overly rapid change.
Legal processes provide checks and balances against overly rapid change to values and laws. Such checks are particularly important when legal change can occur at AI speeds. Legal systems and laws must adapt quickly enough to address the urgency of issues that arise but not so quickly as to risk dire consequences. Laws should be based on careful analysis and effective simulation, and the system should be able to quickly detect and correct problems found after implementation. New technologies and methods should be introduced to make legal processing as efficient as possible without removing critical checks and balances.
4.6 Laws are context sensitive, hierarchical, and scalable.
Laws apply to contexts ranging from international, national, state, and local governance to all manner of other social contracts. Contexts can overlap, be hierarchical, or have other relationships. Humans have lived under this regime for millennia and are able to understand which laws apply and take precedence over others based on contexts (e.g., jurisdictions, organization affiliations, contracts in force). Artificial intelligent systems will be able to manage the multitude of contexts and applicable laws by identifying, loading, and applying appropriate legal corpora for applicable contexts. For example, AIs (like humans) will understand that crosschecking is permitted in hockey games but not outside the arena. They will know when to apply rules of the road versus rules of the sea. They will know when the laws of chess apply versus rules of Go. They will know their rights relative to every software agent, tool, and service they interface with.
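As a rough illustration of the context-based lookup described above (not from the paper; all corpora, rules, and names here are hypothetical placeholders), an agent might gather the rule sets for its active contexts and order them by precedence:

```python
# Hypothetical sketch: context-sensitive selection of applicable legal corpora.
# All contexts, corpora, and rules are illustrative placeholders, not a real legal encoding.

from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    corpus: str      # e.g. "hockey_rulebook", "rules_of_the_road"
    text: str
    precedence: int  # higher number wins when rules conflict

CORPORA = {
    "hockey_game":   [Rule("hockey_rulebook", "crosschecking permitted within play", 1)],
    "public_street": [Rule("criminal_code", "assault prohibited", 3),
                      Rule("rules_of_the_road", "stop at red lights", 2)],
}

def applicable_rules(active_contexts):
    """Gather rules from every active context and sort them by precedence."""
    rules = [r for ctx in active_contexts for r in CORPORA.get(ctx, [])]
    return sorted(rules, key=lambda r: r.precedence, reverse=True)

# Example: an agent leaving the arena swaps rule sets as its contexts change.
print([r.corpus for r in applicable_rules(["hockey_game"])])
print([r.corpus for r in applicable_rules(["public_street"])])
```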
4.7 AI Safety via Law can address the full range of AI safety risks, from systems that are narrowly focused to those having general intelligence or even superintelligence.
Enacting and enforcing appropriate laws, and instilling law-abiding values in AIs and humans, can mitigate risks spanning all levels of AI capability—from narrow AI to AGI and ASI. If intelligent agents stray from the law, effective detection and enforcement must occur.
Even the catastrophic vision of smarter-than-human intelligence articulated by Yudkowsky (2022, 2023) and others (Bostrom, 2014; Russell, 2019) can be avoided by effective implementation of AISVL. This may require that the strongest version of the instrumental convergence thesis (on which they rely) is not correct. Appendix A suggests some reasons why AI convergence to dangerous values is not inevitable.
AISVL applies to all intelligent systems regardless of their underlying design, cognitive architecture, and technology. It is immaterial whether an AI is implemented using biology, deep learning, constructivist AI (Johnston, 2023), semantic networks, quantum computers, positronics, or other methods. All intelligent systems must comply with applicable laws regardless of their particular values, preferences, beliefs, and how they are wired.
5 Conclusion
Although its practice has often been flawed, law is a natural solution for maintaining social safety and value alignment. All intelligent agents—biological and mechanical—must know the law, strive to abide by it, and be subject to effective intervention when they violate it. The essential equivalence and intimate link between consensus ethics and democratic law provide a philosophical and practical basis for legal systems that marry values and norms (“virtue cores”) with rules that address real world situations (“consequentialist shells”). In contrast to other AI safety proposals, AISVL requires AIs “do as we legislate, not as we do.”
Advantages of AISVL include its leveraging of time-tested standard practice; scrutability to all intelligent agents; reflection of consensus values; responsiveness to changes in the environment and in moral values; restrictiveness of overly rapid change; context sensitivity, hierarchical structure, and scalability; and applicability to safety risks posed by narrow, general, and even superintelligent AIs.
For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with. (Legal frameworks outside of public law may be effective to this end.) Humans are in dire need of such improvements to counter the dangers that we pose to the biosphere and to each other. It is not clear if advanced AI will be more or less dangerous than humans. Law is critical for both.
I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?
Seth Herd interprets the idea as “regulation”. Indeed, this seems like the obvious interpretation. But I suspect it misses your point.
The first part is just “regulation”. The second part, “instilling law-abiding values in AIs and humans”, seems like a significant departure. It seems like the proposal involves both (a) designing and enacting a set of appropriate laws, and (b) finding and deploying a way of instilling law-abiding values (in AIs and humans). Possibly (a) includes a law requiring (b): AIs (and AI-producing organizations) must be designed so as to have law-abiding values within some acceptable tolerances.
This seems like a very sensible demand, but it does seem like it has to piggyback on some other approach to alignment, which would solve the object-level instilling-values problem.
If the approach does indeed require “instilling law-abiding values in AI”, it is unclear why “AISVL applies to all intelligent systems regardless of their underlying design”. The technology to instill law-abiding values may apply only to specific underlying designs, specific capability ranges, etc. I guess the idea is that part (a) of the approach, the laws themselves, applies regardless. But if part (b), the value-instilling part, has limited applicability, then this has the effect of simply outlawing designs not compatible. That’s fine, but “AISVL applies to all intelligent systems regardless of their underlying design” seems to dramatically over-sell the applicability of the approach in that case. Or perhaps I’m misunderstanding.
Similarly, “AI safety via law can address the full range of safety risks” seems to over-sell the whole section, a major point of which is to claim that AISVL does not apply to the strongest instrumental-convergence concerns. (And why not, exactly? It seems like, if the value-instilling tech existed, it would indeed avert the strongest instrumental-convergence concerns.)
The summary I posted here was just a teaser to the full paper (linked in pgph. 1). That said, your comments show you reasoned pretty closely to points I tried to make therein. Almost no need to read it. :)
The main message of the paper is along the lines of (a). That is, per the claim in the 4th pgph, “Effective legal systems are the best way to address AI safety.” I’m arguing that effective legal systems and laws are the critical things. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about “simply outlawing designs not compatible” is reasonable.
The way I put it in the paper (sect. 3, pgph. 2): “Many of the proposed non-law-based solutions may be worth pursuing to help assure AI systems are law abiding. However, they are secondary to having a robust, well-managed, readily available corpus of codified law—and complementary legal systems—as the foundation and ultimate arbiter of acceptable behaviors for all intelligent systems, both biological and mechanical.”
Later I write, “Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary.”
I suspect some kind of direct specification approach (per Bostrom classification) could work where AIs confirm that (non-trivial) actions they are considering comply with legal corpora appropriate to current contexts before taking action. I presume techniques used by the self-driving-car people will be up to the task for their application.
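A minimal sketch of that check-before-acting loop, with hypothetical `legal_corpus_for` and `llm_judge` functions standing in for whatever retrieval and legal-reasoning components (e.g. an LLM) an implementation would actually use:

```python
# Hypothetical sketch of the "confirm compliance before acting" loop described above.
# The helpers below are placeholders, not real libraries or APIs.

def legal_corpus_for(contexts):
    # Placeholder: return the codified rules applicable to the given contexts.
    return "\n".join(f"[rules for {c}]" for c in contexts)

def llm_judge(action, corpus):
    # Placeholder for a model call that judges whether `action` complies with
    # `corpus`; returns (is_compliant, rationale).
    return True, "no applicable prohibition found"

def act_if_lawful(agent_act, action, contexts, trivial=False):
    """Gate non-trivial actions on a compliance check against contextual law."""
    if not trivial:
        compliant, rationale = llm_judge(action, legal_corpus_for(contexts))
        if not compliant:
            return f"refused: {rationale}"
    return agent_act(action)

# Example usage with a stand-in action.
result = act_if_lawful(lambda a: f"did: {a}", "transfer funds", ["banking", "eu"])
print(result)
```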
I struggled with what to say about AISVL wrt superintelligence and instrumental convergence. Probably should have let the argument ride without hedging, i.e., superintelligences will have to comply with laws and the demands of legal systems. They will be full partners with humans in enacting and enforcing laws. It’s hard to just shrug off the concerns of the Yudkowskys, Bostroms, and Russells of the world.
Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:
“For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with.”
fwiw, I did skim the doc, very briefly.
In that case, I agree with Seth Herd that this approach is not being neglected. Of course it could be done better. I’m not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I’ve met some of them without trying.
I think this underestimates the difficulty of self-driving cars. In the application of self-driving airplanes (on runways, not in the air), it is indeed possible to make an adequate model of the environment, such that neural networks can be verified to follow a formally specified set of regulations (and self-correct from undesired states to desired states). With self-driving cars, the environment is far too complex to formally model in that way. You get to a point where you are trusting one AI model (of the complex environment) to verify another. And you can’t explore the whole space effectively, so you still can’t provide really strong guarantees (and this translates to errors in practice).
It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (e.g. smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very “light” in the face of those “heavy” concerns. But a legal approach need not shrug off those concerns. For example, law could require that the kind of verification we can now apply to airplane autopilots also be applied to self-driving cars. This would make self-driving illegal in effect until a large breakthrough in ML verification takes place, but it would work!
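As a toy illustration of the point about enumerable state spaces (everything below is hypothetical), a discretized runway with a handful of states can be checked exhaustively against a formal spec, whereas no comparable enumeration exists for open roads:

```python
# Illustrative sketch only: exhaustive checking of a controller against a formal
# specification is feasible when the state space is small and enumerable (the
# runway case); it is this enumeration step that breaks down for open-road driving.
# The toy model and names are hypothetical.

from itertools import product

POSITIONS = range(5)   # discretized runway positions
SPEEDS = range(3)      # discretized speeds
STATES = list(product(POSITIONS, SPEEDS))

def spec_ok(state):
    pos, speed = state
    return not (pos >= 4 and speed > 0)   # "must be stopped at the runway end"

def controller(state):
    pos, speed = state
    # Toy policy: brake once continuing would reach the runway end.
    new_pos = min(pos + speed, 4)
    new_speed = 0 if pos + speed >= 3 else speed
    return (new_pos, new_speed)

def verify(controller, states, spec_ok, horizon=10):
    """Check bounded trajectories from every spec-satisfying start state."""
    for s in states:
        if not spec_ok(s):
            continue
        for _ in range(horizon):
            s = controller(s)
            if not spec_ok(s):
                return False, s
    return True, None

print(verify(controller, STATES, spec_ok))
```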
Glad to hear it. I hope to find and follow such work. The people I’m aware of are listed on pp. 3-5 of the paper. Was happy to see O’Keefe, Bai et al. (Anthropic), and Nay leaning this way.
Yes. I’m definitely being glib about implementation details. First things first. :)
I agree with you that if self-driving cars can’t be “programmed” (instilled) to be adequately law-abiding, their future isn’t bright. Per above, I’m heartened by Anthropic’s Constitutional AI (priming LLMs with basic “laws”) having some success getting AIs to behave. Ditto for anecdotes I’ve heard about “asking an LLM to come up with a money-making plan that doesn’t violate any laws.” Seems too easy, right?
One final comment about implementation details. In the appendix I note:
Broadly speaking, implementing AIs using safe architectures (ones not prone to law-breaking) is another implementation direction. Drexler’s CAIS may be an example.
Would you count all the people who worked on the EU AI act?
Sure. Getting appropriate new laws enacted is an important element. From the paper:
I’d say the EU AI Act (and similar) work addresses the “new laws” imperative. (I won’t comment (much) on pros and cons of its content. In general, it seems pretty good. I wonder if they considered adding Etzioni’s first law to the mix, “An AI system must be subject to the full gamut of laws that apply to humans”? That is what I meant by “adopting existing bodies of law to implement AISVL.” The item in the EU AI Act about designing generative AIs to not generate illegal content is related.)
The more interesting work will be on improving legal processes along the dimensions listed above. And the really interesting work will be, as AIs get more autonomous and agentic, the “instilling” part, where AIs must dynamically recognize and comply with the legal-moral corpora appropriate to the contexts they find themselves in.
I think you’ll find this topic discussed a lot, both pro and con, under the term “regulation”.
Sorry for sounding harsh. But to say something meaningful, I believe you have to argue two things:
Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).
I think the post fails to argue both points. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1) and that laws actually prevent misalignment (2).
If AI can be just asked to follow your clever idea, then AI is already safe enough without your clever idea. “Asking AI to follow something” is not what Bostrom means by direct specification, as far as I understand.
My argument goes in a different direction. I reject premise (1) and claim there is an “essential equivalence and intimate link between consensus ethics and democratic law [that] provide a philosophical and practical basis for legal systems that marry values and norms (“virtue cores”) with rules that address real world situations (“consequentialist shells”).”
In the body of the paper I characterize democratic law and consensus ethics as follows:
That is, democratic law corresponds to the common definition of Law. Consensus ethics is essentially equivalent to human values when understood in the standard philosophical sense as “shared values culturally determined through rational consideration and negotiation.” In short, I’m of the opinion “Law = Ethics.”
Regarding your premise (2): See my reply to Abram’s comment. I’m mostly ducking the “instilling” aspects. I’m arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary.
My reference to Bostrom’s direct specification was not intended to match his use, i.e., hard coding (instilling) human values in AIs. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent systems. Of the various alignment approaches Bostrom mentioned (and deprecated), I thought direct specification came closest to AISVL.
Maybe there’s a misunderstanding. Premise (1) makes sure that your proposal is different from any other proposal. It’s impossible to reject premise (1) without losing the proposal’s meaning.
Premise (1) is possible to reject only if you’re not solving Alignment but solving some other problem.
If an AI can be Aligned externally, then it’s already safe enough. It feels like...
You’re not talking about solving Alignment, but talking about some different problem. And I’m not sure what that problem is.
For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.
I’m talking about the need for all AIs (and humans) to be bound by legal systems that include key consensus laws/ethics/values. It may seem obvious, but I think this position is under-appreciated and not universally accepted.
By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky’s CEV.
Maybe you should edit the post to add something like this:
...
I think the key problems are not “addressed”, you just assume they won’t exist. And laws are not a “practical implementation of CEV”.