If an AI can be Aligned externally, then it’s already safe enough. It feels like...
You’re not talking about solving Alignment, but talking about some different problem. And I’m not sure what that problem is.
For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.
I’m talking about the need for all AIs (and humans) to be bound by legal systems that include key consensus laws/ethics/values. It may seem obvious, but I think this position is under-appreciated and not universally accepted.
By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky’s CEV.
I’m not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I’ve met some of them without trying.
Glad to hear it. I hope to find and follow such work. The people I’m aware of are listed on pp. 3-5 of the paper. Was happy to see O’Keefe, Bai et al. (Anthropic), and Nay leaning this way.
It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (e.g., smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very “light” in the face of those “heavy” concerns. But a legal approach need not shrug off those concerns. For example, law could require that the kind of verification we now apply to airplane autopilots also be applied to self-driving cars. This would make self-driving illegal in effect until a large breakthrough in ML verification takes place, but it would work!
Yes. I’m definitely being glib about implementation details. First things first. :)
I agree with you that if self-driving-cars can’t be “programmed” (instilled) to be adequately law-abiding, their future isn’t bright. Per above, I’m heartened by Anthropic’s Constitutional AI (priming LLMs with basic “laws”) having some success getting AIs to behave. Ditto for anecdotes I’ve heard about “asking an LLM to come up with a money-making plan that doesn’t violate any laws.” Seems too easy right?
One final comment about implementation details. In the appendix I note:
We suspect emergence of instrumental values is not inevitable for any “sufficiently advanced AI system.” Rather, whether such values emerge depends on what cognitive architecture and environmental conditions (training regimens) are used.
Broadly speaking, implementing AIs using safe architectures (ones not prone to law-breaking) is another implementation direction. Drexler’s CAIS may be an example.
I believe you have to argue two things:
Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).
I think the post fails to argue both points. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1) and that laws actually prevent misalignment (2).
My argument goes in a different direction. I reject premise (1) and claim there is an “essential equivalence and intimate link between consensus ethics and democratic law [that] provide a philosophical and practical basis for legal systems that marry values and norms (“virtue cores”) with rules that address real world situations (“consequentialist shells”).”
In the body of the paper I characterize democratic law and consensus ethics as follows:
Both are human inventions intended to facilitate the wellbeing of individuals and the collective. They represent shared values culturally determined through rational consideration and negotiation. To be effective, democratic law and consensus ethics should reflect sufficient agreement of a significant majority of those affected. Democratic law and consensus ethics are not inviolate physical laws, instinctive truths, or commandments from deities, kings, or autocrats. They do not represent individual values, which vary from person to person and are often based on emotion, irrational ideologies, confusion, or psychopathy.
That is, democratic law corresponds to the common definition of Law. Consensus ethics is essentially equivalent to human values when understood in the standard philosophical sense as “shared values culturally determined through rational consideration and negotiation.” In short, I’m of the opinion “Law = Ethics.”
Regarding your premise (2): See my reply to Abram’s comment. I’m mostly ducking the “instilling” aspects. I’m arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary.
If AI can be just asked to follow your clever idea, then AI is already safe enough without your clever idea. “Asking AI to follow something” is not what Bostrom means by direct specification, as far as I understand.
My reference to Bostrom’s direct specification was not intended to match his use, i.e., hard coding (instilling) human values in AIs. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent systems. Of the various alignment approaches Bostrom mentioned (and deprecated), I thought direct specification came closest to AISVL.
I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?
The summary I posted here was just a teaser to the full paper (linked in pgph. 1). That said, your comments show you reasoned pretty closely to points I tried to make therein. Almost no need to read it. :)
The first part is just “regulation”. The second part, “instilling law-abiding values in AIs and humans”, seems like a significant departure. It seems like the proposal involves both (a) designing and enacting a set of appropriate laws, and (b) finding and deploying a way of instilling law-abiding values (in AIs and humans). Possibly (a) includes a law requiring (b): AIs (and AI-producing organizations) must be designed so as to have law-abiding values within some acceptable tolerances.
The main message of the paper is along the lines of “a.” That is, per the claim in the 4th pgph, “Effective legal systems are the best way to address AI safety.” I’m arguing that having effective legal systems and laws are the critical things. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about “simply outlawing designs not compatible” is reasonable.
The way I put it in the paper (sect. 3, pgph. 2): “Many of the proposed non-law-based solutions may be worth pursuing to help assure AI systems are law abiding. However, they are secondary to having a robust, well-managed, readily available corpus of codified law—and complementary legal systems—as the foundation and ultimate arbiter of acceptable behaviors for all intelligent systems, both biological and mechanical.”
Later I write, “Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary.”
I suspect some kind of direct specification approach (per Bostrom classification) could work where AIs confirm that (non-trivial) actions they are considering comply with legal corpora appropriate to current contexts before taking action. I presume techniques used by the self-driving-car people will be up to the task for their application.
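The pre-action compliance check described above can be sketched minimally as follows. Everything here—the corpus structure, the action representation, and the `execute_if_legal` gate—is an illustrative assumption, not a proposal for how a production system would actually encode law:

```python
# Minimal sketch of a pre-action compliance gate: an action runs only if
# the context-appropriate legal corpus permits it. All names and the
# set-of-prohibitions encoding are hypothetical simplifications.

from dataclasses import dataclass, field


@dataclass
class LegalCorpus:
    """Toy stand-in for a jurisdiction's codified law."""
    jurisdiction: str
    prohibited: set = field(default_factory=set)  # prohibited action types

    def permits(self, action_type: str) -> bool:
        return action_type not in self.prohibited


def execute_if_legal(action_type, corpus, do_action):
    """Run the action only if the corpus for the current context permits it."""
    if corpus.permits(action_type):
        return do_action()
    return f"blocked: '{action_type}' prohibited in {corpus.jurisdiction}"


us_corpus = LegalCorpus("US", prohibited={"fraud", "unauthorized_access"})
print(execute_if_legal("collect_stamps", us_corpus, lambda: "done"))
print(execute_if_legal("unauthorized_access", us_corpus, lambda: "done"))
```

A real check would of course involve interpreting statutes and precedents in context rather than matching action labels; the point of the sketch is only the control flow of checking before acting.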
“AI safety via law can address the full range of safety risks” seems to over-sell the whole section, a major point of which is to claim that AISVL does not apply to the strongest instrumental-convergence concerns. (And why not, exactly? It seems like, if the value-instilling tech existed, it would indeed avert the strongest instrumental-convergence concerns.)
I struggled with what to say about AISVL wrt superintelligence and instrumental convergence. Probably should have let the argument ride without hedging, i.e., superintelligences will have to comply with laws and the demands of legal systems. They will be full partners with humans in enacting and enforcing laws. It’s hard to just shrug off the concerns of the Yudkowskys, Bostroms, and Russells of the world.
Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:
“For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with.”
A Case for AI Safety via Law
I submit that current legal systems (or something close) will apply to AIs. And there will be lots more laws written to apply to AI-related matters.
It seems to me current laws already protect against rampant paperclip production. How could an AI fill the universe with paperclips without violating all kinds of property rights, probably prohibitions against mass murder (assuming it kills lots of humans as a side effect), and financial and other fraud to acquire enough resources? I see it now: some DA will serve a 25,000-count indictment. That AI will be in BIG trouble.
Or say in a few years technology exists for significant matter transmutation, highly capable AIs exist, one misguided AI pursues a goal of massive paperclip production, and it thinks it found a way to do it without violating existing laws. The AI probably wouldn’t get past converting a block or two in New Jersey before the wider public and legislators wake up to the danger and rapidly outlaw that and related practices. More likely, technologies related to matter transmutation will be highly regulated before an episode like that can occur.
I guess here I’d reiterate this point from my latest reply to orthonormal:
Again, it’s not only about having lots of rules. More importantly it’s about the checks and balances and enforcement the system provides.
It may not be helpful to think of some grand utility-maximizing AI that constantly strives to maximize human happiness or other similar goals, and could cause us to wake up in some alternate reality some day. It would be nice to have some AIs working on how to maximize things humans value, e.g., health, happiness, attractive and sensible shoes. If any of those goals appeared to be impeded by current law, the AI would lobby its legislators to amend the law. And in a better future, important amendments would go through rigorous analysis in a few days, root out unintended consequences, and be enacted as quickly as prudent.
For this reason, giving an AI simple goals but complicated restrictions seems incredibly unsafe, which is why SIAI’s approach is figuring out the correct complicated goals.
Tackling FAI by figuring out complicated goals doesn’t sound like a good program to me, but I’d need to dig into more background on it. I’m currently disposed to prefer “complicated restrictions,” or more specifically this codified ethics/law approach.
In your example of a stamp collector run amok, I’d say it’s fine to give an agent the goal of maximizing the number of stamps it collects. Given an internal world model that includes the law/ethics corpus, it should not hack into others’ computers, steal credit card numbers, and appropriate printers to achieve its goal. And if it does (a) Other agents should array against it to prevent the illegal behaviors, and (b) It will be held accountable for those actions.
The EURISKO example seems better to me. The goal of war (defeat one’s enemies) is particularly poignant and much harder to ethically navigate. If the generals think sinking their own ships to win the battle/war is off limits they may have to write laws/rules that forbid it. The stakes of war are particularly high and figuring out the best (ethical?) rules is particularly important and difficult. Rather than banning EURISKO from future war games given its “clever” solutions, it would seem the military could continue to learn from it and amend the laws as necessary. People still debate whether Truman dropping the bomb on Hiroshima was the right decision. Now there’s some tough ethical calculus. Would an ethical AI do better or worse?
Legal systems are what societies currently rely on to protect public liberties and safety. Perhaps an SIAI program can come up with a completely different and better approach. But in the absence of that, why not leverage Law? Law = Codified Ethics.
Again, it’s not only about having lots of rules. More importantly it’s about the checks and balances and enforcement the system provides.
We would probably start with current legal systems and remove outdated laws, clarify the ill-defined, and enact a bunch of new ones. And our (hyper-)rational AI legislators, lawyers, and judges should not be disposed to game the system. AI and other emerging technologies should both enable and require such improvements.
The laws might be appropriately viewed primarily as blocks that keep the AI from taking actions deemed unacceptable by the collective. AIs could pursue whatever goals they see fit within the constraints of the law.
However, the laws wouldn’t be all prohibitions. The “general laws” would be more prescriptive, e.g., life, liberty, justice for all. The “specific laws” would tend to be more prohibition oriented. Presumably the vast majority of them would be written to handle common situations and important edge cases. If someone suspects the citizenry may be in jeopardy of frequent runaway trolley incidents, the legislature can write statutes on what is legal to throw under the wheels to prevent deaths of (certain configurations of) innocent bystanders. Probably want to start with inanimate objects before considering sentient robots, terminally sick humans, fat men, puppies, babies, and whatever. (It might be nice to have some clarity on this! :-))
To explore your negligence case example, I imagine some statute might require agents to rescue people in imminent danger of losing their lives if possible, subject to certain extenuating circumstances. The legislature and public can have a lively debate about whether this law still makes sense in a future where dead people can be easily reanimated, or if human life is really not valuable in the grand scheme of things. If humans have good representatives in the legislature and/or a few good AI advocates, mass human extermination shouldn’t be a problem, at least until the consensus shifts in such directions. Perhaps some day there may be a consensus on forced sterilizations to prevent greater harms. I’d argue such a system of laws should be able to handle it. The key seems to be to legislate prescriptions and prohibitions relevant to the current state of society and to change them as the facts on the ground change. This would seem to get around the impossibility of defining eternal laws or algorithms that are ever-true in every possible future state.
Thanks for the links. I’ll try to make time to check them out more closely.
I had previously skimmed a bunch of lesswrong content and didn’t find anything that dissuaded me from the Asimov’s Laws++ idea. I was encouraged by the first post in the Metaethics Sequence where Eliezer warns about not “trying to oversimplify human morality into One Great Moral Principle.” The law/ethics corpus idea certainly doesn’t do that!
RE: your first and final paragraphs: If I had to characterize my thoughts on how AIs will operate, I’d say they’re likely to be eminently rational. Certainly not anthropomorphized as virtuous or vicious human beings. They will crank the numbers, follow the rules, run the simulations, do the math, play the odds as only machines can. Probably (hopefully?) they’ll have little of the emotional/irrational baggage we humans have been selected to have. Given that, I don’t see much motivation for AIs to fixate on gaming the system. They should be fine with following and improving the rules as rational calculus dictates, subject to the aforementioned checks and balances. They might make impeccable legislators, lawyers, and judges.
I wonder if this solution was dismissed too early by previous analysts due to some kind of “scale bias.” The idea of having only 3 or 4 or 5 (Asimov) Laws for FAI is clearly flawed. But scale that to a few hundred thousand or a million, and it might work. No?
Thanks for the comments. See my response to DavidAgain re: loophole-seeking AIs.
Thanks for the thoughts.
You seem to imply that AIs’ motivations will be substantially humanlike. Why might AIs be motivated to nobble the courts, control pens, overturn vast segments of law, find loopholes, and engage in other such humanlike gamesmanship? Sounds like malicious programming to me.
They should be designed to treat the law as a fundamental framework to work within, akin to common sense, physical theories, and other knowledge they will accrue and use over the course of their operation.
I was glib in my post suggesting that “before taking actions, the AI must check the corpus to make sure its desired actions are legal.” Presumably most AIs would compile the law corpus into their own knowledge bases, perhaps largely integrated with other knowledge they rely on. Thus they could react more quickly during decision making and action. They would be required/wired, however, to be reasonably up to the minute on all changes to the law and grok the differences into their semantic nets accordingly. THE KEY THING is that there would be a common body of law ALL are held accountable to. If laws are violated, appropriate consequences would be enforced by the wider society.
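The “up to the minute” requirement could be sketched as a simple version sync between a central law database and an agent’s locally compiled copy. The class names and the version-counter scheme here are hypothetical illustrations, not a design:

```python
# Illustrative sketch: an agent keeps a locally "compiled" copy of the law
# corpus and re-syncs whenever the central version advances. Names and the
# dict-of-statutes representation are assumptions for demonstration only.

class CentralLawDatabase:
    """Toy stand-in for the single shared database of codified law."""
    def __init__(self):
        self.version = 0
        self.statutes = {}

    def enact(self, statute_id, text):
        self.statutes[statute_id] = text
        self.version += 1  # every enactment bumps the corpus version


class AgentLawCache:
    """An AI's local, compiled view of the corpus."""
    def __init__(self, central):
        self.central = central
        self.version = -1
        self.compiled = {}

    def sync(self):
        # Pull and "recompile" only when the central corpus has changed.
        if self.version != self.central.version:
            self.compiled = dict(self.central.statutes)
            self.version = self.central.version


db = CentralLawDatabase()
db.enact("S1", "No unauthorized access to computer systems.")
cache = AgentLawCache(db)
cache.sync()
db.enact("S2", "Matter transmutation requires a license.")
cache.sync()  # the agent is obliged to stay current with amendments
```

The interesting engineering questions (how “compilation” into a semantic net works, how deltas rather than full copies propagate) are exactly the implementation details the comment defers.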
The law/ethic corpus would be the playbook that all AIs (and people) “agree to” as a precondition to being a member of a civil society. The law can and should morph over time, but only by means of rational discourse and checks and balances similar to current human law systems, albeit using much more rational and efficient mechanisms.
Hopefully underspecification won’t be a serious problem. AIs should have a good grasp of human psychology, common sense, and ready access to lots of case precedents. As good judges/rational agents they should abide by such precedents until better legislation is enacted. They should have a better understanding of ‘cause to’ and ‘negligence’ concepts than I (a non-lawyer) do :-). If AIs and humans find themselves constantly in violation of negligence laws, such laws should be revised, or the associated punishments reduced or increased as appropriate.
WRT “general laws” and “specific laws,” my feeling is that the former provide the legislative backbone of the system, e.g., Bill of Rights, Golden Rule, Meta-Golden Rule, Asimov’s Laws, … The latter do the heavy lifting of clarifying how law applies in practical contexts and the ever-gnarly edge cases.
This is a very timely question for me. I asked something very similar of Michael Vassar last week. He pointed me to Eliezer’s “Creating Friendly AI 1.0” paper and, like you, I didn’t find the answer there.
I’ve wondered if the Field of Law has been considered as a template for a solution to FAI—something along the lines of maintaining a constantly-updating body of law/ethics on a chip. I’ve started calling it “Asimov’s Laws++.” Here’s a proposal I made on the AGI discussion list in December 2009:
“We all agree that a few simple laws (ala Asimov) are inadequate for guiding AGI behavior. Why not require all AGIs be linked to a SINGLE large database of law—legislation, orders, case law, pending decisions—to account for the constant shifts [in what’s prohibited and what’s allowed]? Such a corpus would be ever-changing and reflect up-to-the-minute legislation and decisions on all matters man and machine. Presumably there would be some high level guiding laws, like the US Constitution and Bill of Rights, to inform the sub-nanosecond decisions. And when an AGI has milliseconds to act, it can inform its action using analysis of the deeper corpus. Surely a 200 volume set of international law would be a cakewalk for an AGI. The latest version of the corpus could be stored locally in most AGIs and just key parts local in low end models—with all being promptly and wirelessly updated as appropriate.
This seems like a reasonable solution given the need to navigate in a complex, ever changing, context-dependent universe.”
Given this approach, AIs’ goals and motivations might be mostly decoupled from an ethics module. An AI could make plans and set goals using any cognitive processes it deems fit. However, before taking actions, the AI must check the corpus to make sure its desired actions are legal. If they are not legal, the AI must consider other actions or suffer the wrath of law enforcement (from fines to rehabilitation). This legal system of the future would be similar to what we’re familiar with today, including being managed as a collaborative process between lots of agents (human and machine citizens, legislators, judges, and enforcers). Unlike current legal systems, however, it could hopefully be more nimble, fair, and effective given emerging computer-related technologies and methods (e.g., AI, WiFi, ubiquitous sensors, cheap/powerful processors, decision theory, Computational Law, …).
This seems like a potentially practical, flexible, and effective approach given its long history of human precedent. AIs could even refer to the appropriate corpus when traveling in different jurisdictions (e.g., Western Law, Islamic Law, Chinese Law) in advance of more universal laws/ethics that might emerge in the future.
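The jurisdiction-dependent lookup could be sketched as follows, with a fail-closed default when no corpus is known for the agent’s current jurisdiction. The corpora contents and function names are invented for illustration:

```python
# Illustrative sketch: filter candidate actions against the corpus for the
# agent's current jurisdiction. The jurisdictions, the prohibited-action
# sets, and the fail-closed policy are all assumptions for demonstration.

corpora = {
    "EU": {"prohibited": {"generate_illegal_content", "fraud"}},
    "US": {"prohibited": {"fraud", "unauthorized_access"}},
}


def legal_actions(candidates, jurisdiction, corpora):
    """Return the candidate actions the local corpus permits.

    Unknown jurisdictions yield no permitted actions (fail closed),
    forcing the agent to obtain the applicable corpus before acting.
    """
    corpus = corpora.get(jurisdiction)
    if corpus is None:
        return []
    return [a for a in candidates if a not in corpus["prohibited"]]


print(legal_actions(["trade", "fraud"], "US", corpora))
```

The fail-closed choice mirrors the comment’s framing: an agent without access to the appropriate corpus has no basis for judging its actions legal.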
This approach should make most runaway paper clip production scenarios off limits. Such behavior would seem to violate a myriad of laws (human welfare, property rights, speeding (?)) and would be dealt with harshly.
Perhaps this might be seen as a kind of practical implementation of CEV?
Complex problems require complex solutions.
Comments? Pointers?
Sure. Getting appropriate new laws enacted is an important element. From the paper:
I’d say the EU AI Act (and similar) work addresses the “new laws” imperative. (I won’t comment (much) on pros and cons of its content. In general, it seems pretty good. I wonder if they considered adding Etzioni’s first law to the mix, “An AI system must be subject to the full gamut of laws that apply to humans”? That is what I meant by “adopting existing bodies of law to implement AISVL.” The item in the EU AI Act about designing generative AIs to not generate illegal content is related.)
The more interesting work will be on improving legal processes along the dimensions listed above. And really interesting will be, as AIs get more autonomous and agentic, the “instilling” part where AIs must dynamically recognize and comply with the legal-moral corpora appropriate to the contexts they find themselves in.