This is a very timely question for me. I asked something very similar of Michael Vassar last week. He pointed me to Eliezer’s “Creating Friendly AI 1.0” paper and, like you, I didn’t find the answer there.
I’ve wondered if the Field of Law has been considered as a template for a solution to FAI—something along the lines of maintaining a constantly-updating body of law/ethics on a chip. I’ve started calling it “Asimov’s Laws++.” Here’s a proposal I made on the AGI discussion list in December 2009:
“We all agree that a few simple laws (à la Asimov) are inadequate for guiding AGI behavior. Why not require all AGIs be linked to a SINGLE large database of law (legislation, orders, case law, pending decisions) to account for the constant shifts [in what’s prohibited and what’s allowed]? Such a corpus would be ever-changing and reflect up-to-the-minute legislation and decisions on all matters man and machine. Presumably there would be some high-level guiding laws, like the US Constitution and Bill of Rights, to inform the sub-nanosecond decisions. And when an AGI has milliseconds to act, it can inform its action using analysis of the deeper corpus. Surely a 200-volume set of international law would be a cakewalk for an AGI. The latest version of the corpus could be stored locally in most AGIs and just key parts local in low-end models, with all being promptly and wirelessly updated as appropriate.
This seems like a reasonable solution given the need to navigate in a complex, ever changing, context-dependent universe.”
Given this approach, AIs’ goals and motivations might be mostly decoupled from an ethics module. An AI could make plans and set goals using any cognitive processes it deems fit. However, before taking actions, the AI must check the corpus to make sure its desired actions are legal. If they are not legal, the AI must consider other actions or suffer the wrath of law enforcement (from fines to rehabilitation). This legal system of the future would be similar to what we’re familiar with today, including being managed as a collaborative process between lots of agents (human and machine citizens, legislators, judges, and enforcers). Unlike current legal systems, however, it could hopefully be more nimble, fair, and effective given emerging computer-related technologies and methods (e.g., AI, WiFi, ubiquitous sensors, cheap/powerful processors, decision theory, Computational Law, …).
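To make the check-before-acting idea concrete, here is a minimal sketch in Python. It rests on loud assumptions: the corpus is reduced to a flat list of predicate rules, actions are described by hand-written tags, and every name in it (Action, Rule, LawCorpus, choose_action) is hypothetical and invented for illustration. A real corpus would need interpretation, not simple tag matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    tags: frozenset  # hypothetical labels describing what the action involves


@dataclass
class Rule:
    description: str
    prohibits: Callable[[Action], bool]  # True if the rule forbids the action


@dataclass
class LawCorpus:
    version: int = 0
    rules: List[Rule] = field(default_factory=list)

    def update(self, rule: Rule) -> None:
        """Add a rule and bump the version, standing in for a legislative update."""
        self.rules.append(rule)
        self.version += 1

    def violations(self, action: Action) -> List[str]:
        """Return the descriptions of every rule the action would break."""
        return [r.description for r in self.rules if r.prohibits(action)]


def choose_action(candidates: Iterable[Action], corpus: LawCorpus) -> Optional[Action]:
    """Return the first candidate (in the planner's preference order) that is legal."""
    for action in candidates:
        if not corpus.violations(action):
            return action
    return None  # nothing legal was proposed; the planner must think again


corpus = LawCorpus()
corpus.update(Rule("no theft", lambda a: "theft" in a.tags))

candidates = [
    Action("steal stamps", frozenset({"theft"})),
    Action("buy stamps", frozenset({"purchase"})),
]
chosen = choose_action(candidates, corpus)
print(chosen.name, "under corpus version", corpus.version)
```

Note that the planner and the corpus are deliberately separate here: the planner proposes in its own preference order, and the corpus only vetoes.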
This seems like a potentially practical, flexible, and effective approach given its long history of human precedent. AIs could even refer to the appropriate corpus when traveling in different jurisdictions (e.g., Western Law, Islamic Law, Chinese Law) in advance of more universal laws/ethics that might emerge in the future.
This approach should make most runaway paper clip production scenarios off limits. Such behavior would seem to violate a myriad of laws (human welfare, property rights, speeding (?)) and would be dealt with harshly.
Perhaps this might be seen as a kind of practical implementation of CEV?
You can’t be serious. Human lawyers find massive logical loopholes in the law all the time, and at least their clients aren’t capable of immediately taking over the world given the opportunity.
The swift genie-like answer: the paperclip maximiser would prioritise nobbling the Supreme Court and relevant legislatures. Or just controlling the pen that wrote the laws, if that could be acceptable within the failsafe.
More generally, I don’t think it would work. First, there’s a problem of underspecification. Laws require constant interpretation of case law, including a lot of ‘common sense’ type verdicts. We can’t assume AI would read them in the way we do.
Second, they rely on key underlying concepts such as ‘cause to’ and ‘negligence’ that rely on a reasonable person’s expectation. If we ask if a reasonable superintelligent AI knew that some negative/illegal consequences would occur from its act, then the result would nearly always be yes, thus opening it to breaking laws of negligence.
I think there are two types of law, neither of which is suitable.
Specific laws: e.g. no speeding, no stealing
These would mostly not apply, as they ban humans from doing things humans can do and wish to do. Neither would be likely to apply to AI.
General laws: uphold life, liberty and the pursuit of happiness
These aren’t failsafes, they’re the underlying utility-maximiser
You seem to imply that AIs’ motivations will be substantially humanlike. Why might AIs be motivated to nobble the courts, control pens, overturn vast segments of law, find loopholes, and engage in other such humanlike gamesmanship? Sounds like malicious programming to me.
They should be designed to treat the law as a fundamental framework to work within, akin to common sense, physical theories, and other knowledge they will accrue and use over the course of their operation.
I was glib in my post suggesting that “before taking actions, the AI must check the corpus to make sure its desired actions are legal.” Presumably most AIs would compile the law corpus into their own knowledge bases, perhaps largely integrated with other knowledge they rely on. Thus they could react more quickly during decision making and action. They would be required/wired, however, to be reasonably up to the minute on all changes to the law and grok the differences into their semantic nets accordingly. THE KEY THING is that there would be a common body of law ALL are held accountable to. If laws are violated, appropriate consequences would be enforced by the wider society.
The law/ethic corpus would be the playbook that all AIs (and people) “agree to” as a precondition to being a member of a civil society. The law can and should morph over time, but only by means of rational discourse and checks and balances similar to current human law systems, albeit using much more rational and efficient mechanisms.
Hopefully underspecification won’t be a serious problem. AIs should have a good grasp of human psychology, common sense, and ready access to lots of case precedents. As good judges/rational agents they should abide by such precedents until better legislation is enacted. They should have a better understanding of ‘cause to’ and ‘negligence’ concepts than I (a non-lawyer) do :-). If AIs and humans find themselves constantly in violation of negligence laws, such laws should be revised, or the associated punishments reduced or increased as appropriate.
WRT “general laws” and “specific laws,” my feeling is that the former provide the legislative backbone of the system, e.g., Bill of Rights, Golden Rule, Meta-Golden Rule, Asimov’s Laws, … The latter do the heavy lifting of clarifying how law applies in practical contexts and the ever-gnarly edge cases.
I understand where you’re coming from. Indeed, the way you’re imagining what an AI would do is fundamentally ingrained in human minds, and it can be quite difficult to notice the strong form of anthropomorphism it entails.
Scattered across Less Wrong are the articles that made me recognize and question some relevant background assumptions; the references in Fake Fake Utility Functions (sic) are a good place to begin.
EDITED TO ADD: In particular, you need to stop thinking of an AI as acting like either a virtuous human being or a vicious human being, and imagining that we just need to prevent the latter. Any AI that we could program from scratch (as opposed to uploading a human brain) would resemble any human far less in xer thought process than any two humans resemble each other.
Thanks for the links. I’ll try to make time to check them out more closely.
I had previously skimmed a bunch of lesswrong content and didn’t find anything that dissuaded me from the Asimov’s Laws++ idea. I was encouraged by the first post in the Metaethics Sequence where Eliezer warns about not “trying to oversimplify human morality into One Great Moral Principle.” The law/ethics corpus idea certainly doesn’t do that!
RE: your first and final paragraphs: If I had to characterize my thoughts on how AIs will operate, I’d say they’re likely to be eminently rational. Certainly not anthropomorphized as virtuous or vicious human beings. They will crank the numbers, follow the rules, run the simulations, do the math, play the odds as only machines can. Probably (hopefully?) they’ll have little of the emotional/irrational baggage we humans have been selected to have. Given that, I don’t see much motivation for AIs to fixate on gaming the system. They should be fine with following and improving the rules as rational calculus dictates, subject to the aforementioned checks and balances. They might make impeccable legislators, lawyers, and judges.
I wonder if this solution was dismissed too early by previous analysts due to some kind of “scale bias?” The idea of having only 3 or 4 or 5 (Asimov) Laws for FAI is clearly flawed. But scale that to a few hundred thousand or a million, and it might work. No?
Given that, I don’t see much motivation for AIs to fixate on gaming the system.
Motivation? It’s not that most AIs would have a sense that gaming a rule system is “fun”; rather, gaming the system would simply be the most efficient way to achieve their goals. Human beings don’t usually try to achieve one of their consciously stated goals with maximum efficiency, at any cost, to an unbounded extent. That’s because we actually have a fairly complicated subconscious goal system which overrides us when we might do something too dumb in pursuit of our conscious goals. This delicate psychology is not, in fact, the only or the easiest way one could imagine to program an artificial intelligence.
Here’s a fictional but still useful idea of a simple AI; note that no matter how good it becomes at predicting consequences and at problem-solving, it will not care that the goal it’s been given is a “stupid” one when pursued at all costs.
To take a less fair example, Lenat’s EURISKO was criticized for finding strategies that violated the ‘spirit’ of the strategy games it played, not because it wanted to be a munchkin, but simply because that was the most efficient way to succeed. If that AI had been in charge of an actual military, giving it the wrong goals might have led to it cleverly figuring out a strategy like killing its own civilians to accomplish a stated objective, not because it was “too dumb”, but because its goal system was too simple.
For this reason, giving an AI simple goals but complicated restrictions seems incredibly unsafe, which is why SIAI’s approach is figuring out the correct complicated goals.
For this reason, giving an AI simple goals but complicated restrictions seems incredibly unsafe, which is why SIAI’s approach is figuring out the correct complicated goals.
Tackling FAI by figuring out complicated goals doesn’t sound like a good program to me, but I’d need to dig into more background on it. I’m currently disposed to prefer “complicated restrictions,” or more specifically this codified ethics/law approach.
In your example of a stamp collector run amok, I’d say it’s fine to give an agent the goal of maximizing the number of stamps it collects. Given an internal world model that includes the law/ethics corpus, it should not hack into others’ computers, steal credit card numbers, and appropriate printers to achieve its goal. And if it does, (a) other agents should array against it to prevent the illegal behaviors, and (b) it will be held accountable for those actions.
The EURISKO example seems better to me. The goal of war (defeat one’s enemies) is particularly poignant and much harder to ethically navigate. If the generals think sinking their own ships to win the battle/war is off limits they may have to write laws/rules that forbid it. The stakes of war are particularly high and figuring out the best (ethical?) rules is particularly important and difficult. Rather than banning EURISKO from future war games given its “clever” solutions, it would seem the military could continue to learn from it and amend the laws as necessary. People still debate whether Truman dropping the bomb on Hiroshima was the right decision. Now there’s some tough ethical calculus. Would an ethical AI do better or worse?
Legal systems are what societies currently rely on to protect public liberties and safety. Perhaps an SIAI program can come up with a completely different and better approach. But in lieu of that, why not leverage Law? Law = Codified Ethics.
Again, it’s not only about having lots of rules. More importantly it’s about the checks and balances and enforcement the system provides.
Legal systems are what societies currently rely on to protect public liberties and safety. Perhaps an SIAI program can come up with a completely different and better approach. But in lieu of that, why not leverage Law? Law = Codified Ethics.
When they work well, human legal systems work because they are applied only to govern humans. Dealing with humans and predicting human behavior is something that humans are pretty good at. We expect humans to have a pretty familiar set of vices and virtues.
Human legal systems are good enough for humans, but simply are not made for any really alien kind of intelligence. Our systems of checks and balances are set up to fight greed and corruption, not a disinterested will to fill the universe with paperclips.
I submit that current legal systems (or something close) will apply to AIs. And there will be lots more laws written to apply to AI-related matters.
It seems to me current laws already protect against rampant paperclip production. How could an AI fill the universe with paperclips without violating all kinds of property rights, probably prohibitions against mass murder (assuming it kills lots of humans as a side effect), financial and other fraud to acquire enough resources, etc.? I see it now: some DA will serve a 25,000 count indictment. That AI will be in BIG trouble.
Or say in a few years technology exists for significant matter transmutation, highly capable AIs exist, one misguided AI pursues a goal of massive paperclip production, and it thinks it found a way to do it without violating existing laws. The AI probably wouldn’t get past converting a block or two in New Jersey before the wider public and legislators wake up to the danger and rapidly outlaw that and related practices. More likely, technologies related to matter transmutation will be highly regulated before an episode like that can occur.
How could an AI fill the universe with paperclips without violating all kinds of property rights, probably prohibitions against mass murder (assuming it kills lots of humans as a side effect), financial and other fraud to acquire enough resources, etc...[?]
I have no idea myself, but if I had the power to exponentially increase my intelligence beyond that of any human, I bet I could figure something out.
The law has some quirks. I’d suggest that any system of human law necessarily has some ambiguities, confusions, and internal contradictions. Laws are composed largely of leaky generalizations. When the laws regulate mere humans, we tend to get by, tolerating a certain amount of unfairness and injustice.
For example, I’ve seen a plausible argument that “there is a 50-square-mile swath of Idaho in which one can commit felonies with impunity. This is because of the intersection of a poorly drafted statute with a clear but neglected constitutional provision: the Sixth Amendment’s Vicinage Clause.”
There’s also a story about Kurt Gödel nearly blowing his U.S. citizenship hearing by offering his thoughts on how to hack the U.S. Constitution to “allow the U.S. to be turned into a dictatorship.”
How could an AI fill the universe with paperclips without violating all kinds of property rights...financial and other fraud to acquire enough resources
After reading that line I checked the date of the post to see if perhaps it was from 2007 or earlier.
Yes. When (a substantial, influential fraction of the populations of) two countries hate each other so much that they accept large costs to inflict larger costs on the other side, demand extremely lopsided treaties if they’re willing to negotiate at all, and have runaway “I hate the enemy more than you!” contests among themselves. When a politician in one country who’s willing to negotiate somewhat more is killed by someone who panics at the idea they might give the enemy too much. When someone considers themselves enlightened for saying “Oh, I’m not like my friends. They want them all to die. I just want them to go away and leave us alone.”
First of all, it’s not clear that individual apparently non-Pareto-optimal actions in isolation are evidence of irrationality or non-Pareto optimal behavior on a larger scale. This is particularly often the case when the “lose-lose” behavior involves threats, commitments, demonstrating willingness to carry through, etc.
Second of all, “someone who panics at the idea they might give the enemy too much” implies, or at least leaves open, the possibility that the ultimate concern is losing something ultimately valuable that is being given, rather than the ultimate goal being the defeat of the enemies. Likewise “demand extremely lopsided treaties if they’re willing to negotiate at all”, which implies strongly that they are seeking something other than the defeat of foes.
When someone considers themselves enlightened for saying “Oh, I’m not like my friends. They want them all to die. I just want them to go away and leave us alone.”
One point of mine is that this “enlightened” statement may actually be the extrapolated volition of even those who think they “want them all to die”. It’s pretty clear how for the “enlightened” person, the unenlightened value set could be instrumentally useful.
Most of all, war was characterized as being something that had the ultimate/motivating goal of defeating enemies. I object that it isn’t, but please recognize I go far beyond what I would need to assert to show that when I ask for examples of war ever being something driven by the ultimate goal of defeating enemies. Showing instances in which wars followed the pattern would only be the beginning of showing war in general is characterized by that goal.
I similarly would protest if someone said “the result of addition is the production of prime numbers, it is the defining characteristic of addition”. I would in that case not ask for counterexamples, but would use other methods to show that no, that isn’t a defining characteristic of addition nor is it the best way to talk about addition. Of course, some addition does result in prime numbers.
I agree there could be such a war, but I don’t know that there have ever been any, and highlighting this point is an attempt to at least show that any serious doubt can only be about whether war ever is characterized by having the ultimate goal of defeating enemies; there can be no doubt that war in general does not have as its motivating goal the defeat of one’s enemies.
I am aware of ignoring threats, using uncompromisable principles to get an advantage in negotiations, breaking your receiver to decide on a meeting point, breaking your steering wheel to win at Chicken, etc. I am also aware of the theorem that says even if there is a mutually beneficial trade, there are cases where selfish rational agents refuse to trade, and that the theorem does not go away when the currency they use is thousands of lives. I still claim that the type of war I’m talking about doesn’t stem from such calculations; that people on side A are willing to trade a death on side A for a death on side B, as evidenced by their decisions, knowing that side B is running the same algorithm.
A non-war example is blood feuds; you know that killing a member of family B who killed a member of family A will only lead to perpetuating the feud, but you’re honor-bound to do it. Now, the concept of honor did originate from needing to signal a commitment to ignore status extortion, and (in the absence of relatively new systems like courts of law) unilaterally backing down would hurt you a lot, but honor acquired a value of its own, independently from these goals. (If you doubt it, when France tried to ban duels and encourage trials, it used a court composed of war heroes who’d testify that the plaintiff wasn’t dishonourable for refusing to duel.)
Second of all, “someone who panics at the idea they might give the enemy too much” implies, or at least leaves open, the possibility that the ultimate concern is losing something ultimately valuable that is being given, rather than the ultimate goal being the defeat of the enemies.
Plausible, but not true of the psychology of this particular case.
Likewise “demand extremely lopsided treaties if they’re willing to negotiate at all”, which implies strongly that they are seeking something other than the defeat of foes.
Well obviously they aren’t foe-deaths-maximizers. It’s just that they’re willing to trade off a lot of whatever-they-went-to-war-for-at-first in order to annoy the enemy.
One point of mine is that this “enlightened” statement may actually be the extrapolated volition of even those who think they “want them all to die”.
The person who said that was talking about a war where it’s quite unrealistic to think any side would go away (as with all wars over inhabited territory). Genociding the other side would be outright easier.
war was characterized as being something that had the ultimate/motivating goal of defeating enemies. I object that it isn’t
Agree it isn’t. I don’t even think anyone starts a war with that in mind—war is typically a game of Chicken. I’m pointing out a failure that leads from “I’m going to instill my supporters with an irrational burning hatred of the enemy, so that I can’t back down, so that they have to” to “I have an irrational burning hatred of the enemy! I’ll never let them back down, that’d let them off too easily!”.
I agree there could be such a war, but I don’t know that there have ever been any
Care to guess which war in particular I was thinking of? (By PM if it’s too political.) I think it applies to any entrenched conflict where the [X] identify as enemies of the [Y] and have done so for several generations, but I do have a prototype. Hints:
The “enlightened” remark won’t help, it was in a (second-hand, but verbatim quote) personal conversation.
The politician will.
The “personal conversation” and “political” bits indicate it can’t be too old.
Plausible, but not true of the psychology of this particular case.
I’ll go along, but don’t forget my original point was that this psychology does not universally characterize war.
Well obviously they aren’t foe-deaths-maximizers. It’s just that they’re willing to trade off a lot of whatever-they-went-to-war-for-at-first in order to annoy the enemy.
Good point, you are right about that.
The person who said that was talking about a war where it’s quite unrealistic to think any side would go away (as with all wars over inhabited territory). Genociding the other side would be outright easier.
I don’t understand what you mean to imply by this. It may still be useful to be hateful and think genocide is an ultimate goal. If one is unsure whether it is better to swerve left or swerve right to avoid an accident, ignorant conviction that only swerving right can save you may be more useful than true knowledge that swerving right is the better bet to save you. Even if the indifferent person personally favored genocide and it was optimal in a sense, such an attitude would be more common among hateful people.
Agree it isn’t. I don’t even think anyone starts a war with that in mind—war is typically a game of Chicken. I’m pointing out a failure that leads from “I’m going to instill my supporters with an irrational burning hatred of the enemy, so that I can’t back down, so that they have to” to “I have an irrational burning hatred of the enemy! I’ll never let them back down, that’d let them off too easily!”.
Hmm, I think it’s enough for me if no one ever starts a war with that in mind, even if my original response was broader than that. Then at some point in every war, defeating the enemy is not an ultimate goal. This sufficiently disentangles “defeat of the enemy” from war and shows they are not tightly associated, which is what I wanted to say.
The “enlightened” remark won’t help, it was in a (second-hand, but verbatim quote) personal conversation.
I’m puzzled as to why you thought it would help, if first hand.
The politician will.
When (a substantial, influential fraction of the populations of) two countries hate each other so much that they accept large costs to inflict larger costs on the other side, demand extremely lopsided treaties if they’re willing to negotiate at all, and have runaway “I hate the enemy more than you!” contests among themselves.
When a politician in one country who’s willing to negotiate somewhat more is killed by someone who panics at the idea they might give the enemy too much.
“Too much” included weapons and...I’m not seeing the hate.
“The word “peace” is, to me, first of all peace within the nation. You must love your [own people] before you can love others. The concept of peace has been turned into a destructive instrument with which anything can be done. I mean, you can kill people, abandon people [to their fate], close Jews into ghettos and surround them with Arabs, give guns to the army [Palestinian Police], establish a [Palestinian] army, and say: this is for the sake of peace. You can release Hamas terrorists from prison, free murderers with blood on their hands, and everything in the framework of peace.”
“It wasn’t a matter of revenge, or punishment, or anger, Heaven forbid, but what would stop [the Oslo process],” he told the authors. “I thought about it a lot and understood that if I took Rabin down, that’s what would stop it.”
“What about the tragedy you caused your family?” he was asked.
“My considerations were that in the long run, my family would also be saved. I mean, if [the peace process] continued, my family would be ruined too. Do you understand what I’m saying? The whole country would be ruined. I thought about this for two years, and I calculated the possibilities and the risks. If I hadn’t done it, I would feel much worse. My deed will be understood in the future. I saved the people of Israel from destruction.”
I don’t understand what you mean to imply by this.
That wanting to be left alone is an unreasonable goal.
I’m puzzled as to why you thought it would help, if first hand.
I don’t.
Yeah, that was easy. :)
Your link is paywalled, though the text can be found easily elsewhere.
I’m… extremely surprised. I have read stuff Amir said and wrote, but I haven’t read this book. I have seen other people exhibit the hatred I speak of, and I sorta assumed it fit in with the whole “omg he’s giving our land to enemies gotta kill him” thing. It does involve accepting only very stringent conditions for peace, but I completely misunderstood the psychology… so he really murdered someone out of a cold sense of duty. I thought he just thought Rabin was a bad guy and looked for a fancy Hebrew word for “bad guy” as an excuse to kill him, but he was entirely sincere. Yikes.
That wanting to be left alone is an unreasonable goal.
I’m not sure what “left alone” means, exactly. I think I disagree with some plausible meanings and agree with others.
have runaway “I hate the enemy more than you!” contests among themselves
I think the Israeli feeling towards Arabs is better characterized as “I just want them to go away and leave us alone,” and if you asked this person’s friends they would deny hating and claim “I just want them to go away and leave us alone,” possibly honestly, possibly truthfully.
It does involve accepting only very stringent conditions for peace,
I think different segments of Israeli society have different non-negotiable conditions and weights for negotiable ones, and only the combination of them all is so inflexible. One can say about any subset that, granted the world as it is, including other segments of society, their demands are temporally impossible to meet from resources available.
Biblical Israel did not include much of modern Israel, including coastal and inland areas surrounding Gaza, coastal areas in the north, and the desert in the south. It did include territory not part of modern Israel, the areas surrounding the Golan and areas on the east bank of the Jordan river, and its core was the land on the west bank of the Jordan river. It would not be at all hard to induce the Israeli right to give up on acquiring southeast Syria, etc. even though it was once biblical Israel. Far harder is having them accede to losing entirely and being evicted from the land where Israel has political and military control, had the biblical states, and they are a minority population.
It might not be difficult to persuade the right to make many concessions the Israeli left or other countries would never accept. Examples include “second class citizenship” in the colloquial sense i.e. permanent non-citizen metic status for non-Jews, paying non-Jews to leave, or even giving them a state in what was never biblical Israel where Jews now live and evicting Jews resident there, rather than give non-Jews a state where they now are the majority population in what was once biblical Israel. The left would not look kindly upon such a caste system, forced transfer, soft genocide of paying a national group to disperse, or evicting majority populations to conform to biblical history.
I think it is only the Israeli right+Israeli left conditions for peace that are so stringent, and so I reject the formulation “it does involve accepting only very stringent conditions for peace” as a characterization of either the Israeli left or right, though not them in combination. To say it of the right pretends liberal conclusions (that I happen to have) are immutable.
I think different segments of Israeli society have different non-negotiable conditions and weights for negotiable ones, and only the combination of them all is so inflexible.
Mostly agreed, though I don’t think it’s the right way of looking at the problem—you want to consider all the interactions between the demands of each Israeli subgroup (also, groups of Israel supporters abroad) and the demands of each Palestinian subgroup (also, surrounding Arab countries).
I reject the formulation “it does involve accepting only very stringent conditions for peace” as a characterization of either the Israeli left or right
I meant just Yigal Amir. I’m pretty sure the guy wasn’t particularly internally divided.
Probably, but one ought to consider what policies he would endure that he would not have met with vigilante violence. I may have the most irrevocable possible opposition to, say, the stimulus bill’s destruction of inefficient car engines when replacing the engines would be even less efficient by every metric than continuing to run the old engine, a crude confluence of the broken window fallacy and lost purposes, but no amount of that would make me kill anybody.
Hmm… interesting ideas. I don’t intend to suggest that the AI would have human intentions at all; I think we might be modelling the idea of a failsafe in a different way.
I was assuming that the idea was an AI with a separate utility-maximising system, but to also make it follow laws as absolute, inviolable rules, thus stopping unintended consequences from the utility maximisation. In this system, the AI would ‘want’ to pursue its more general goal and the laws would be blocks. As such, it would find other ways to pursue its goals, including changing the laws themselves.
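A toy sketch of that “goal plus blocks” model, in Python. Everything here is an assumption made up for illustration: the action names, the utility numbers, and the helper best_permitted are all hypothetical. The point it tries to show is only that a block removes an option from the list without changing what the maximiser is reaching for.

```python
def best_permitted(actions, utility, is_blocked):
    """Pick the highest-utility action that no law blocks (None if all are blocked)."""
    permitted = [a for a in actions if not is_blocked(a)]
    return max(permitted, key=utility, default=None)


# Made-up paperclip payoffs for a handful of made-up options.
paperclips = {
    "buy wire": 10,
    "seize factories": 1000,
    "rewrite the statute book": 900,
    "convert unowned asteroids": 800,
}

# Legislators anticipated one bad option and blocked it.
blocked = {"seize factories"}
choice = best_permitted(paperclips, paperclips.get, lambda a: a in blocked)
print(choice)

# Blocking the new favourite just moves the maximiser to the next option down.
blocked.add(choice)
second = best_permitted(paperclips, paperclips.get, lambda a: a in blocked)
print(second)
```

In this toy world the modest option (“buy wire”) is only reached once every more productive option has been individually anticipated and blocked, which is the worry about simple goals with complicated restrictions.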
If the corpus of laws instead forms part of what the computer is trying to achieve/uphold, we face different problems. First, laws are prohibitions and it’s not clear how to ‘maximise’ them beyond simple obedience, unless it’s by stopping other people breaking them in a Robocop way. Second, failsafes are needed because even ‘maximise human desire satisfaction’ can throw up lots of unintended results. An entire corpus of law would be far more unpredictable in its effects as a core programme, and thus require even more failsafes!
On a side point, my argument about cause, negligence etc. was not that the computer would fail to understand them, but that as regards a superintelligence, they could easily be either meaningless or over-effective.
For an example of the latter, if we allow someone to die, that’s criminal negligence. This is designed for walking past drowning people and ignoring them etc. A law-abiding computer might calculate, say, that even with cryonics etc, every life will end in death due to the universe’s heat death. It might then sterilise the entire human population to avoid new births, as each birth would necessitate a death. And so on. Obviously this would clash with other laws, but that’s part of the problem: every action would involve culpability in some way, due to greater knowledge of consequences.
The laws might be appropriately viewed primarily as blocks that keep the AI from taking actions deemed unacceptable by the collective. AIs could pursue whatever goals they see fit within the constraints of the law.
However, the laws wouldn’t be all prohibitions. The “general laws” would be more prescriptive, e.g., life, liberty, justice for all. The “specific laws” would tend to be more prohibition oriented. Presumably the vast majority of them would be written to handle common situations and important edge cases. If someone suspects the citizenry may be in jeopardy from frequent runaway trolley incidents, the legislature can write statutes on what is legal to throw under the wheels to prevent deaths of (certain configurations of) innocent bystanders. Probably want to start with inanimate objects before considering sentient robots, terminally sick humans, fat men, puppies, babies, and whatever. (It might be nice to have some clarity on this! :-))
To explore your negligence case example, I imagine some statute might require agents to rescue people in imminent danger of losing their lives if possible, subject to certain extenuating circumstances. The legislature and public can have a lively debate about whether this law still makes sense in a future where dead people can be easily reanimated or if human life is really not valuable in the grand scheme of things. If humans have good representatives in the legislature and/or a few good AI advocates, mass human extermination shouldn’t be a problem, at least until the consensus shifts in such directions. Perhaps some day there may be a consensus on forced sterilizations to prevent greater harms. I’d argue such a system of laws should be able to handle it. The key seems to be to legislate prescriptions and prohibitions relevant to the current state of society and change them as the facts on the ground change. This would seem to get around the impossibility of defining eternal laws or algorithms that are ever-true in every possible future state.
I still don’t see how laws as barriers could be effective. People are arguing whether it’s possible to write highly specific failsafe rules capable of acting as barriers, and the general feeling is that you wouldn’t be able to second-guess the AI enough to do that effectively. I’m not sure what replacing these specific laws with a large corpus of laws achieves. On the plus side, you’ve got a large group of overlapping controls that might cover each others’ weaknesses. But they’re not specially written with AI in mind and even if they were, small political shifts could lead to loopholes opening. And the number also means that you can’t clearly see what’s permitted or not: it risks an illusion of safety simply because we find it harder to think of something bad an AI could do that doesn’t break any law.
Not to mention the fact that a utility-maximising AI would seek to change laws to make them better for humans, so the rules controlling the AI would be a target of its influence.
I guess here I’d reiterate this point from my latest reply to orthonormal:
Again, it’s not only about having lots of rules. More importantly it’s about the checks and balances and enforcement the system provides.
It may not be helpful to think of some grand utility-maximising AI that constantly strives to maximize human happiness or some other similar goals, and can cause us to wake up in some alternate reality some day. It would be nice to have some AIs working on how to maximize some things humans value, e.g., health, happiness, attractive and sensible shoes. If any of those goals would appear to be impeded by current law, the AI would lobby its legislator to amend the law. And in a better future, important amendments would go through rigorous analysis in a few days, better root out unintended consequences, and be enacted as quickly as prudent.
Many legal systems have all sorts of laws that are vague or even contradictory. Sometimes laws are on the books and are just no longer enforced. Many terms in laws are also ill-defined, sometimes deliberately so. Having an AI try to have almost anything to do with them is a recipe for disaster or comedy (most likely both).
We would probably start with current legal systems and remove outdated laws, clarify the ill-defined, and enact a bunch of new ones. And our (hyper-)rational AI legislators, lawyers, and judges should not be disposed to game the system. AI and other emerging technologies should both enable and require such improvements.
Surely a 200 volume set of international law would be a cakewalk for an AGI.
It seems like an applause light to invoke international law as a solution to almost anything, particularly this problem. What aspect of having rules made in a compromise of politicking makes it less likely to have exploitable loopholes than any other system?
If they are not legal, the AI must consider other actions or suffer the wrath of law enforcement (from fines to rehabilitation).
Fines? The misdoing we’re worried about is seizing power. Fines would require power sufficient to punish an AI after its misdoings, and have nothing to do with programming it not to be harmful.
AIs could even refer to the appropriate corpus when traveling in different jurisdictions (e.g., Western Law, Islamic Law, Chinese Law)
Somehow I don’t think the solution to the problem of having powerful AIs that don’t care about us (for better or worse) is to teach them Islamic law.
This seems like a potentially practical, flexible, and effective approach given its long history of human precedent. AIs could even refer to the appropriate corpus when traveling in different jurisdictions (e.g., Western Law, Islamic Law, Chinese Law) in advance of more universal laws/ethics that might emerge in the future.
This approach should make most runaway paper clip production scenarios off limits. Such behavior would seem to violate a myriad of laws (human welfare, property rights, speeding (?)) and would be dealt with harshly.
Perhaps this might be seen as a kind of practical implementation of CEV?
Complex problems require complex solutions.
Comments? Pointers?
You can’t be serious. Human lawyers find massive logical loopholes in the law all the time, and at least their clients aren’t capable of immediately taking over the world given the opportunity.
Thanks for the comments. See my response to DavidAgain re: loophole-seeking AIs.
The swift genie-like answer: the paperclip maximiser would prioritise nobbling the Supreme Court and relevant legislatures. Or just controlling the pen that wrote the laws, if that could be acceptable within the failsafe.
More generally, I don’t think it would work. First, there’s a problem of underspecification. Laws require constant interpretation of case law, including a lot of ‘common sense’ type verdicts. We can’t assume AI would read them in the way we do. Second, they rely on key underlying concepts such as ‘cause to’ and ‘negligence’ that rely on a reasonable person’s expectation. If we ask if a reasonable superintelligent AI knew that some negative/illegal consequences would occur from its act, then the result would nearly always be yes, thus opening it to breaking laws of negligence.
I think there are two types of law, neither of which is suitable.
Specific laws: e.g. no speeding, no stealing
These would mostly not apply, as they ban humans from doing things humans can do and wish to do. Neither would be likely to apply to AI.
General laws: uphold life, liberty and the pursuit of happiness
These aren’t failsafes, they’re the underlying utility-maximiser
Thanks for the thoughts.
You seem to imply that AIs’ motivations will be substantially humanlike. Why might AIs be motivated to nobble the courts, control pens, overturn vast segments of law, find loopholes, and engage in other such humanlike gamesmanship? Sounds like malicious programming to me.
They should be designed to treat the law as a fundamental framework to work within, akin to common sense, physical theories, and other knowledge they will accrue and use over the course of their operation.
I was glib in my post suggesting that “before taking actions, the AI must check the corpus to make sure its desired actions are legal.” Presumably most AIs would compile the law corpus into their own knowledge bases, perhaps largely integrated with other knowledge they rely on. Thus they could react more quickly during decision making and action. They would be required/wired, however, to be reasonably up to the minute on all changes to the law and grok the differences into their semantic nets accordingly. THE KEY THING is that there would be a common body of law ALL are held accountable to. If laws are violated, appropriate consequences would be enforced by the wider society.
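A minimal sketch of the sync-then-check loop described above, in Python. All of it is hypothetical: the `LawCorpus` and `Agent` classes, the `version` counter, and the idea that a law can be written as a predicate over an action dictionary are illustrative assumptions, not a claim about how real statutes could be encoded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical types: an action is a plain dict, and a law is a predicate
# that returns True when the action violates it.
Action = Dict[str, object]
Law = Callable[[Action], bool]


@dataclass
class LawCorpus:
    """Toy stand-in for the shared, constantly updated body of law."""
    version: int = 0
    laws: Dict[str, Law] = field(default_factory=dict)

    def enact(self, name: str, law: Law) -> None:
        """Add or replace a law; every amendment bumps the version."""
        self.laws[name] = law
        self.version += 1


@dataclass
class Agent:
    """Agent that keeps a local copy of the corpus and syncs before acting."""
    shared: LawCorpus
    local_version: int = -1
    local_laws: Dict[str, Law] = field(default_factory=dict)

    def sync(self) -> None:
        """Refresh the local copy only if the shared corpus has changed."""
        if self.local_version != self.shared.version:
            self.local_laws = dict(self.shared.laws)
            self.local_version = self.shared.version

    def violations(self, action: Action) -> List[str]:
        """Names of the locally known laws that the action would break."""
        return [name for name, law in self.local_laws.items() if law(action)]

    def choose(self, candidates: List[Action]) -> Optional[Action]:
        """Return the first candidate that breaks no known law, else None."""
        self.sync()
        for action in candidates:
            if not self.violations(action):
                return action
        return None
```

In this toy version an agent handed the plans "steal a printer" and "buy a printer" under a no-theft law would pick the second, and would pick nothing if a later amendment also banned purchases. The sketch says nothing about enforcement, or about an agent that declines to run the check at all.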
The law/ethic corpus would be the playbook that all AIs (and people) “agree to” as a precondition to being a member of a civil society. The law can and should morph over time, but only by means of rational discourse and checks and balances similar to current human law systems, albeit using much more rational and efficient mechanisms.
Hopefully underspecification won’t be a serious problem. AIs should have a good grasp of human psychology, common sense, and ready access to lots of case precedents. As good judges/rational agents they should abide by such precedents until better legislation is enacted. They should have a better understanding of ‘cause to’ and ‘negligence’ concepts than I (a non-lawyer) do :-). If AIs and humans find themselves constantly in violation of negligence laws, such laws should be revised, or the associated punishments reduced or increased as appropriate.
WRT “general laws” and “specific laws,” my feeling is that the former provide the legislative backbone of the system, e.g., Bill of Rights, Golden Rule, Meta-Golden Rule, Asimov’s Laws, … The latter do the heavy lifting of clarifying how law applies in practical contexts and the ever-gnarly edge cases.
I understand where you’re coming from; indeed, the way you’re imagining what an AI would do is fundamentally ingrained in human minds, and it can be quite difficult to notice the strong form of anthropomorphism it entails.
Scattered across Less Wrong are the articles that made me recognize and question some relevant background assumptions; the references in Fake Fake Utility Functions (sic) are a good place to begin.
EDITED TO ADD: In particular, you need to stop thinking of an AI as acting like either a virtuous human being or a vicious human being, and imagining that we just need to prevent the latter. Any AI that we could program from scratch (as opposed to uploading a human brain) would resemble any human far less in xer thought process than any two humans resemble each other.
Thanks for the links. I’ll try to make time to check them out more closely.
I had previously skimmed a bunch of lesswrong content and didn’t find anything that dissuaded me from the Asimov’s Laws++ idea. I was encouraged by the first post in the Metaethics Sequence where Eliezer warns about not “trying to oversimplify human morality into One Great Moral Principle.” The law/ethics corpus idea certainly doesn’t do that!
RE: your first and final paragraphs: If I had to characterize my thoughts on how AIs will operate, I’d say they’re likely to be eminently rational. Certainly not anthropomorphized as virtuous or vicious human beings. They will crank the numbers, follow the rules, run the simulations, do the math, play the odds as only machines can. Probably (hopefully?) they’ll have little of the emotional/irrational baggage we humans have been selected to have. Given that, I don’t see much motivation for AIs to fixate on gaming the system. They should be fine with following and improving the rules as rational calculus dictates, subject to the aforementioned checks and balances. They might make impeccable legislators, lawyers, and judges.
I wonder if this solution was dismissed too early by previous analysts due to some kind of “scale bias”? The idea of having only 3 or 4 or 5 (Asimov) Laws for FAI is clearly flawed. But scale that to a few hundred thousand or a million, and it might work. No?
Motivation? It’s not as if most AIs would have a sense that gaming a rule system is “fun”; rather, it would be the most efficient way to achieve their goals. Human beings don’t usually try to achieve one of their consciously stated goals with maximum efficiency, at any cost, to an unbounded extent. That’s because we actually have a fairly complicated subconscious goal system which overrides us when we might do something too dumb in pursuit of our conscious goals. This delicate psychology is not, in fact, the only or the easiest way one could imagine to program an artificial intelligence.
Here’s a fictional but still useful idea of a simple AI; note that no matter how good it becomes at predicting consequences and at problem-solving, it will not care that the goal it’s been given is a “stupid” one when pursued at all costs.
To take a less fair example, Lenat’s EURISKO was criticized for finding strategies that violated the ‘spirit’ of the strategy games it played, not because it wanted to be a munchkin, but simply because that was the most efficient way to succeed. If that AI had been in charge of an actual military, giving it the wrong goals might have led to it cleverly figuring out a strategy like killing its own civilians to accomplish a stated objective, not because it was “too dumb”, but because its goal system was too simple.
For this reason, giving an AI simple goals but complicated restrictions seems incredibly unsafe, which is why SIAI’s approach is figuring out the correct complicated goals.
Tackling FAI by figuring out complicated goals doesn’t sound like a good program to me, but I’d need to dig into more background on it. I’m currently disposed to prefer “complicated restrictions,” or more specifically this codified ethics/law approach.
In your example of a stamp collector run amok, I’d say it’s fine to give an agent the goal of maximizing the number of stamps it collects. Given an internal world model that includes the law/ethics corpus, it should not hack into others’ computers, steal credit card numbers, or appropriate printers to achieve its goal. And if it does, (a) other agents should array against it to prevent the illegal behaviors, and (b) it will be held accountable for those actions.
The EURISKO example seems better to me. The goal of war (defeat one’s enemies) is particularly poignant and much harder to ethically navigate. If the generals think sinking their own ships to win the battle/war is off limits they may have to write laws/rules that forbid it. The stakes of war are particularly high and figuring out the best (ethical?) rules is particularly important and difficult. Rather than banning EURISKO from future war games given its “clever” solutions, it would seem the military could continue to learn from it and amend the laws as necessary. People still debate whether Truman dropping the bomb on Hiroshima was the right decision. Now there’s some tough ethical calculus. Would an ethical AI do better or worse?
Legal systems are what societies currently rely on to protect public liberties and safety. Perhaps an SIAI program can come up with a completely different and better approach. But in lieu of that, why not leverage Law? Law = Codified Ethics.
Again, it’s not only about having lots of rules. More importantly it’s about the checks and balances and enforcement the system provides.
When they work well, human legal systems work because they are applied only to govern humans. Dealing with humans and predicting human behavior is something that humans are pretty good at. We expect humans to have a pretty familiar set of vices and virtues.
Human legal systems are good enough for humans, but simply are not made for any really alien kind of intelligence. Our systems of checks and balances are set up to fight greed and corruption, not a disinterested will to fill the universe with paperclips.
I submit that current legal systems (or something close) will apply to AIs. And there will be lots more laws written to apply to AI-related matters.
It seems to me current laws already protect against rampant paperclip production. How could an AI fill the universe with paperclips without violating all kinds of property rights, probably prohibitions against mass murder (assuming it kills lots of humans as a side effect), laws against the financial and other fraud needed to acquire enough resources, etc.? I see it now: some DA will serve a 25,000 count indictment. That AI will be in BIG trouble.
Or say in a few years technology exists for significant matter transmutation, highly capable AIs exist, one misguided AI pursues a goal of massive paperclip production, and it thinks it found a way to do it without violating existing laws. The AI probably wouldn’t get past converting a block or two in New Jersey before the wider public and legislators wake up to the danger and rapidly outlaw that and related practices. More likely, technologies related to matter transmutation will be highly regulated before an episode like that can occur.
I have no idea myself, but if I had the power to exponentially increase my intelligence beyond that of any human, I bet I could figure something out.
The law has some quirks. I’d suggest that any system of human law necessarily has some ambiguities, confusions, and internal contradictions. Laws are composed largely of leaky generalizations. When the laws regulate mere humans, we tend to get by, tolerating a certain amount of unfairness and injustice.
For example, I’ve seen a plausible argument that “there is a 50-square-mile swath of Idaho in which one can commit felonies with impunity. This is because of the intersection of a poorly drafted statute with a clear but neglected constitutional provision: the Sixth Amendment’s Vicinage Clause.”
There’s also a story about Kurt Gödel nearly blowing his U.S. citizenship hearing by offering his thoughts on how to hack the U.S. Constitution to “allow the U.S. to be turned into a dictatorship.”
After reading that line I checked the date of the post to see if perhaps it was from 2007 or earlier.
Can you think of an instance where defeat of one’s enemies was more than an instrumental goal and was an ultimate goal?
Yes. When (a substantial, influential fraction of the populations of) two countries hate each other so much that they accept large costs to inflict larger costs on each other, demand extremely lopsided treaties if they’re willing to negotiate at all, and have runaway “I hate the enemy more than you!” contests among themselves. When a politician in one country who’s willing to negotiate somewhat more is killed by someone who panics at the idea they might give the enemy too much. When someone considers themselves enlightened for saying “Oh, I’m not like my friends. They want them all to die. I just want them to go away and leave us alone.”
First of all, it’s not clear that individual apparently non-Pareto-optimal actions in isolation are evidence of irrationality or non-Pareto optimal behavior on a larger scale. This is particularly often the case when the “lose-lose” behavior involves threats, commitments, demonstrating willingness to carry through, etc.
Second of all, “someone who panics at the idea they might give the enemy too much” implies, or at least leaves open, the possibility that the ultimate concern is losing something ultimately valuable that is being given, rather than the ultimate goal being the defeat of the enemies. Likewise “demand extremely lopsided treaties if they’re willing to negotiate at all”, which implies strongly that they are seeking something other than the defeat of foes.
One point of mine is that this “enlightened” statement may actually be the extrapolated volition of even those who think they “want them all to die”. It’s pretty clear how for the “enlightened” person, the unenlightened value set could be instrumentally useful.
Most of all, war was characterized as being something that had the ultimate/motivating goal of defeating enemies. I object that it isn’t, but please recognize I go far beyond what I would need to assert to show that when I ask for examples of war ever being something driven by the ultimate goal of defeating enemies. Showing instances in which wars followed the pattern would only be the beginning of showing war in general is characterized by that goal.
I similarly would protest if someone said “the result of addition is the production of prime numbers, it is the defining characteristic of addition”. I would in that case not ask for counterexamples, but would use other methods to show that no, that isn’t a defining characteristic of addition nor is it the best way to talk about addition. Of course, some addition does result in prime numbers.
I agree there could be such a war, but I don’t know that there have ever been any, and highlighting this point is an attempt to at least show that any serious doubt can only be about whether war ever is characterized by having the ultimate goal of defeating enemies; there can be no doubt that war in general does not have as its motivating goal the defeat of one’s enemies.
I am aware of ignoring threats, using uncompromisable principles to get an advantage in negotiations, breaking your receiver to decide on a meeting point, breaking your steering wheel to win at Chicken, etc. I am also aware of the theorem that says even if there is a mutually beneficial trade, there are cases where selfish rational agents refuse to trade, and that the theorem does not go away when the currency they use is thousands of lives. I still claim that the type of war I’m talking about doesn’t stem from such calculations; that people on side A are willing to trade a death on side A for a death on side B, as evidenced by their decisions, knowing that side B is running the same algorithm.
A non-war example is blood feuds; you know that killing a member of family B who killed a member of family A will only lead to perpetuating the feud, but you’re honor-bound to do it. Now, the concept of honor did originate from needing to signal a commitment to ignore status extortion, and (in the absence of relatively new systems like courts of law) unilaterally backing down would hurt you a lot, but honor acquired a value of its own, independently from these goals. (If you doubt it, when France tried to ban duels and encourage trials, it used a court composed of war heroes who’d testify the plaintiff wasn’t dishonourable for refusing to duel.)
Plausible, but not true of the psychology of this particular case.
Well obviously they aren’t foe-deaths-maximizers. It’s just that they’re willing to trade off a lot of whatever-they-went-to-war-for-at-first in order to annoy the enemy.
The person who said that was talking about a war where it’s quite unrealistic to think any side would go away (as with all wars over inhabited territory). Genociding the other side would be outright easier.
Agree it isn’t. I don’t even think anyone starts a war with that in mind—war is typically a game of Chicken. I’m pointing out a failure that leads from “I’m going to instill my supporters with an irrational burning hatred of the enemy, so that I can’t back down, so that they have to” to “I have an irrational burning hatred of the enemy! I’ll never let them back down, that’d let them off too easily!”.
Care to guess which war in particular I was thinking of? (By PM if it’s too political.) I think it applies to any entrenched conflict where the two sides identify as enemies of each other and have done so for several generations, but I do have a prototype. Hints:
The “enlightened” remark won’t help, it was in a (second-hand, but verbatim quote) personal conversation.
The politician will.
The “personal conversation” and “political” bits indicate it can’t be too old.
It’s not particularly hard to guess.
I’ll go along, but don’t forget my original point was that this psychology does not universally characterize war.
Good point, you are right about that.
I don’t understand what you mean to imply by this. It may still be useful to be hateful and think genocide is an ultimate goal. If one is unsure whether it is better to swerve left or swerve right to avoid an accident, ignorant conviction that only swerving right can save you may be more useful than true knowledge that swerving right is the better bet to save you. Even if the indifferent person personally favored genocide and it was optimal in a sense, such an attitude would be more common among hateful people.
Hmm, I think it’s enough for me if no one ever starts a war with that in mind, even if my original response was broader than that. Then at some point in every war, defeating the enemy is not an ultimate goal. This sufficiently disentangles “defeat of the enemy” from war and shows they are not tightly associated, which is what I wanted to say.
I’m puzzled as to why you thought it would help, if first hand.
“Too much” included weapons and...I’m not seeing the hate.
That wanting to be left alone is an unreasonable goal.
I don’t.
Yeah, that was easy. :)
Your link is paywalled, though the text can be found easily elsewhere.
I’m… extremely surprised. I have read stuff Amir said and wrote, but I haven’t read this book. I have seen other people exhibit the hatred I speak of, and I sorta assumed it fit in with the whole “omg he’s giving our land to enemies gotta kill him” thing. It does involve accepting only very stringent conditions for peace, but I completely misunderstood the psychology… so he really murdered someone out of a cold sense of duty. I thought he just thought Rabin was a bad guy and looked for a fancy Hebrew word for “bad guy” as an excuse to kill him, but he was entirely sincere. Yikes.
I’m not sure what “left alone” means, exactly. I think I disagree with some plausible meanings and agree with others.
I think the Israeli feeling towards Arabs is better characterized as “I just want them to go away and leave us alone,” and if you asked this person’s friends they would deny hating and claim “I just want them to go away and leave us alone,” possibly honestly, possibly truthfully.
I think different segments of Israeli society have different non-negotiable conditions and weights for negotiable ones, and only the combination of them all is so inflexible. One can say about any subset that, granted the world as it is, including other segments of society, their demands are temporally impossible to meet from resources available.
Biblical Israel did not include much of modern Israel, including coastal and inland areas surrounding Gaza, coastal areas in the north, and the desert in the south. It did include territory not part of modern Israel, the areas surrounding the Golan and areas on the east bank of the Jordan river, and its core was the land on the west bank of the Jordan river. It would not be at all hard to induce the Israeli right to give up on acquiring southeast Syria, etc. even though it was once biblical Israel. Far harder is having them accede to losing entirely and being evicted from the land where Israel has political and military control, had the biblical states, and they are a minority population.
It might not be difficult to persuade the right to make many concessions the Israeli left or other countries would never accept. Examples include “second class citizenship” in the colloquial sense i.e. permanent non-citizen metic status for non-Jews, paying non-Jews to leave, or even giving them a state in what was never biblical Israel where Jews now live and evicting Jews resident there, rather than give non-Jews a state where they now are the majority population in what was once biblical Israel. The left would not look kindly upon such a caste system, forced transfer, soft genocide of paying a national group to disperse, or evicting majority populations to conform to biblical history.
I think it is only the Israeli right+Israeli left conditions for peace that are so stringent, and so I reject the formulation “it does involve accepting only very stringent conditions for peace” as a characterization of either the Israeli left or right, though not them in combination. To say it of the right pretends liberal conclusions (that I happen to have) are immutable.
I had meant to imply that
Probably, but one ought to consider what policies he would endure that he would not have met with vigilante violence. I may have the most irrevocable possible opposition to, say, the stimulus bill’s destruction of inefficient car engines when replacing the engines would be even less efficient by every metric than continuing to run the old engine, a crude confluence of the broken window fallacy and lost purposes, but no amount of that would make me kill anybody.
Hmm… interesting ideas. I don’t intend to suggest that the AI would have human intentions at all; I think we might be modelling the idea of a failsafe in a different way.
I was assuming that the idea was an AI with a separate utility-maximising system, but to also make it follow laws as absolute, inviolable rules, thus stopping unintended consequences from the utility maximisation. In this system, the AI would ‘want’ to pursue its more general goal and the laws would be blocks. As such, it would find other ways to pursue its goals, including changing the laws themselves.
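To make that picture concrete, here is a minimal sketch of "utility maximiser plus laws as hard blocks". Everything in it is a toy assumption for illustration: the `Action` type, the idea of a law as a simple predicate, the action names, and the utility numbers are all hypothetical, not anything proposed upthread.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Action:
    name: str
    utility: float  # score from the (hypothetical) separate utility system


# Assumption: a "law" is just a predicate that returns True when an action is prohibited.
Law = Callable[[Action], bool]


def choose_action(candidates: Iterable[Action], laws: Iterable[Law]) -> Optional[Action]:
    """Pick the highest-utility action that no law prohibits (None if nothing is legal)."""
    laws = list(laws)
    legal = [a for a in candidates if not any(law(a) for law in laws)]
    return max(legal, key=lambda a: a.utility, default=None)


# Toy data: the law blocks the top-scoring action, so the maximiser
# falls through to the next-best legal option.
candidates = [
    Action("seize_resources", 10.0),
    Action("lobby_to_amend_law", 7.0),
    Action("do_nothing", 0.0),
]
laws = [lambda a: a.name == "seize_resources"]

chosen = choose_action(candidates, laws)
print(chosen.name if chosen else "no legal action")
```

In this toy setup the filter never touches the utility scores, so the system simply routes around the block, and the best remaining option happens to be the one that targets the law itself.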
If the corpus of laws instead forms part of what the computer is trying to achieve/uphold, we face different problems. Firstly, laws are prohibitions and it’s not clear how to ‘maximise’ them beyond simple obedience, unless it’s stopping other people breaking them in a Robocop way. Secondly, failsafes are needed because even ‘maximise human desire satisfaction’ can throw up lots of unintended results. An entire corpus of law would be far more unpredictable in its effects as a core programme, and thus require even more failsafes!
On a side point, my argument about cause, negligence etc. was not that the computer would fail to understand them, but that as regards a superintelligence, they could easily be either meaningless or over-effective.
For an example of the latter, if we allow someone to die, that’s criminal negligence. This is designed for walking past drowning people and ignoring them etc. A law-abiding computer might calculate, say, that even with cryonics etc, every life will end in death due to the universe’s heat death. It might then sterilise the entire human population to avoid new births, as each birth would necessitate a death. And so on. Obviously this would clash with other laws, but that’s part of the problem: every action would involve culpability in some way, due to greater knowledge of consequences.
The laws might be appropriately viewed primarily as blocks that keep the AI from taking actions deemed unacceptable by the collective. AIs could pursue whatever goals they see fit within the constraints of the law.
However, the laws wouldn’t be all prohibitions. The “general laws” would be more prescriptive, e.g., life, liberty, justice for all. The “specific laws” would tend to be more prohibition oriented. Presumably the vast majority of them would be written to handle common situations and important edge cases. If someone suspects the citizenry may be in jeopardy of frequent runaway trolley incidents, the legislature can write statutes on what is legal to throw under the wheels to prevent deaths of (certain configurations of) innocent bystanders. We’d probably want to start with inanimate objects before considering sentient robots, terminally sick humans, fat men, puppies, babies, and whatever. (It might be nice to have some clarity on this! :-))
To explore your negligence case example, I imagine some statute might require agents to rescue people in imminent danger of losing their lives if possible, subject to certain extenuating circumstances. The legislature and public can have a lively debate about whether this law still makes sense in a future where dead people can be easily reanimated or if human life is really not valuable in the grand scheme of things. If humans have good representatives in the legislature and/or a few good AI advocates, mass human extermination shouldn’t be a problem, at least until the consensus shifts in such directions. Perhaps some day there may be a consensus on forced sterilizations to prevent greater harms. I’d argue such a system of laws should be able to handle it. The key seems to be to legislate prescriptions and prohibitions relevant to the current state of society and change them as the facts on the ground change. This would seem to get around the impossibility of defining eternal laws or algorithms that are ever-true in every possible future state.
I still don’t see how laws as barriers could be effective. People are arguing whether it’s possible to write highly specific failsafe rules capable of acting as barriers, and the general feeling is that you wouldn’t be able to second-guess the AI enough to do that effectively. I’m not sure what replacing these specific laws with a large corpus of laws achieves. On the plus side, you’ve got a large group of overlapping controls that might cover each other’s weaknesses. But they’re not specially written with AI in mind and even if they were, small political shifts could lead to loopholes opening. And the sheer number of laws also means that you can’t clearly see what’s permitted or not: it risks an illusion of safety simply because we find it harder to think of something bad an AI could do that doesn’t break any law.
Not to mention the fact that a utility-maximising AI would seek to change laws to make them better for humans, so the rules controlling the AI would be a target of its influence.
I guess here I’d reiterate this point from my latest reply to orthonormal:
It may not be helpful to think of some grand utility-maximising AI that constantly strives to maximize human happiness or some other similar goals, and can cause us to wake up in some alternate reality some day. It would be nice to have some AIs working on how to maximize some things humans value, e.g., health, happiness, attractive and sensible shoes. If any of those goals would appear to be impeded by current law, the AI would lobby its legislator to amend the law. And in a better future, important amendments would go through rigorous analysis in a few days, better root out unintended consequences, and be enacted as quickly as prudent.
Many legal systems have all sorts of laws that are vague or even contradictory. Sometimes laws are on the books and are just no longer enforced. Many terms in laws are also ill-defined, sometimes deliberately so. Having an AI try to have almost anything to do with them is a recipe for disaster or comedy (most likely both).
We would probably start with current legal systems and remove outdated laws, clarify the ill-defined, and enact a bunch of new ones. And our (hyper-)rational AI legislators, lawyers, and judges should not be disposed to game the system. AI and other emerging technologies should both enable and require such improvements.
It seems like an applause light to invoke international law as a solution to almost anything, particularly this problem. What aspect of having rules made in a compromise of politicking makes it less likely to have exploitable loopholes than any other system?
Fines? The misdoing we’re worried about is seizing power. Fines would require power sufficient to punish an AI after its misdoings, and have nothing to do with programming it not to be harmful.
Somehow I don’t think the solution to the problem of having powerful AIs that don’t care about us (for better or worse) is to teach them Islamic law.