Playing the devil’s advocate:
Consider this tumblr post by nostalgebraist, the contents of which I entirely concur with and endorse. It would seem to contradict, or at least undermine the applicability of, the approach I describe in this post.
More generally, the contradiction arises because, while this is entirely true—

the rule as stated, together with the criteria for deciding whether something is a “legitimate” exception, is the actual rule.
—the difficulty is that in some cases, the stated rule may be straightforward and legible, but the criteria for evaluating the legitimacy of exceptions are complex and illegible (and, in many or even most such cases, attempting to make the criteria legible will inevitably result in discarding important information).
Thus, e.g., in the sort of scenario described by nostalgebraist, the “actual rule” is “these are the explicit rules, but I also reserve the right to apply my own, fundamentally irreducible[1] judgment to make exceptions, and I admit of no formal/explicit rule which stands above that right”. In this case, spelling out the “actual rule” seems to have gained us very little.
Yet I think that the approach I describe withstands this challenge—because it remains the best approach, despite not being perfect; all the other solutions to the question (of what to do about apparently-compelling exceptions to apparently-reasonable rules) do no better, in such cases.
And while we gain little by spelling out the “actual rule” in these “complex and/or illegible exception-judging criteria” situations, nevertheless we do gain something—namely, making explicit (and therefore salient) the fact that unexpected exceptions (driven by irreducible judgment) are a possibility. What is explicit, can be better prepared for, and can be discussed, and problems addressed; so this is a benefit, if not a very great one.
[1] Why do I say “fundamentally irreducible”? Suppose that you offer some operationalization of my judgment criteria—one which appears to account for all of the judgments I’ve made, to instantiate any principles that seem to stand behind my judgment criteria, not to leave unaddressed any cases I can imagine, etc. You may be tempted to call this a successful reduction—to identify my judgment with your reduction of it. Yet recall that, by construction, I have retained the right to “call bullshit” on any application of an explicit rule which I feel goes against the rule’s spirit; which means that I remain free to, e.g., reject the output of your operationalization of my judgment criteria, in any future case, no matter how closely that output has matched my judgment thus far. Since this applies to any operationalization you can construct—which must, by definition, be explicit—the “personal judgment” rule is a meta-rule of a higher order than any explicit rule, and operationalizing it is impossible.
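For concreteness, here is a minimal sketch of the structure I have in mind (the code and names are purely illustrative, not a proposal for actually formalizing anything): an “actual rule” is an explicit rule paired with a judgment hook that is deliberately left opaque.

```python
from typing import Callable, Optional

# Illustrative sketch only: an "actual rule" = an explicit rule plus an opaque
# judgment hook. The hook is an arbitrary callable on purpose; any attempt to
# spell out its contents would itself be just another explicit rule.

Case = dict  # whatever description of a situation the rule is applied to

def make_actual_rule(
    explicit_rule: Callable[[Case], bool],
    judgment: Callable[[Case], Optional[bool]],
) -> Callable[[Case], bool]:
    """Apply explicit_rule, unless judgment chooses to intervene."""
    def actual_rule(case: Case) -> bool:
        verdict = judgment(case)      # irreducible judgment; may decline to rule
        if verdict is not None:       # judgment intervened ("I call bullshit")
            return verdict
        return explicit_rule(case)    # otherwise the legible rule governs
    return actual_rule
```

The footnote’s claim, in these terms, is that the judgment hook cannot be replaced by any spelled-out body without turning it into just another explicit rule, itself subject to override.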
Paul Scharre, in his excellent book about the application of AI to military technology, Army of None, has an anecdote which I think is relevant. In the book, he talks about leading a patrol up an Afghan hillside. As he and the troops under his command ascend the hillside, they’re spotted by a farmer. Realizing that they’ve been spotted, the patrol hunkers down and awaits the inevitable attack by Afghan insurgent forces. However, before the attackers arrive, something unexpected happens. A little girl, about 5 or 6 years of age, comes up to the position, with some goats and a radio. She reports the details of the Americans’ deployment to the attacking insurgents and departs. Shortly thereafter, the insurgent attack begins in earnest, and results in the Americans being driven off the hillside.
After the failed patrol, Scharre’s troop held an after-action briefing where they discussed what they might have done differently. Among the things they discussed was potentially detaining the little girl, or at least relieving her of her radio so as to limit the information being passed back to the attackers. However, at no point did anyone suggest the alternative of shooting the girl, even though they would have been perfectly justified, under the laws of war and rules of engagement, in doing so. Under the laws of war, anyone who acts like a soldier is a soldier, and this includes 5-year-old girls conducting reconnaissance for insurgents. However, everyone understood, on a visceral level, that there was a difference between permissible and correct, and that the choice of shooting the girl, while permissible, was morally abhorrent to the point where it was discarded at an unconscious level.
That said, no one in the troop said, “Okay, well, we need to amend our rules of engagement to say, ‘Shooting at people conducting reconnaissance is permissible… except when the person is a cute little 5-year-old girl.’” Everyone recognized, again at an unconscious level, that there was value in having a legible rule (“Shooting at people behaving in a soldierly manner is acceptable”) with illegible exceptions (“Except when that person is a 5-year-old girl leading goats”). The drafters of rules cannot anticipate every circumstance in which the rule might be applied, and thus having some leeway about the specific obligations (while making the intent of the rule clear) is valuable insofar as it allows people to take action without being paralyzed by doubt. This applies as much to rules governing an organization as it does to rules that you make for yourself.
The application to AI is, I hope, obvious. (Unfriendly) AIs don’t make a distinction between permissible and correct. Anything that is permissible is an option that can be taken, if it furthers the AI’s objective. Given that, I would summarize your point about having illegible exceptions as, “You are not an unfriendly AI. Don’t act like one.”
At least in the old war movies I’ve seen, that rule used to come with a general “except for women and children” clause.
That’s something you see in movies, yes, but as I understand what Paul Scharre is saying, it’s not something that’s actually true. According to him, the laws of war “care about what you do, not who you are.” If you are behaving in a soldierly fashion, you are a soldier, whether you are a young man, old man, woman, or child.
I affirm Scharre’s interpretation.
Anecdote: during deployment, when we arrive in country, we are given briefings about the latest tactics being employed in the area where we will be operating. When I went to Iraq in 2008, one of these briefings was about young girls wearing suicide vests, which was unprecedented at the time.
The tactic consisted of taking a family hostage, and telling the girl that if she did not wear this vest and go to X place at Y time, her family would be killed. Then they would detonate the vest by remote.
We caught on to it because sometimes we had jammers on which prevented the detonation, and one of the girls told us what had happened. Of course, we didn’t have jammers everywhere. Then the calculus changes from whether we can take the hit in order to spare the child, to whether it is one child or many (suicide bombings target crowds).
The obvious wrongness of killing children does not change, nor does that of allowing children to die. So one guy eats the sin, and the others feel ashamed for letting him.
On a more depressing note, one might look into events in the Korean War where “except for women and children” was not applied. There is a movie about the events at Nogunri called A Little Pond (it was available on Amazon Prime a year or so back; not sure if it’s still there, though).
Near the end, the movie also depicts the more human side of a soldier when confronted directly with that act, rather than with impersonal shapes hundreds of meters away.
I would also add, regarding the whole permissible-versus-exception question, that I suspect it is even grayer than suggested. The 5-year-old with a radio is hardly any less a part of the fighting force than the civilians providing all the logistics and production supporting any military action. So where is that line?
I’m not sure the AI will do much worse or much better than those making the plans and issuing the orders far from the battleground, not exposed to the bloodshed and human carnage.
Meta: I approve of the practice of arguing against your own post in a comment.
See also You Don’t Get To Know What You’re Fighting For, which makes this sort of situation more explicit.
Indeed. In particular I want to note Nate Soares’ point about how one of the reasons you don’t necessarily know what you’re fighting for, is that your goal(s) may change as you learn more, grow, etc. Similarly, illegible complex judgment criteria may shift over time (and for that reason will not be amenable to formalization, which is of necessity static), while still always being “my own judgment”; it is precisely that freedom to alter the criteria which I protect by resisting any proffered formalization.
Re your footnote: the explicit version of your judgment allows you an override, yet by construction you will never take it. So the crux of whether the two versions are semantically the same is whether we define rules as allowing or disallowing actions, or timelines.