Autopilot Ethics And The Illusory Self

I think most people reading this blog agree that the folk idea of a “self” is bunk. There is no ghost or homunculus unconstrained by physical reality controlling our body, nor are our future or past actions understandable or predictable by our current self.

I assume this seems unimportant to you; it seemed unimportant to me. But when I was trying to steelman Harris’ case for harping on so much about free will, I think I accidentally convinced myself that I was wrong to consider it so irrelevant a matter.

Among philosophers, someone like Chalmers or Nagel is actually fairly niche, often cited just for the association. Which are the philosophers people actually read, say, the ones the vast majority of politicians have read? Well, for example, John Rawls.

Remember Rawls’ “original position” thought experiment? The one most people take to be the underpinning of our commonly agreed upon political philosophy?… yeah, you’re right, it makes no sense unless you assume you are a magical ghost that just so happens to control a human body.

And suddenly you realize that a very, very large swath of philosophy and thought in general, both modern and historical, might be laboring under this completely nonsensical model of the mind/self without our noticing.


Since I wrote that, I have kept thinking of more and more situations where laboring under this model seems to cause a great amount of confusion.

I think one is people thinking about the algorithms that control things like cars making decisions of ethical importance. The very idea of an autopilot having to make an ethical decision is kind of silly, given how rarely a situation occurs on the road in which someone must be harmed. But let me grant that there are situations in which an autopilot might be forced to make a split-second decision with ethical import.

Maybe not quite a trolley problem, something more realistic, like braking hard enough to endanger the driver, or bumping slightly into the car in front, thus damaging it and putting its passengers in slight danger. Or deciding whether to veer off-road into a ditch or hit a child that jumped into the middle of the street. You get the point; take your pick of a semi-reasonable thought experiment where, one way or another, the autopilot is likely to cause harm to someone.


The “real” answer to this question is that nobody has the ability to give a general answer. The best you can do is observe the autopilot’s behavior in a given instance.

Note: assume I’m talking about any popular self-driving software + hardware suite you’d like.

The autopilot follows simple rules with a focus on getting to a destination without crashing or breaking any laws. Under any risky conditions (e.g. the road is too wet, a mechanical defect is detected) it hands the wheel, and the choice of whether to keep going, back to a human driver, or it stops the car outright until help can arrive.

When it comes to split-second decisions where risk is involved, there is no “ethics module” that kicks in to evaluate the risk. If the car can’t be safely stopped, the autopilot will try to unsafely stop it. How it does so is a generative rule stemming from its programming for more generic situations. Since the number of specific dangerous situations is infinite, you can’t really have specific rules for all (or any) of them.
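To make the point concrete, here is a minimal sketch of what I mean by generic rules with no ethics module. Everything in it (the names, the thresholds, the physics) is a hypothetical illustration, not the API or logic of any real self-driving stack:

```python
# Hypothetical sketch of rule-based autopilot behavior: generic rules,
# delegation to the driver under risky conditions, and no ethics module.
from dataclasses import dataclass


@dataclass
class Conditions:
    road_friction: float       # 1.0 = dry asphalt, lower = wet/icy (assumed scale)
    mechanical_fault: bool     # e.g. a sensor reporting a defect
    obstacle_distance_m: float
    speed_mps: float


def min_stopping_distance(c: Conditions, max_decel: float = 8.0) -> float:
    """Rough kinematic stopping distance, scaled by available friction."""
    decel = max_decel * c.road_friction
    return c.speed_mps ** 2 / (2 * decel)


def decide(c: Conditions) -> str:
    # Risky conditions: hand control back to the human or stop the car.
    if c.mechanical_fault or c.road_friction < 0.4:
        return "delegate_to_driver_or_pull_over"

    # Normal case: the same generic rule covers every situation.
    if min_stopping_distance(c) < c.obstacle_distance_m:
        return "brake_normally"

    # No "ethics module": if a safe stop is impossible, the generic rule
    # simply degrades into the hardest stop available.
    return "emergency_brake"
```

The dangerous outcome is not produced by a special-case rule that weighed anyone’s life; it just falls out of the same generic braking logic once a safe stop stops being possible.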


People gasp at this in horror: “Well, how can you entrust a human’s life to a thing that doesn’t reason about ethics?”

It’s at this point that the illusory self comes in. A moment of clear thinking makes it obvious that a human is no different from an autopilot in these circumstances.

Your system of ethics, education, feelings about those in the car or about the people on the road will not come into play when making a split-millisecond decision. What will come into play is… well, a complex interaction of fast-acting sub-systems which we don’t understand.

At no point will you ponder the ethics of the situation, who to endanger and who not to, while trying to stop a car going 100 km/h in the span of less than a second.

People thinking about wannabe trolley problems in traffic and assuming they know how they would react are just confused about their own minds, thinking their actions will be consistent with the narrative they are living at a given moment, under any circumstances.

What I find interesting about this scenario, though, besides the number of people fooled by it, is that it doesn’t have to do with self-control or preference prediction. It’s simply a scenario that’s too quick for any “self-like” thing to be in control, yet for some reason we are convinced that, were we put in such a situation, we could act from a place that is, or at least feels similar to, the current self.


One interesting point here, though, is that an autopilot could in theory aspire to make such decisions while thinking ethically, even if a human can’t, since compute might allow for it.

Currently, we hold humans liable if they make a mistake on the road, but only to the extent that we take away their right to drive, which is the correct utilitarian response (and it happens too seldom). Nobody goes to prison for killing a pedestrian outside of situations where the accident was due to previous acts that a conscious decision could have avoided (getting in the car drunk, driving without a license).

However, car companies could be held liable for killing pedestrians, and the liability could run into the dozens or hundreds of millions, since, unlike individuals, they can afford it. That would set off a race toward a 100% safe autopilot, and we might expect that autopilot to do superhuman things, such as reasoning morally under split-millisecond constraints and making the correct decision.

But right now I think we are far away from that, and most people still oppose autopilots on the grounds that they are inferior to people. This is what I’m hoping to help debunk here.