AI safety: the ultimate trolley problem

chaosmage 9 Apr 2022 12:05 UTC

−21 points

I’ll assume you’re familiar with the trolley problem and the fat man variant.

There’s a subtlety to it: if you push the fat man and divert the trolley, you can explain what you did when the police come, and you might get away with something like negligent manslaughter. Depending on your jurisdiction, you might even go unpunished after a trial.

Do you push the fat man if you know for a fact you’ll be tried and convicted of murder and nobody will thank you for saving the lives of those who were on the track? I think that’s a much higher bar.

But that’s not the ultimate trolley problem. The ultimate trolley problem is this.

The trolley is AGI. It is on track to crash into the world and kill all humans. Regulation and public pressure strong enough to possibly avert this will arrive in time (= the trolley gets on the other track) iff there’s a very public accident with a runaway AI that kills lots of people and is barely contained (= pushing the fat man). And you’ll be investigated thoroughly, so nothing in your digital footprint can indicate your true intent (such as your association with places like Less Wrong): otherwise, people who shout about AI safety will be associated with terrorism and their recommendations will be ignored extra hard.

What are the ethics of that? Are you THAT consequentialist?

Shmi 9 Apr 2022 18:58 UTC
5 points
Not sure why people silently downvote this. It’s a common pitfall in consequentialism and is discussed here often, most recently by Eliezer in Q4 of https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy:
It’s relatively safe to be around an Eliezer Yudkowsky while the world is ending, because he’s not going to do anything extreme and unethical unless it would really actually save the world in real life, and there are no extreme unethical actions that would really actually save the world the way these things play out in real life, and he knows that.
- justinpombrio 9 Apr 2022 20:27 UTC
  6 points
  Parent
  This reads like a call to violence for anyone who is consequentialist.
  
  It’s saying that either you make a rogue AI “that kills lots of people and is barely contained”, or unfriendly AGI happens and everyone dies. I think the conclusion is meant to be “and therefore you shouldn’t be consequentialist” and not “and therefore you should make a rogue AI”, but it’s not entirely clear?
  
  And I don’t think the “either” statement holds because it’s ignoring other options, and ignoring the high chance the rogue AI isn’t contained. So you end up with “a poor argument, possibly in favor of making a rogue AI”, which seems optimized to get downvotes from this community.
  - chaosmage 20 Apr 2022 15:05 UTC
    2 points
    Parent
    No, it doesn’t mean you shouldn’t be consequentialist. I’m challenging people to point out the flaw in the argument.
    
    If you find the argument persuasive, and think the ability to “push the fat man” (without getting LW tangled up in the investigation) might be a resource worth keeping, the correct action to take is not to comment, and perhaps to downvote.
- Chris_Leong 10 Apr 2022 7:42 UTC
  3 points
  Parent
  There’s a difference between abstract philosophical discussion and calls to action, and I think most people feel that this post leans too far towards the latter.
StrangerTides 10 Apr 2022 20:38 UTC
3 points
The fat man variant itself has always seemed invalid to me, because in the real world you could not be sure that pushing the fat man would actually stop the trolley, but you could be more confident that he would be killed either way, so it’s logical not to push him. In the scenario described here, you’d be even less confident that your “public accident” would have the desired effect.
lorepieri 10 Apr 2022 17:45 UTC
3 points
The downvotes are excessive; the post is provocative but interesting.
I think you will not even need to “push the fat man”. The development of AGI will be slow and gradual (as with any other major technology) and there will be incidents along the way (e.g. an AGI chatbot harassing someone). Those incidents will periodically prompt new regulations, so that measures to tackle real AGI-related dangers will be enacted, similar to what happens in the nuclear energy sector. They will not be perfect, but there will be regulations.
The tricky part is that not all nations will set similar safety levels; in fact, some may encourage the development of unsafe but high-reward AGI. So overall it looks like “pushing the fat man” will not even work that well.