If an AI system (or systems) goes wrong in the near term and causes harm to humans in a way that is consistent with or supportive of alignment being a big deal, what might it look like?
I’m asking because I’m curious about potential fire-alarm scenarios (including things which just help to make AI risks salient to the wider public), and also looking to operationalise a forecasting question which is currently drafted as
By 2032, will we see an event precipitated by AI that causes at least 100 deaths and/or at least $1B 2021 USD in economic damage?
to allow a clear and sensible resolution.
Boeing MCAS (https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Augmentation_System) is blamed for more than 100 deaths. How much “AI” would a similar system need to include for a similar tragedy to count as “an event precipitated by AI”?
Great point. I’m not sure whether MCAS contained aspects similar enough to AI to resolve such a question. This source doesn’t think it counts as AI (though it doesn’t provide much of an argument for this), and I can’t find any reference to machine learning or AI on the MCAS page. That said, one could clearly use AI tools to develop an automated control system like this, and I don’t feel well positioned to judge whether it should count.
To clarify: I do not think MCAS specifically is an AI-based system. I was thinking of a hypothetical future system that does include a weak AI component, but where, similarly to ACAS, the issue is not so much a flaw in the AI itself as how it is used within a larger system.
In other words, I think your test needs to distinguish between a situation where a trustworthy AI was needed and the actual AI was unintentionally/unexpectedly untrustworthy, versus a situation where the AI perhaps performed reasonably well but the way it was used was problematic, causing a disaster anyway.
Such scenarios are at best smoke, not fire alarms.
The article convincingly makes the weaker claim that there’s no guarantee of a fire alarm, and provides several cases which support this. I don’t buy the claim (which the article also tries to make) that there is no possible fire alarm, and such a claim seems impossible to prove anyway.
In any case, whether it’s smoke or a fire alarm doesn’t really address the specific question I’m asking.
AI systems find ways to completely manipulate some class of humans, e.g. by making them addicted. Arguably, this is already happening on a wide scale, though to a milder degree: people becoming “addicted” to algorithmically generated feeds.
Maybe the question could be made concrete in terms of the average amount of time people spend on their devices?
That seems like a different question, and one only partially entangled with AI: more screen time doesn’t necessarily need to be caused by AI, and the harms are harder to evaluate (even the sign of the value of “more screen time” is probably disputed).