Three years ago I created a map of different ideas about possible AI failures (LW-post, pdf). Recently I converted it into an article, “Classification of global catastrophic risks connected with artificial intelligence”. I think there are around 100 failure modes we can imagine now, and obviously some that are unimaginable.
However, my classification looks different from the one above: it is a classification of external behaviours, not of internal failure modes. It starts from the risks of below-human-level AI, such as “narrow-AI viruses” or narrow AI used to create advanced weapons, like biological weapons.
Then I look at different risks during and after AI takeoff. The interesting ones are:
AI kills humans to make the world simpler.
Two AIs go to war with each other.
AI blackmails humanity with a doomsday weapon to get what it needs.
Next is the difference between non-friendly AIs and failures of friendliness. For example, if an AI wireheads everybody, that is a failure of friendliness, as are dangerous value learners.
Another source of failures is technical: bugs, accumulation of errors, conflicting subgoals, and general problems related to complexity. An AI’s self-wireheading also belongs here. All of this could result in the unpredictable halting of an AI Singleton, with catastrophic consequences for all of humanity, for which it now cares.
The last source of possible AI halting is unresolvable philosophical problems, which effectively stop it. We can imagine several of them, but not all. Such problems include an unsolvable “meaning of life” (or “is-ought”) problem, and the problem that the result of a computation doesn’t depend on the AI’s existence, so it can’t prove to itself that it actually exists.
An AI could also encounter a more advanced alien AI (or its signals) and fall victim to it.