TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI
Partly in response to calls for more detailed accounts of how AI could go wrong, e.g., from Ng and Bengio’s recent exchange on Twitter, here’s a new paper with Stuart Russell:
Discussion on Twitter… comments welcome!
https://twitter.com/AndrewCritchCA/status/1668476943208169473
arXiv draft:
“TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI”
https://arxiv.org/pdf/2306.06924.pdf
Many of the ideas will not be new to LessWrong or the Alignment Forum, but taken as a whole, I hope the paper makes a good case to the wider world for using logically exhaustive arguments to identify risks (an approach that, outside LessWrong, is often not assumed to be valuable for thinking about risk).
I think the most important figure from the paper is this one:
… and here are some highlights:
Self-fulfilling pessimism:
https://arxiv.org/pdf/2306.06924.pdf#page=4
Industries that could eventually get out of control in a closed loop:
https://arxiv.org/pdf/2306.06924.pdf#page=5
...as in this “production web” story:
https://arxiv.org/pdf/2306.06924.pdf#page=6
Two “bigger than expected” AI impact stories:
https://arxiv.org/pdf/2306.06924.pdf#page=8
Email helpers and corrupt mediators, which kinda go together:
https://arxiv.org/pdf/2306.06924.pdf#page=10
https://arxiv.org/pdf/2306.06924.pdf#page=11
Harmful A/B testing:
https://arxiv.org/pdf/2306.06924.pdf#page=12
Concerns about weaponization by criminals and states:
https://arxiv.org/pdf/2306.06924.pdf#page=13
Enjoy :)