AI Fables
I write fiction. I’m also interested in how AI is going to impact the world. Among other things, I’d prefer that AI not lead to catastrophe. Let’s imagine that I want to combine these two interests, writing fiction that explores the risks posed by AI. How should I go about doing so? More concretely, what ideas about AI might I try to communicate via fiction?
This post is an attempt to partially answer that question. It is also an attempt to invoke Cunningham’s Law: I’m sure there will be things I miss or get wrong, and I’m hoping the comments section might illuminate some of these.
Holden’s Messages
A natural starting point is Holden’s recent blog post, Spreading Messages to Help With the Most Important Century. Stripping out the nuances of that post, here’s a list of the messages that Holden would like to see spread:
We should worry about conflict between misaligned AI and all humans.
AIs could behave deceptively, so “evidence of safety” might be misleading.
AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems.
Alignment research is prosocial and great.
It might be important for companies (and other institutions) to act in unusual ways.
We’re not ready for this.
However, as interesting as this list is, it’s not what I’m looking for; I’m not looking for bottom-line messages to convey. Instead, I want to identify a list of smaller ideas that will help people to reach their own bottom lines by thinking carefully through the issues. The idea of instrumental convergence might appear on such a list. The idea that alignment research is great would not.
One reason for my focus is that fiction writing is ultimately about details. Fiction might convey big messages, but it does so by exploring more specific ideas. This raises the question: which specific ideas?
Another reason for my focus is that I’m allergic to propaganda. I don’t want to tell people what to think and would prefer to introduce ideas that can help people think for themselves. Of course, not all message fiction is propaganda, and I’m not accusing Holden of calling for propaganda. Still, my personal preference is to focus on how to convey the nuts and bolts needed to understand AI.[1]
What Nuts and Which Bolts?
So with context to hand, back to the question: what ideas about AI might someone try to convey via fiction? Here’s a potential list:
Basics of AI
Neural networks are black boxes (though interpretability might help us to see inside).
AI “Psychology”
AI systems are likely to be alien in how they think. They are unlikely to think like humans.
Orthogonality and instrumental convergence might provide insight into likely AI behaviour.
AI systems might be agents, in some relatively natural sense. They might also simulate agents, even if they are not agents.
Potential dangers from AI
Outer misalignment is a potential danger, but in the context of neural networks so too is inner misalignment (related: reward misspecification and goal misgeneralisation).
Deceptive alignment might lead to worries about a treacherous turn.
The possibility of recursive improvement might influence views about takeoff speed (which might influence views about safety).
Broader Context of Potential Risks
Different challenges might arise in the case of a singleton, when compared with multipolar scenarios.
Arms races can lead to outcomes that no-one wants.
AI rights could be a genuine concern, but incorrectly attributing rights to AI could itself pose a risk (by making it harder to control AI behaviour).
So that’s the list. Having seen it, one might naturally wonder why fiction is the right medium to communicate ideas like this. Part of the answer is that I think it’s useful to explore ideas from many angles.
Another part of the answer is that conveying an idea is one thing but conveying an intuition is another. Humans are used to modelling other humans, and so it is likely that we’ll anthropomorphise when considering AI. Fiction might help with this. It’s one thing to state in factual tones that AI systems are likely to have an alien psychology. It’s quite another to be shown a world in which humans come up against the alien.
So why communicate the ideas? Because it’s plausibly good that those working on AI capabilities, those working on AI safety, and people more broadly are able to reflect on the implications of AI and can understand why many are concerned about it. And why fiction? In part, because an intuitive grasp can be as important as a grasp of facts.
AI Fables
I started this post with a hypothetical, imagining that I wanted to write fiction that explores AI risk. In reality, I doubt that I’ll find a great deal of time to do so. Still, I’d be excited to see other people writing fiction of this sort.
Here’s one genre of story I’d be interested to see more of: AI fables. Fables are short stories, with a particular aesthetic sensibility, that convey a lesson.
While I enjoy the aesthetic of fables, I wouldn’t want to narrow the focus too much. Still, I’d love to see more short stories, of the sort that could be read around a fire on a winter’s night, that communicate a brief lesson about AI.
For example, stories of djinni and golems can be used to communicate the problem of outer misalignment; even if something does precisely what we tell it to, it can be hard to ensure that it does what we actually want it to. I’d love to see a fable that likewise communicated the problem of inner misalignment. I’d love to see a wide variety of such fables, exploring a range of ideas about AI, and maybe even a collection putting them in one place.
If you know of such a story, please link it in the comments. If you write such a story, please link it. And if you have thoughts or additions for the list of ideas in the post, I’d love to hear these.
The ideas in this post were developed in discussion with Elizabeth Garrett and Damon Sasi. Thanks also to Conor Barnes for feedback.
[1] I’m also not confident in the bottom lines; I retain substantial uncertainty about how likely AI is to lead to extinction or something equally bad (as opposed to more mundane, but still awful, catastrophe). However, I feel far more confident that there is insight to be gleaned from reflection on the various concepts and ideas underlying the case for AI risk. So this is where I focus.