It’s striking that there are so few concrete fictional descriptions of realistic AI catastrophe, despite the large amount of fiction in the LessWrong canon. The few exceptions, like Gwern’s here or Gabe’s here, are about fast take-offs and direct takeover.
I think this is a shame. The concreteness and specificity of fiction make it great for imagining futures, and its emotional pull can help us make sense of the very strange world we seem to be heading towards. And slower catastrophes, like Christiano’s What failure looks like, make up a large fraction of many people’s p(doom), despite being less cinematic.
One thing that motivated me in writing this was that Bostrom’s phrase “a Disneyland without children” seemed incredibly poetic. At first glance it’s hard to tell a compelling or concrete story about gradual goodharting: “and lo, many actors continued to be compelled by local incentives towards collective loss of control …”—zzzzz … But imagine a technological and economic wonderland rising, gradually disfiguring itself as it grows, until you have an edifice of limitless but perverted plenty standing crystalline against the backdrop of a grey dead world—now that is a poetic tragedy. And that’s what I tried to put on paper here.
Did it work? Unclear. On the literary level, I’ve had people tell me they liked it a lot. I’m decently happy with it, though I think I should’ve cut it down in length a bit more.
On the worldbuilding, I appreciated being questioned on the economic mechanics in the comments, and I think my exploration there is a decent stab at a neglected set of questions: how much does the current economy being fundamentally grounded in humans limit the scope of economic-goodharting catastrophes? Recently, I discovered earlier explorations of very similar questions in Scott Alexander’s 2016 “Ascended economy?” and by Andrew Critch here. I also greatly appreciated Andrew Critch’s recent (2024) post raising very similar concerns about “extinction by industrial dehumanization”.
I continue to hope that more people work on this, and that this piece can help by concretising this class of risks in people’s minds (I think it is very hard to get people to grok a future scenario and care about it unless there is some evocative description of it!).
I also hope there’s some way to distribute this story more broadly than just on LessWrong and my personal blog. Ted Chiang and the Arrival movie got lots of people exposed to the principle of least action—no small feat. It’s time for the perception of AI risk to break out of decades of Terminator comparisons and move towards a basket of good fictional examples that memorably demonstrate subtle concepts.