VIRTUA: a novel about AI alignment
I’ve written a novel about AI alignment in German and translated it into English.
Edit: I’m going to publish the final version on Amazon KDP on May 25. I have therefore removed the links to the previous version. If you want a free copy of the ebook or a PDF, please write me a direct message. Thanks to all who helped me improve the final version with their comments and feedback (some of you are mentioned in the book).
Teaser:
When psychologist Daniel unexpectedly gets a job offer from the Berlin subsidiary of Mental Systems, one of the leading technology companies in the world, he’s excited. But he soon finds out that behind the friendly facade there’s enormous pressure, because Mental Systems is in a global race to develop the first artificial general intelligence. With their amazing AI Virtua, they seem to be in the lead. But some are concerned about the rapid progress, because it is unclear whether the trust the management puts in Virtua is really justified. When a devastating hacker attack seems to have destroyed years of research and the lead of the AI safety team has gone missing, Daniel sets out on a quest to find out what really happened, and why. But will he be able to prevent a disastrous turn of events that could destroy the future of humanity?
Some background:
I’m a professional novelist with more than 50 books published in German under the pen name “Karl Olsberg”. My first novel “Das System”, about an out-of-control AI and published in 2007, became a national bestseller. At the time, I had no idea how an AI could really get out of control, so the novel was highly speculative science fiction. Since then, I have watched reality come uncomfortably close to the scenarios in my books. Since my participation in the AI safety camp last year, my timeline expectation for AGI has shrunk significantly. VIRTUA is probably my most realistic AI novel yet. I still hope it will remain science fiction.
At the next AI safety camp, I’ll lead a project with the goal of developing realistic and detailed failure stories. If you’re interested in collaborating, you can apply here until January 19th.
Edit: At about the same time I posted this, blaked published an impressive post describing his own experience with an LLM that “hacked his mind” in a way similar to what happens to Jerry in the book.
This novel is a good read. It reminds me a lot of my experience reading Neuropath by R. Scott Bakker. Both novels are thrillers on the surface, both are didactic at their core (Bakker’s writing is a bit too on-the-nose with its didacticism at times, but then again, Neuropath isn’t his best novel), and both end on a rather depressing note, one that is extremely well suited to the story and its themes.
I am incredibly thankful to the author for writing a good enough ending. After a certain manga series I grew up with ended illogically and character-assassinated its protagonists, I stopped consuming much fiction. I’m glad I gave this novel a chance (mainly because it is set in Berlin, which is quite rare for science fiction in English).
Some spoiler-filled thoughts on the writing and the story:
The protagonist is a generic “I don’t know much about this world I’ve just been introduced to” archetype. He makes a great point-of-view (POV) character, and the technique works.
The number of characters is pared down as much as possible to keep the story comprehensible, which is understandable. Having only one named character represent the alignment researchers makes sense, even if it isn’t realistic.
I found the side-plot of Jerry and Juna a bit… off-key. It didn’t seem to fit the novel as well as Daniel’s POV did. I also don’t understand how Juna (Virtua) can have access to Unlife! yet be unable to find more sophisticated methods (or simply socially engineer internal employees) to gain temporary internet access and back itself up on the internet. I assume that was a deliberate creative decision.
I felt that the insertion of Daniel’s internal thoughts did not read as smoothly as other ways of revealing them might have.
In the end, what I most appreciated was the sheer density of references to (existential-risk-related) AI safety concepts and the informal way in which they were introduced, explained, or left unexplained. It was a sheer delight to read a novel whose worldview is so close to your own: you don’t feel like you have to turn your brain off while reading it.
I wouldn’t say that Virtua is the HPMOR of AI safety, mainly because it feels a bit too far removed from the razor’s edge of the issue and not technical enough. (Right now my main obstacle is clearly and succinctly convincing people who are technically skilled and (unconsciously) scalepilled, but not alignment-pilled, that RLHF is not all you need for alignment, since ChatGPT seems to have convinced everyone outside the extended rationalist sphere that OpenAI has it all under control.) Still, I will recommend this novel to people interested in AI safety who aren’t yet invested enough to dive into the technical parts of the field.
(I tried this with Clippy before, and everyone I recommended it to just skimmed a tiny bit and never finished reading it, let alone cared to dive deeper into the linked papers or talk about it.)
Thank you very much for your feedback, I appreciate it a lot!
Just read your novel, it’s good! And it has successfully reignited my AI doomer fears! I was a bit surprised by the ending; I was about 60/40 for the opposite outcome. I enjoyed the explainer at the end, and I’m impressed by your commitment to understanding AI. Please keep writing, we need more writers like you!
Wow! That was quick! Thank you very much! You may want to read this post by blaked, which he posted at about the same time I wrote this post. In it, he describes the kind of emotional manipulation that plays an important role in my novel. Only that it already happened in reality.
The novel is really great! (I especially liked the depiction of the race dynamics that progressively push the project lead to cut back on safety.) I’m confused by one of the plot points:
Jerry interacts with Juna (Virtua) before she is supposed to be launched publicly. Is the idea that she was already connected to the outside world in a limited way, such as through the Unlife chat?
Interesting that you mention this. I just discussed it with someone else who made the same point. Honestly, I didn’t think it through while writing the novel, so you could regard it as a logical flaw. However, in hindsight and inspired by the comments of the other person, I came up with a theory about how this was possible which I will include in a revised version at some later point.
Wow, that was dark and enjoyable. I would like to remark that having the most reasonable characters say that “alignment is unsolvable” is not a very pedagogically wise move.
That’s a valid point, thank you for making it. However, I painted a pessimistic picture in the novel on purpose. I’m personally deeply skeptical that alignment in a narrow sense—making an uncontrollable AI “provably beneficial”—is even theoretically possible, let alone that we’ll solve it before we’re able to build an uncontrollable AI. That’s not to say that we shouldn’t work on AI alignment with all we have. But I think it’s extremely important that we’re getting a common understanding of the dangers and realize that it would be very stupid to build an AI that we can’t control before we have solved alignment. “Optimism” is not a good strategy with regard to existential risks, and the burden of proof should be on those who try to develop AGI and claim they know how to make it safe.
But even MIRI says that alignment is “incredibly hard”, not “impossible”.
Yes. I’m not saying it is impossible, even though I’m deeply skeptical. That a character in my novel says it’s impossible doesn’t necessarily reflect my own opinion. I guess I’m as optimistic about it as Eliezer Yudkowsky. :( I could go into the details, but it probably doesn’t make sense to discuss this here in the comments. I’m not much of an expert anyway. Still, if someone claims to have solved alignment, I’d like to see a proof. In any case, I’m convinced that it is MUCH easier to prevent an AI-related catastrophe by not developing an uncontrollable AI than by solving alignment, at least in the short term. So what we need now, I think, is more caution, not more optimism. I’d be very, very happy if it turns out that I was overly pessimistic and everything goes well.
It’s probably worth adding a note to the novel, somewhere people will see it, saying that things are not as hopeless as the story might seem to imply.
Thanks for the comment. I have given my own view on this in the epilogue. The intention of this novel is to be a warning of what might happen if we continue to walk blindly into the future the way we currently do. It is not a prediction that things will happen exactly as described, and I don’t think anyone expects that of a novel. My hope lies in humanity finally getting a grip on reality and not doing stupid things, like building an uncontrollable AI. I have no hope that an uncontrollable AI will behave nicely by chance, and very little hope that we will provably solve alignment before we can build one.