Why was the AI Alignment community so unprepared for engaging with the wider world when the moment finally came?
In 2022, I think it was becoming clear that there’d be a huge flood of interest. Why did I think this? Here are two reasons. First, I’ve long thought that once MMLU performance crosses a threshold, Google would start to view AI as an existential threat to its search engine, and it seemed like that threshold would be crossed in 2023. Second, at a rich person’s party, many highly plugged-in elites were starting to get much more anxious about AI (this was before ChatGPT), which updated me toward thinking the tide might turn soon.
Since I believed interest would shift so much, I changed how I spent my time a lot in 2022: I did substantially less technical work and instead worked on outreach and orienting documents. Here are several projects I did, some targeted at the expert community and some at the general public:
We ran an AI arguments writing competition. After seeing last year that we could not crowdsource AI risk writing to the community through contests, I also started work on An Overview of Catastrophic Risks last winter. We had a viable draft in April, but then I decided to restructure it, which required rewriting it and making it longer. This document was partly a synthesis of the submissions from the first round of the AI arguments competition, so fortunately the competition did not go to waste. Apologies that the document took so long.
Last summer and fall, I worked on explaining a different AI risk to a lay audience in Natural Selection Favors AIs over Humans (apparently this doom path polls much better than treacherous turn stories; I held onto the finished paper for months and waited for GPT-4’s release before publishing it, to have good timing).
X-Risk Analysis for AI Research tries to systematically articulate how to analyze AI research’s relation to x-risk for a technical audience. It was my first go at writing about AI x-risk for the ML research community. I recognize this paper was around a year ahead of its time and maybe I should have held onto it to release it later.
Finally, after a conversation with Kelsey Piper and the aforementioned party, I was inspired to work on a textbook, An Introduction to AI Safety, Ethics, and Society. This is by far the largest writing project I’ve been a part of. Currently, the only way to become an AI x-risk expert is to live in Berkeley. I want to reduce this barrier as much as possible, relate AI risk to existing literatures, and give people a more holistic understanding of AI risk (I think people should have a basic understanding of corrigibility, international coordination for AI, deception, etc.). This book is not an ML PhD topics book; it’s meant to give generalists good models. The textbook’s contents will start to be released section by section on a daily basis, beginning late this month or next month. Textbooks normally take several years to make, so I’m happy this will be out relatively quickly.
One project we only started in 2023 is the newsletter, so we can’t claim prescience for that.
If you want more AI risk outputs, CAIS is funding-constrained and is currently fundraising for a writer.
This seems like an impressive level of successfully betting on future trends before they became obvious.
apparently this doom path polls much better than treacherous turn stories
Are you talking about literal polling here? Are there actual numbers on what doom stories the public finds more and less plausible, and with what exact audience?
I held onto the finished paper for months and waited for GPT-4’s release before releasing it to have good timing
[...]
I recognize this paper was around a year ahead of its time and maybe I should have held onto it to release it later.
It’s interesting that paper timing is so important. I’d have guessed earlier is better (more time for others to build on it and for the ideas to seep into the field, and presumably more “academic street cred”), and that any publicity boost from a recent paper (e.g. journalists being more likely to be interested) could mostly be recovered later by just pushing it again when it becomes relevant (e.g. “interview with scientists who predicted X / thought about Y already a year ago” seems pretty journalist-y).
Currently, the only way to become an AI x-risk expert is to live in Berkeley.
There’s an underlying gist here that I agree with, but this point seems too strong; I don’t think there is literally no one who counts as an expert who hasn’t lived in the Bay, let alone in Berkeley specifically. I would maybe buy it if the claim were about visiting.