I remind you of this Thiel quote:

I think the pro-AI people in Silicon Valley are doing a pretty bad job on, let’s say, convincing people that it’s going to be good for them, that it’s going to be good for the average person, that it’s going to be good for our society. And if it all ends up being of some version where humans are headed toward the glue-factory like a horse… man, that probably makes me want to become a luddite too.
I think Amodei did not ask himself “What about my models of the situation would be most relevant to the average person trying to understand the world and the AI industry?” but rather “What about my models of the situation would be most helpful in building a positive narrative for AI with the average person?” I imagine this is roughly the same algorithm that Altman is running, but Amodei is a much stronger intellectual and so is able to write an essay this detailed and thoughtful.
He does start out by saying he thinks & worries a lot about the risks (first paragraph):

I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks… I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.
He then explains (second paragraph) that the essay is meant to sketch out what things could look like if things go well:

In this essay I try to sketch out what that upside might look like—what a world with powerful AI might look like if everything goes right.
I think this is a coherent thing to do?
My current belief is that this essay is optimized to be understandable by a much broader audience than any comparable public writing from Anthropic on extinction-level risk.
For instance, did you know that the word ‘extinction’ doesn’t appear anywhere on Anthropic’s or Dario’s websites? Nor do ‘disempower’ or ‘disempowerment’. The words ‘existential’ and ‘existentially’ only come up three times: when describing the work of an external organization (ARC), in one label in a paper, and in one mention in the Constitutional AI work. In place of these they always talk about ‘catastrophic’ risk, which of course for most readers spans a range many orders of magnitude less serious (e.g. damages of $100M).

Now, if Amodei doesn’t believe that existential threats are legitimate, then I think there are many people at his organization who joined on the trust that it is indeed a primary concern of his, and who will be betrayed in that. If he does believe it, as I think more likely, how has he managed to ensure basically no discussion of it on the company website or in its research, while publishing a long narrative of how AI can help with “poverty”, “inequality”, “peace”, “meaning”, “health”, and other broad positives?

This seems to me very likely to be heavily filtered sharing of his models and beliefs, with highly distortionary impacts on the rest of the world’s models of AI in the positive direction, which is to be expected from this $10B+ company that sells AI products. It seems much less likely that (say) Amodei merely got to things out of order and of course will soon be following up with just as thorough an account of his models of the existential threats he believes are on the horizon, in just as optimized a fashion for broad understanding, and that the rest of the organization just never thought to write about the extinction/disempowerment risk explicitly in their various posts and papers.
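(For what it’s worth, this kind of word-occurrence claim is straightforward to spot-check. Below is a minimal, illustrative sketch only; the page list is a placeholder rather than a proper crawl of the sites in question, and the term list is just the words discussed above.)

```python
# Minimal sketch of a word-occurrence spot-check (illustrative only).
# The URL list is a placeholder, not a full crawl of the sites discussed above.
import requests

PAGES = [
    "https://www.anthropic.com/",  # placeholder; a real check would enumerate each site's pages
]
TERMS = ["extinction", "disempower", "existential", "catastrophic"]

counts = {term: 0 for term in TERMS}
for url in PAGES:
    try:
        text = requests.get(url, timeout=10).text.lower()
    except requests.RequestException:
        continue  # skip pages that fail to load
    for term in TERMS:
        counts[term] += text.count(term)

for term, n in counts.items():
    print(f"{term}: {n}")
```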
(I would be interested in a link to whatever the best and most broadly-readable piece by Anthropic or its leadership on existential risk from AI is. Some chance it is better than I am modeling it as. I have not listened to any of Amodei’s podcasts; perhaps he speaks more straightforwardly there.)
Added: As a small contrast, OpenAI mentions extinction and human disempowerment directly, in the 2nd paragraph on their Superalignment page, and an OpenAI blogpost by Altman links to a Karnofsky Cold Takes piece titled “AI Could Defeat All Of Us Combined”. Altman also wrote two posts in 2014 on the topic of existential threats from Machine Intelligence. I would be interested to know the most direct things that Amodei has published about the topic.
There is Dario’s written testimony before Congress, which mentions existential risk as a serious possibility: https://www.judiciary.senate.gov/imo/media/doc/2023-07-26_-_testimony_-_amodei.pdf
He also signed the CAIS statement on x-risk: https://www.safe.ai/work/statement-on-ai-risk
I think this dynamic exists for the Machines of Loving Grace post because of a combination of two reasons:

1. It’s intentionally not talking about misalignment, and assumes as a premise that the AI we do get is aligned by some method that is low-tax enough that basically everyone else also adopts the solution.

2. You can’t get a lot of nuance/future shock into public-facing posts, for the reasons laid out by Raemon here, which, summarized, are that even in a context where people aren’t adversarial and are just unreliable, it’s very hard to communicate nuanced ideas, and that when there are adversarial forces, you really need to avoid giving out too much nuance in your policy, because people will exploit that. See here for the full story: https://www.lesswrong.com/posts/4ZvJab25tDebB8FGE/you-get-about-five-words#tREaGcLsrtdz3WHnd
The dynamic I want explained is why it persists across the entirety of Anthropic’s written publications, not just this one post.
This is explainable by the fact that the essay is a weird mix: it is both a call to action to bring about a positive vision of an AI future and a set of claims/predictions about some important things he thinks AI could do.

He is importantly both doing prediction/model-sharing in the essay and shaping the prediction/scenario to make the positive vision more likely to be true (more cynically, one could argue that it’s merely a narrative optimized for consumption by the broader public, where the essay broadly doesn’t have the purpose of being truth-tracking).
It’s a confusing essay, ultimately.

I mean I guess this is literally true, but to be clear I think it’s broadly not much less deceptive (edit: or at least, ‘filtered’).