Credit where credit is due: this is much better in terms of sharing one’s models than one could say of Sam Altman, in recent days.
As noted above the footnotes, many people at Anthropic reviewed the essay. I’m surprised that Dario would hire so many people he thinks need to “touch grass” (because they think the scenario he describes in the essay sounds tame), as I’m pretty sure that describes a very large percentage of Anthropic’s first ~150 employees (certainly over 20%, maybe 50%).
My top hypothesis is that this is a snipe meant to signal Dario’s (and Anthropic’s) factional alliance with Serious People; I don’t think Dario actually believes that “less tame” scenarios are fundamentally implausible[1]. Other possibilities that occur to me, with my not very well considered probability estimates:
I’m substantially mistaken about how many early Anthropic employees think “less tame” outcomes are even remotely plausible (20%), and Anthropic did actively try to avoid hiring people with those models early on (1%).
I’m not mistaken about early employee attitudes, but Dario does actually believe AI is extremely likely to be substantially transformative, and extremely unlikely to lead to the “sci-fi”-like scenarios he derides (20%). Conditional on that, either he didn’t think it mattered whether his early employees had those models (20%), or he might have slightly preferred they didn’t, all else equal, but wasn’t that fussed about it compared to recruiting strong technical talent (60%).
I’m just having a lot of trouble reconciling what I know of the beliefs of Anthropic employees, and the things Dario says and implies in this essay. Do Anthropic employees who think less tame outcomes are plausible believe Dario when he says they should “touch grass”? If you don’t feel comfortable answering that question in public, or can’t (due to NDA), please consider whether this is a good situation to be in.
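The hypotheses above stack a few conditional probabilities, so here is a minimal sketch making the implied joint probabilities explicit. The numbers are the commenter's own rough estimates; the branch structure is my reading of the comment, not anything stated by Anthropic.

```python
# Branch 1: "I'm substantially mistaken about early employees' views"
p_mistaken = 0.20
p_filter_given_mistaken = 0.01   # ...and Anthropic actively avoided such hires

# Branch 2: "Dario genuinely thinks 'sci-fi' outcomes are very unlikely"
p_dario_believes_tame = 0.20
# Conditional sub-branches (these two sum to 0.80; the rest is unspecified):
p_didnt_matter = 0.20            # didn't think employee models mattered
p_talent_over_models = 0.60      # mildly preferred otherwise, prioritized talent

# Joint probabilities of each full story
joint_filter = p_mistaken * p_filter_given_mistaken
joint_didnt_matter = p_dario_believes_tame * p_didnt_matter
joint_talent = p_dario_believes_tame * p_talent_over_models

print(f"P(mistaken & deliberate filtering)  = {joint_filter:.3f}")
print(f"P(tame-belief & didn't matter)      = {joint_didnt_matter:.2f}")
print(f"P(tame-belief & talent priority)    = {joint_talent:.2f}")
```

Composed this way, even the largest alternative story gets only ~12%, which is why the "factional signaling" hypothesis remains the commenter's leading explanation.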
He has not, as far as I know, deigned to offer any public argument on the subject.
I mean I guess this is literally true, but to be clear I think it’s broadly not much less deceptive (edit: or at least, ‘filtered’).
I remind you of this Thiel quote:

I think the pro-AI people in Silicon Valley are doing a pretty bad job on, let’s say, convincing people that it’s going to be good for them, that it’s going to be good for the average person, that it’s going to be good for our society. And if it all ends up being of some version where humans are headed toward the glue-factory like a horse… man, that probably makes me want to become a luddite too.
I think Amodei did not ask himself “What about my models of the situation would be most relevant to the average person trying to understand the world and the AI industry?” but “What about my models of the situation would be most helpful in building a positive narrative for AI with the average person.” I imagine this is roughly the same algorithm that Altman is running, but Amodei is a much stronger intellectual so is able to write an essay this detailed and thoughtful.
He does start out by saying he thinks & worries a lot about the risks (first paragraph):

I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks… I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.

He then explains (second paragraph) that the essay is meant to sketch out what things could look like if things go well:

In this essay I try to sketch out what that upside might look like—what a world with powerful AI might look like if everything goes right.
I think this is a coherent thing to do?
My current belief is that this essay is optimized to be understandable by a much broader audience than any comparable public writing from Anthropic on extinction-level risk.
For instance, did you know that the word ‘extinction’ doesn’t appear anywhere on Anthropic’s or Dario’s websites? Nor do ‘disempower’ or ‘disempowerment’. The words ‘existential’ and ‘existentially’ come up only three times: when describing the work of an external organization (ARC), in one label in a paper, and in one mention in the Constitutional AI work. In their place they always talk about ‘catastrophic’ risk, which of course for most readers spans a range many orders of magnitude less serious (e.g. damages of $100M).

Now, if Amodei doesn’t believe that existential threats are legitimate, then I think there are many people at his organization who went there trusting that it is indeed a primary concern of his, and who will be betrayed in that. If, as I think more likely, he does believe it, how has he managed to ensure basically no discussion of it on the company website or in its research, while publishing a long narrative of how AI can help with “poverty”, “inequality”, “peace”, “meaning”, “health”, and other broad positives?

This seems to me very likely to be heavily filtered sharing of his models and beliefs, with highly distortionary impacts on the rest of the world’s models of AI in the positive direction, which is to be expected from this $10B+ company that sells AI products. The alternative story would be that Amodei merely got to things out of order and will soon follow up with a just-as-thorough account of his models of the existential threats he believes are on the horizon, optimized just as well for broad understanding, and that the rest of the organization simply never thought to write about the extinction/disempowerment risk explicitly in their various posts and papers.
(I would be interested in a link to whatever the best and broadly-readable piece by Anthropic or its leadership on existential risk from AI is. Some chance it is better than I am modeling it as. I have not listened to any of Amodei’s podcasts, perhaps he speaks more straightforwardly there.)
Added: As a small contrast, OpenAI mentions extinction and human disempowerment directly, in the 2nd paragraph on their Superalignment page, and an OpenAI blogpost by Altman links to a Karnofsky Cold Takes piece titled “AI Could Defeat All Of Us Combined”. Altman also wrote two posts in 2014 on the topic of existential threats from Machine Intelligence. I would be interested to know the most direct things that Amodei has published about the topic.
There is Dario’s written testimony before Congress, which mentions existential risk as a serious possibility: https://www.judiciary.senate.gov/imo/media/doc/2023-07-26_-_testimony_-_amodei.pdf
He also signed the CAIS statement on x-risk: https://www.safe.ai/work/statement-on-ai-risk
I think this dynamic exists for the Machines of Loving Grace post for a combination of two reasons:

It’s intentionally not talking about misalignment, and assumes as a premise that the AI we do get is aligned by some method that is low-tax enough that basically everyone else also adopts the solution.

You can’t get much nuance/future shock into public-facing posts, for the reasons laid out by Raemon here. Summarized: even in a context where people aren’t adversarial and are just unreliable, it’s very hard to communicate nuanced ideas; and when there are adversarial forces, you really need to avoid putting too much nuance into your policy, because people will exploit it.
See here for full story:
https://www.lesswrong.com/posts/4ZvJab25tDebB8FGE/you-get-about-five-words#tREaGcLsrtdz3WHnd
The dynamic I want explained is why it persists across Anthropic’s entire body of written publications, not just this one post.
This is explainable by the fact that the essay is a weird mix of both a call to action to bring about a positive vision of an AI future, combined with it also claiming/predicting some important things of what he thinks AI could do.
He is both importantly doing predictions/model sharing in the essay, and also shaping the prediction/scenario to make the positive vision more likely to be true (more cynically, one could argue that it’s merely a narrative optimized for consumption to the broader public where the essay broadly doesn’t have a purpose of being truth-tracking).
It’s a confusing essay, ultimately.
(I work at Anthropic.) My read of the “touch grass” comment is informed a lot by the very next sentences in the essay:

But more importantly, tame is good from a societal perspective. I think there’s only so much change people can handle at once, and the pace I’m describing is probably close to the limits of what society can absorb without extreme turbulence.

which I read as saying something like “It’s plausible that things could go much faster than this, but as a prediction about what will actually happen, humanity as a whole probably doesn’t want things to get incredibly crazy so fast, and so we’re likely to see something tamer.” I basically agree with that.

Do Anthropic employees who think less tame outcomes are plausible believe Dario when he says they should “touch grass”?

FWIW, I don’t read the footnote as saying “if you think crazier stuff is possible, touch grass”—I read it as saying “if you think the stuff in this essay is ‘tame’, touch grass”. The stuff in this essay is in fact pretty wild!
That said, I think I have historically underrated questions of how fast things will go given realistic human preferences about the pace of change, and that I might well have updated more in the above direction if I’d chatted with ordinary people about what they want out of the future, so “I needed to touch grass” isn’t a terrible summary. But IMO believing “really crazy scenarios are plausible on short timescales and likely on long timescales” is basically the correct opinion, and to the extent the essay can be read as casting shade on such views it’s wrong to do so. I would have worded this bit of the essay differently.
Re: honesty and signaling, I think it’s true that this essay’s intended audience is not really the crowd that’s already gamed out Mercury disassembly timelines, and its focus is on getting people up to shock level 2 or so rather than SL4, but as far as I know everything in it is an honest reflection of what Dario believes. (I don’t claim any special insight into Dario’s opinions here, just asserting that nothing I’ve seen internally feels in tension with this essay.) Like, it isn’t going out of its way to talk about the crazy stuff, but I don’t read that omission as dishonest.
For my own part:
I think it’s likely that we’ll get nanotech, von Neumann probes, Dyson spheres, computronium planets, acausal trade, etc in the event of aligned AGI.
Whether that stuff happens within the 5-10y timeframe of the essay is much less obvious to me—I’d put it around 30-40% odds conditional on powerful AI from roughly the current paradigm, maybe?
In the other 60-70% of worlds, I think this essay does a fairly good job of describing my 80th percentile expectations (by quality-of-outcome rather than by amount-of-progress).
I would guess that I’m somewhat more Dyson-sphere-pilled than Dario.
I’d be pretty excited to see competing forecasts for what good futures might look like! I found this essay helpful for getting more concrete about my own expectations, and many of my beliefs about good futures look like “X is probably physically possible; X is probably usable-for-good by a powerful civilization; therefore probably we’ll see some X” rather than having any kind of clear narrative about how the path to that point looks.
humanity as a whole probably doesn’t want things to get incredibly crazy so fast, and so we’re likely to see something tamer

Doesn’t this require a pretty strong and unprecedented level of international coordination to stop an obviously, immediately, extremely valuable and militarily relevant technology? I think a US-backed entente could impose this on the rest of the world, but that would also be an unprecedentedly large effort.
I think this is certainly possible and I hope this level of coordination happens, but I don’t exactly think this is likely in timelines this short.
More minimally, I think the vast majority of readers won’t read this essay and understand that this refers to a world in which there was a massive effort to slow down AI or in which AI was quite surprisingly slow in various ways. (Insofar as this is what Dario thinks.) Dario seems to be trying to make realistic/plausible predictions and often talks about what is “possible” (not what is possible with a desirable level of AI progress), so not mentioning this seems to greatly undermine the prediction aspect of the essay.
I agree it seems unlikely that we’ll see coordination on slowing down before one actor or coalition has a substantial enough lead over other actors that it can enforce such a slowdown unilaterally, but I think it’s reasonably likely that such a lead will arise before things get really insane.
A few different stories under which one might go from aligned “genius in a datacenter” level AI at time t to outcomes merely at the level of weirdness in this essay at t + 5-10y:
The techniques that work to align “genius in a datacenter” level AI don’t scale to wildly superhuman intelligence (eg because they lose some value fidelity from human-generated oversight signals that’s tolerable at one remove but very risky at ten). The alignment problem for serious ASI is quite hard to solve at the mildly superintelligent level, and it genuinely takes a while to work out enough that we can scale up (since the existing AIs, being aligned, won’t design unaligned successors).
If people ask their only-somewhat-superhuman AI what to do next, the AIs say “A bunch of the decisions from this point on hinge on pretty subtle philosophical questions, and frankly it doesn’t seem like you guys have figured all this out super well, have you heard of this thing called a long reflection?” That’s what I’d say if I were a million copies of me in a datacenter advising a 2024-era US government on what to do about Dyson swarms!
A leading actor uses their AI to ensure continued strategic dominance and prevent competing AI projects from posing a meaningful threat. Having done so, they just… don’t really want crazy things to happen really fast, because the actor in question is mostly composed of random politicians or whatever. (I’m personally sympathetic to astronomical waste arguments, but it’s not clear to me that people likely to end up with the levers of power here are.)
The serial iteration times and experimentation loops are just kinda slow and annoying, and mildly-superhuman AI isn’t enough to circumvent experimentation time bottlenecks (some of which end up being relatively slow), and there are stupid zoning restrictions on the land you want to use for datacenters, and some regulation adds lots of mandatory human overhead to some critical iteration loop, etc.
This isn’t a claim that maximal-intelligence-per-cubic-meter ASI initialized in one datacenter would face long delays in making efficient use of its lightcone, just that it might be tough for a not-that-much-better-than-human AGI that’s aligned and trying to respect existing regulations and so on to scale itself all that rapidly.
Among the tech unlocked in relatively early-stage AGI is better coordination, and that helps Earth get out of unsavory race dynamics and decide to slow down.
The alignment tax at the superhuman level is pretty steep, and doing self-improvement while preserving alignment goes much slower than unrestricted self-improvement would; since at this point we have many fewer ongoing moral catastrophes (eg everyone who wants to be cryopreserved is, we’ve transitioned to excellent cheap lab-grown meat), there’s little cost to proceeding very cautiously.
This is sort of a continuous version of the first bullet point with a finite rather than infinite alignment tax.
All that said, upon reflection I think I was probably lowballing the odds of crazy stuff on the 10y timescale, and I’d go to more like 50-60% that we’re seeing mind uploads and Kardashev level 1.5-2 civilizations etc. a decade out from the first powerful AIs.
I do think it’s fair to call out the essay for not highlighting the ways in which it might be lowballing things or rolling in an assumption of deliberate slowdown; I’d rather it have given more of a nod to these considerations and made the conditions of its prediction clearer.
“It’s plausible that things could go much faster than this, but as a prediction about what will actually happen, humanity as a whole probably doesn’t want things to get incredibly crazy so fast, and so we’re likely to see something tamer.” I basically agree with that.

I feel confused about how this squares with Dario’s view that AI is “inevitable” and “driven by powerful market forces.” Like, if humanity starts producing a technology which makes practically all aspects of life better, the idea is that this will just… stop? I’m sure some people will be scared of how fast it’s going, but it’s hard for me to see the case for the market in aggregate incentivizing less of a technology which fixes ~all problems and creates tremendous value. Maybe the idea, instead, is that governments will step in...? Which seems plausible to me, but as Ryan notes, Dario doesn’t say this.