Katja, many thanks for writing this, and Oliver, thanks for this comment pointing out that everyday people are in fact worried about AI x-risk. Since around 2017 when I left MIRI to rejoin academia, I have been trying continually to point out that everyday people are able to easily understand the case for AI x-risk, and that it’s incorrect to assume the existence of AI x-risk can only be understood by a very small and select group of people. My arguments have often been basically the same as yours here: in my case, informal conversations with Uber drivers, random academics, and people at random public social events. Plus, the argument is very simple: If things are smarter than us, they can outsmart us and cause us trouble. It’s always seemed strange to say there’s an “inferential gap” of substance here.
However, for some reason, the idea that people outside the LessWrong community might recognize the existence of AI x-risk — and therefore be worth coordinating with on the issue — has felt not only poorly received on LessWrong, but also fraught to even suggest. For instance, I tried to point it out in this previous post:
“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments
I wrote the following, targeting multiple LessWrong-adjacent groups in the EA/rationality communities who thought “pivotal acts” with AGI were the only sensible way to reduce AI x-risk:
In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.
That particular statement was very poorly received, with a 139-karma retort from John Wentworth arguing,
What exactly is the model by which some AI organization demonstrating AI capabilities will lead to world governments jointly preventing scary AI from being built, in a world which does not actually ban gain-of-function research?
I’m not sure what’s going on here, but it seems to me like the idea of coordinating with “outsiders” or placing trust or hope in the judgement of “outsiders” has been a bit of a taboo here, and that arguments that outsiders are dumb or wrong or can’t be trusted will reliably get a lot of cheering in the form of Karma.
Thankfully, it now also seems to me that perhaps the core LessWrong team has started to think that ideas from outsiders matter more to the LessWrong community’s epistemics and/or ability to get things done than previously represented, such as by including material written outside LessWrong in the 2021 LessWrong review posted just a few weeks ago, for the first time:
https://www.lesswrong.com/posts/qCc7tm29Guhz6mtf7/the-lesswrong-2021-review-intellectual-circle-expansion
I consider this a move in a positive direction, but I am wondering if I can draw the LessWrong team’s attention to a more serious trend here. @Oliver, @Raemon, @Ruby, and @Ben Pace, and others engaged in curating and fostering intellectual progress on LessWrong:
Could it be that the LessWrong community, or the EA community, or the rationality community, has systematically discounted the opinions and/or agency of people outside that community, in a way that has led the community to plan more drastic actions in the world than would otherwise be reasonable if outsiders of that community could also be counted upon to take reasonable actions?
This is a leading question, and my gut and deliberate reasoning have both been screaming “yes” at me for about 5 or 6 years straight, but I am trying to get you guys to take a fresh look at this hypothesis now, in question-form. Thanks in any case for considering it.
The question feels leading enough that I don’t really know how to respond. Many of these sentences sound pretty crazy to me, so I feel like I primarily want to express frustration and confusion that you assign those sentences to me or “most of the LessWrong community”.
However, for some reason, the idea that people outside the LessWrong community might recognize the existence of AI x-risk — and therefore be worth coordinating with on the issue — has felt not only poorly received on LessWrong, but also fraught to even suggest. For instance, I tried to point it out in this previous post:
I think John Wentworth’s question is indeed the obvious question to ask. It does really seem like our prior should be that the world will not react particularly sanely here.
I also think it’s really not true that coordination has been “fraught to even suggest”. I think it’s been suggested all the time, and certain coordination plans seem more promising than others. Like, even Eliezer was for a long time apparently thinking that Deepmind having a monopoly on AGI development was great and something to be protected, which very much involves coordinating with people outside of the LessWrong community.
The same is true for whether “outsiders might recognize the existence of AI x-risk”. Of course outsiders might recognize the existence of AI X-risk. I don’t think this is controversial or disputed. The question is what happens next. Many people seem to start working on AI capabilities research as the next step, which really doesn’t improve things.
That particular statement was very poorly received, with a 139-karma retort from John Wentworth arguing,
I don’t think your summary of how your statement was received is accurate. Your overall post has ~100 karma, so was received quite positively, and while John responded to this particular statement and was upvoted, I don’t think this really reflects much of a negative judgement of that specific statement.
John’s question is indeed the most important question to ask about this kind of plan, and it seems correct for it to be upvoted, even if people agree with the literal sentence it is responding to (your original sentence itself was weak enough that I would be surprised if almost anyone on LW disagreed with its literal meaning, and if there is disagreement, it is with the broader implied statement of the help being useful enough to actually be worth it to have as a main-line plan and to forsake other plans that are more pivotal-act shaped).
Thankfully, it now also seems to me that perhaps the core LessWrong team has started to think that ideas from outsiders matter more to the LessWrong community’s epistemics and/or ability to get things done than previously represented, such as by including material written outside LessWrong in the 2021 LessWrong review posted just a few weeks ago, for the first time:
I don’t know man, I have always put a huge emphasis on reading external writing, learning from others, and doing practical concrete things in the external world. Including external material has been more of a UI question, and I’ve been interested in it for a long while, it just didn’t reach the correct priority level (and also, I think it isn’t really working this year for UI reasons, given that basically no nominated posts are externally linked posts, and it was correct for us to not try without putting in substantially more effort to make it work).
I think if anything I updated over the last few years that rederiving stuff for yourself is more important and trying to coordinate with the external world has less hope, given the extreme way the world was actively hostile to cooperation as well as epistemically highly corrupting during the pandemic. I also think the FTX situation made me think it’s substantially more likely that we will get fucked over again in the future when trying to coordinate with parties that have different norms and don’t seem to care about honesty and integrity very much. I also think RLHF turning out to be the key to ChatGPT, and via that to OpenAI getting something like product-market fit and probably $10+ billion in expectation, showing that the “alignment” team at OpenAI actually had among the worst consequences of any of the teams in the org, was an update in the opposite direction for me. I think these events also made me less hopeful about the existing large LessWrong/EA/Longtermist/Rationality community as a coherent entity that can internally coordinate, but I think that overall results in a narrowing of my circle of coordination, not a widening.
I have models here, but I guess I feel like your comment is in some ways putting words in my mouth in a way that feels bad to me, and I am interested in explaining my models, but I don’t think this comment thread is the right context.
I think there is a real question about whether both me and others in the community have a healthy relationship to the rest of the world. I think the answer is pretty messy. I really don’t think it’s ideal, and indeed think it’s probably quite bad, and I have a ton of different ways I would criticize what is currently happening. But I also really don’t think that the current primary problem with the way the community relates to the rest of the world is underestimating the sanity of the rest of the world. I think mostly I expect us to continue to overestimate the sanity and integrity of most of the world, then get fucked over like we got fucked over by OpenAI or FTX. I think there are ways of relating to the rest of the world that would be much better, but a naive update in the direction of “just trust other people more” would likely make things worse.
Again, I think the question you are raising is crucial, and I have giant warning flags about a bunch of the things that are going on (the foremost one is that it sure really is a time to reflect on your relation to the world when a very prominent member of your community just stole 8 billion dollars of innocent people’s money and committed the largest fraud since Enron), so I do think there are good and important conversations to be had here.
I think mostly I expect us to continue to overestimate the sanity and integrity of most of the world, then get fucked over like we got fucked over by OpenAI or FTX. I think there are ways of relating to the rest of the world that would be much better, but a naive update in the direction of “just trust other people more” would likely make things worse.
[...] Again, I think the question you are raising is crucial, and I have giant warning flags about a bunch of the things that are going on (the foremost one is that it sure really is a time to reflect on your relation to the world when a very prominent member of your community just stole 8 billion dollars of innocent people’s money and committed the largest fraud since Enron), [...]
I very much agree with the sentiment of the second paragraph.
Regarding the first paragraph, my own take is that (many) EAs and rationalists might be wise to trust themselves and their allies less.[1]
The main update I’d make from the FTX fiasco (and other events I’ll describe later) is that perhaps many/most EAs and rationalists aren’t very good at character judgment. They probably trust other EAs and rationalists too readily because they are part of the same tribe and automatically assume that agreeing with noble ideas in the abstract translates to noble behavior in practice.
(To clarify, you personally seem to be good at character judgment, so this message is not directed at you. I base that mostly on the comments of yours I read about the SBF situation; big kudos for that, btw!)
It seems like a non-trivial fraction of people that joined the EA and rationalist community very early turned out to be of questionable character, and this wasn’t noticed for years by large parts of the community. I have in mind people like Anissimov, Helm, Dill, SBF, Geoff Anders, arguably Vassar—these are just the known ones. Most of them were not just part of the movement, they were allowed to occupy highly influential positions. I don’t know what the base rate for such people is in other movements—it’s plausibly even higher—but as a whole our movements don’t seem to be fantastic at spotting sketchy people quickly. (FWIW, my personal experiences with a sketchy, early EA (not on the above list) inspired this post.)
My own takeaway is that perhaps EAs and rationalists aren’t that much better in terms of integrity than the outside world and—given that we probably have to coordinate with some people to get anything done—I’m now more willing to coordinate with “outsiders” than I was, say, eight years ago.
Though I would be hesitant to spread this message; the kinds of people who should trust themselves and their character judgment less are more likely the ones who will not take this message to heart, and vice versa.
Thanks, Oliver. The biggest update for me here — which made your entire comment worth reading, for me — was that you said this:
I also think it’s really not true that coordination has been “fraught to even suggest”.
I’m surprised that you think that, but have updated on your statement at face value that you in fact do. By contrast, my experience around a bunch of common acquaintances of ours has been much the same as Katja’s, like this:
Some people: AI might kill everyone. We should design a godlike super-AI of perfect goodness to prevent that.
Others: wow that sounds extremely ambitious
Some people: yeah but it’s very important and also we are extremely smart so idk it could work
[Work on it for a decade and a half]
Some people: ok that’s pretty hard, we give up
Others: oh huh shouldn’t we maybe try to stop the building of this dangerous AI?
Some people: hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren’t delusional
In fact I think I may have even heard the word “delusional” specifically applied to people working on AI governance (though not by you) for thinking that coordination on AI regulation is possible / valuable / worth pursuing in service of existential safety.
As for the rest of your narrative of what’s been happening in the world, to me it seems like a random mix of statements that are clearly correct (e.g., trying to coordinate with people who don’t care about honesty or integrity will get you screwed) and other statements that seem, as you say,
pretty crazy to me,
and I agree that for the purpose of syncing world models,
I don’t think this comment thread is the right context.
Anyway, cheers for giving me some insight into your thinking here.
Critch, I agree it’s easy for most people to understand the case for AI being risky. I think the core argument for concern—that it seems plausibly unsafe to build something far smarter than us—is simple and intuitive, and personally, that simple argument in fact motivates a plurality of my concern. That said:
I think it often takes weirder, less intuitive arguments to address many common objections—e.g., that this seems unlikely to happen within our lifetimes, that intelligence far superior to ours doesn’t even seem possible, that we’re safe because software can’t affect physical reality, that this risk doesn’t seem more pressing than other risks, that alignment seems easy to solve if we just x, etc.
It’s also remarkably easy to convince many people that aliens visit Earth on a regular basis, that the theory of evolution via natural selection is bunk, that lottery tickets are worth buying, etc. So while I definitely think some who engage with these arguments come away having good reason to believe the threat is likely, for values of “good” and “believe” and “likely” at least roughly similar to those common around here, I suspect most update something more like their professed belief-in-belief than their real expectations—and that even many who do update their real expectations do so via symmetric arguments that leave them with poor models of the threat.
These factors make me nervous about strategies that rely heavily on convincing everyday people, or people in government, to care about AI risk, for reasons I don’t think are well described as “systematically discounting their opinions/agency.” Personally, I’ve engaged a lot with people working in various corners of politics and government, and decently much with academics, and I respect and admire many of them, including in ways I rarely admire rationalists or EA’s.
(For example, by my lights, the best ops teams in government are much more competent than the best ops teams around here; the best policy wonks, lawyers, and economists are genuinely really quite smart, and have domain expertise few R/EA’s have without which it’s hard to cause many sorts of plausibly-relevant societal change; perhaps most spicily, I think academics affiliated with the Santa Fe Institute have probably made around as much progress on the alignment problem so far as alignment researchers, without even trying to, and despite being (imo) deeply epistemically confused in a variety of relevant ways).
But there are also a number of respects in which I think rationalists and EA’s tend to far outperform any other group I’m aware of—for example, in having beliefs that actually reflect their expectations, trying seriously to make sure those beliefs are true, being open to changing their mind, thinking probabilistically, “actually trying” to achieve their goals as a behavior distinct from “trying their best,” etc. My bullishness about these traits is why e.g. I live and work around here, and read this website.
And on the whole, I am bullish about this culture. But it’s mostly the relative scarcity of these and similar traits in particular, not my overall level of enthusiasm or respect for other groups, that causes me to worry they wouldn’t take helpful actions if persuaded of AI risk.
My impression is that it’s unusually difficult to figure out how to take actions that reduce AI risk without substantial epistemic skill of a sort people sometimes have around here, but only rarely have elsewhere. On my models, this is mostly because:
There are many more ways to make the situation worse than better;
A number of key considerations are super weird and/or terrifying, such that it’s unusually hard to reason well about them;
It seems easier for people to grok the potential importance of transformative AI, than the potential danger.
My strong prior is that, to accomplish large-scale societal change, you nearly always need to collaborate with people who disagree with you, even about critical points. And I’m sympathetic to the view that this is true here, too; I think some of it probably is. But I think the above features make this more fraught than usual, in a way that makes it easy for people who grok the (simpler) core argument for concern, but not some of the (typically more complex) ancillary considerations, to accidentally end up making the situation even worse.
Here are some examples of (what seem to me like) this happening:
The closest thing I’m aware of to an official US government position on AI risk is described in the 2016 and 2017 National Science and Technology Council reports. I haven’t read all of them, but the parts I have read struck me as a strange mix of claims like “maybe this will be a big deal, like mobile phones were,” and “maybe this will be a big deal, in the sense that life on Earth will cease to exist.” And like, I can definitely imagine explanations for this that don’t much involve the authors misjudging the situation—maybe their aim was more to survey experts than describe their own views, or maybe they were intentionally underplaying the threat for fear of starting an arms race, etc. But I think my lead hypothesis is more that the authors just didn’t actually, viscerally consider that the sentences they were writing might be true, in the sense of describing a reality they might soon inhabit.
I think rationalists and EA’s tend to make this sort of mistake less often, since the “taking beliefs seriously”-style epistemic orientation common around here has the effect of making it easier for people to viscerally grasp that trend lines on graphs and so forth might actually reflect reality. (Like, one frame on EA as a whole is “an exercise in avoiding the ‘learning about the death of a million feels like a statistic, not a tragedy’ error”). And this makes me at least somewhat more confident they won’t do dumb things upon becoming worried about AI risk, since without this epistemic skill, I think it’s easier to make critical errors like overestimating how much time we have, or underestimating the magnitude or strangeness of the threat.
As I understand it, OpenAI is named what it is because, at least at first, its founders literally hoped to make AGI open source. (Elon Musk: “I think the best defense against the misuse of AI is to empower as many people as possible to have AI. If everyone has AI powers, then there’s not any one person or a small set of individuals who can have AI superpower.”)
By my lights, there are unfortunately a lot of examples of rationalists and EA’s making big mistakes while attempting to reduce AI risk. But it’s at least… hard for me to imagine most of them making this one? Maybe I’m being insufficiently charitable here, but from my perspective, this just fails a really basic “wait, but then what happens next?” sanity check, that I think should have occurred to them more or less immediately, and that I suspect would have occurred to most rationalists and EA’s.
For me, the most striking aspect of the AI Impacts poll, was that all those ML researchers who reported thinking ML had a substantial chance of killing everyone, still research ML. I’m not sure why they do this; I’d guess some of them are convinced for some reason or another that working on it still makes sense, even given that. But my perhaps-uncharitable guess is that most of them actually don’t—that they don’t even have arguments which feel compelling to them that justify their actions, but that they for some reason press on anyway. This too strikes me as a sort of error R/EA’s are less likely to make.
(When Bostrom asked Geoffrey Hinton why he still worked on AI, if he thought governments would likely use it to terrorize people, he replied, “I could give you the usual arguments, but the truth is that the prospect of discovery is too sweet”).
Sam Altman recently suggested, on the topic of whether to slow down AI, that “either we figure out how to make AGI go well or we wait for the asteroid to hit.”
Maybe he was joking, or meant “asteroid” as a stand-in for all potentially civilization-ending threats, or something? But that’s not my guess, because his follow-up comment is about how we need AGI to colonize space, which makes me suspect he actually considers asteroid risk in particular a relevant consideration for deciding when to deploy advanced AI. Which if true, strikes me as… well, more confused than any comment in this thread strikes me. And it seems like the kind of error that might, for example, cause someone to start an org with the hope of reducing existential risk, that mostly just ends up exacerbating it.
Obviously our social network doesn’t have a monopoly on good reasoning, intelligence, or competence, and lord knows it has plenty of its own pathologies. But as I understand it, most of the reason the rationality project exists is to help people reason more clearly about the strange, horrifying problem of AI risk. And I do think it has succeeded to some degree, such that empirically, people with less exposure to this epistemic environment far more often take actions which seem terribly harmful to me.
the best ops teams in government are much more competent than the best ops teams around here;
This is a candidate for the most surprising sentence in the whole comments section! I’d be interested in knowing more about why you believe this. One sort of thing I’d be quite interested in is things you’ve seen government ops teams do fast (even if they’re small things, accomplishments that would surprise many of us in this thread by how quickly they could be done).
Recruitment—in my experience often a weeks-long process from start to finish, well-oiled and systematic and using all the tips from the handbook on organizational behaviour on selection, often with feedback given too. By comparison, some tech companies can take several months to hire, with lots of ad hoc decision-making, no processes around biases or conflicts of interest, and no feedback.
Happy to give more examples if you want by DM.
I should say my sample size is tiny here—I know one gov dept in depth, one tech company in depth and a handful of other tech companies and gov depts not fully from the inside but just from talking with friends that work there, etc.
I think academics affiliated with the Santa Fe Institute have probably made around as much progress on the alignment problem so far as alignment researchers, without even trying to, and despite being (imo) deeply epistemically confused in a variety of relevant-seeming ways).
This is an important optimistic update, because it implies alignment might be quite a bit easier than we think, given that even under unfavorable circumstances, reasonable progress still gets done.
For me, the most striking aspect of the AI Impacts poll, was that all those ML researchers who reported thinking ML had a substantial chance of someday killing everyone, still research ML. I’m not sure what’s going on with them; I’d guess some of them buy arguments such that their continued work still makes sense somehow, even given that. But my perhaps-uncharitable guess is that most of them don’t—that they don’t even have arguments which feel compelling to them that justify their actions, but that they for some reason press on anyway. This too strikes me as a sort of error R/EA’s are less likely to make.
I think that this isn’t an error in rationality; instead, very different goals drive EAs/LWers compared to AI researchers. Taking a low chance of high utility and a high chance of death is pretty rational, assuming you only care about yourself. And this is the default, absent additional assumptions.
From an altruistic perspective, it’s insane to take this risk, especially if you care about the future.
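To make that contrast concrete, here is a minimal expected-value sketch of the argument above; every probability and utility in it is a made-up illustrative assumption, not anyone’s actual estimate:

```python
# Purely illustrative sketch: the same gamble can look worthwhile under selfish
# preferences and disastrous under altruistic, future-inclusive preferences.
# All numbers below are arbitrary assumptions chosen only to show the shape of
# the calculation.

p_win = 0.1          # assumed chance the AGI gamble pays off
p_doom = 1 - p_win   # assumed chance it kills everyone

# Selfish accounting: only your own upside and your own death count.
u_personal_win = 1_000.0   # personal utopia, in units of "one ordinary life"
u_personal_loss = -1.0     # you lose the rest of your one life

# Altruistic accounting: the downside includes everyone, plus the lost future.
u_altruistic_win = 1e6     # a very good future
u_altruistic_loss = -1e12  # extinction forecloses an astronomically large future

ev_selfish = p_win * u_personal_win + p_doom * u_personal_loss
ev_altruistic = p_win * u_altruistic_win + p_doom * u_altruistic_loss

print(f"Selfish EV:    {ev_selfish:+.1f}")     # positive: the gamble looks good
print(f"Altruistic EV: {ev_altruistic:+.3e}")  # hugely negative: the gamble looks insane
```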
However, for some reason, the idea that people outside the LessWrong community might recognize the existence of AI x-risk — and therefore be worth coordinating with on the issue — has felt not only poorly received on LessWrong, but also fraught to even suggest.
I object to this hyperbolic and unfair accusation. The entire AI Governance field is founded on this idea; this idea is not only fine to suggest, but completely uncontroversial accepted wisdom. That is, if by “this idea” you really mean literally what you said—“people outside the LW community might recognize the existence of AI x-risk and be worth coordinating with on the issue.” Come on.
I am frustrated by what appears to me to be constant straw-manning of those who disagree with you on these matters. Just because people disagree with you doesn’t mean there’s a sinister bias at play. I mean, there’s usually all sorts of sinister biases at play on all sides of every dispute, but the way to cut through them isn’t to go around slinging insults at each other about who might be biased, it’s to stay on the object level and sort through the arguments.
This makes sense to me if you feel my comment is meant as a description of you or people-like-you. It is not, and quite the opposite. As I see it, you are not a representative member of the LessWrong community, or at least, not a representative source of the problem I’m trying to point at. For one thing, you are willing to work for OpenAI, which many (dozens of) LessWrong-adjacent people I’ve personally met would consider a betrayal of allegiance to “the community”. Needless to say, the field of AI governance as it exists is not uncontroversially accepted by the people I am reacting to with the above complaint. In fact, I had you in mind as a person I wanted to defend by writing the complaint, because you’re willing to engage and work full-time (seemingly) in good faith with people who do not share many of the most centrally held views of “the community” in question, be it LessWrong, Effective Altruism, or the rationality community.
It would help if you specified which subset of “the community” you’re arguing against. I had a similar reaction to your comment as Daniel did, since in my circles (AI safety researchers in Berkeley), governance tends to be well-respected, and I’d be shocked to encounter the sentiment that working for OpenAI is a “betrayal of allegiance to ‘the community’”.
To be clear, I do think most people who have historically worked on “alignment” at OpenAI have probably caused great harm! And I do think I am broadly in favor of stronger community norms against working at AI capability companies, even in so called “safety positions”. So I do think there is something to the sentiment that Critch is describing.
Agreed! But the words he chose were hyperbolic and unfair. Even an angrier, more radical version of Habryka would still endorse “the idea that people outside the LessWrong community might recognize the existence of AI risk.”
Separately from my other reply explaining that you are not the source of what I’m complaining about here, I thought I’d add more color to explain why I think my assessment here is not “hyperbolic”. Specifically, regarding your claim that reducing AI x-risk through coordination is “not only fine to suggest, but completely uncontroversial accepted wisdom”, please see the OP. Perhaps you have not witnessed such conversations yourself, but I have been party to many of these:
Some people: AI might kill everyone. We should design a godlike super-AI of perfect goodness to prevent that.
Others: wow that sounds extremely ambitious
Some people: yeah but it’s very important and also we are extremely smart so idk it could work
[Work on it for a decade and a half]
Some people: ok that’s pretty hard, we give up
Others: oh huh shouldn’t we maybe try to stop the building of this dangerous AI?
Some people: hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren’t delusional
In other words, I’ve seen people in AI governance being called or treated as “delusional” by loads of people (1-2 dozen?) core to the LessWrong community (not you). I wouldn’t say by a majority, but by an influential minority to say the least, and by more people than would be fair to call “just institution X” for any X, or “just person Y and their friends” for any Y. The pattern is strong enough that for me, pointing to governance as an approach to existential safety on LessWrong indeed feels fraught, because influential people (online or offline) will respond to the idea as “delusional” as Katja puts it. Being called delusional is stressful, and hence “fraught”.
@Oliver, the same goes for your way of referring to sentences you disagree with as “crazy”, such as here.
Generally speaking, on the LessWrong blog itself I’ve observed too many instances of people using insults in response to dissenting views on the epistemic health of the LessWrong community, and receiving applause and karma for doing so, for me to think that there’s not a pattern or problem here.
That’s not to say I think LessWrong has this problem worse than other online communities (i.e., using insults or treating people as ‘crazy’ or ‘delusional’ for dissenting or questioning the status quo); only that I think it’s a problem worth addressing, and a problem I see strongly at play on the topic of coordination and governance.
Just to clarify, the statements that I described as crazy were not statements you professed, but statements that you said I or “the LessWrong community” believe. I am not sure whether that got across (since like, in that context it doesn’t really make sense to say I described sentences I disagree with as crazy, since like, I don’t think you believe those sentences either, that’s why you are criticizing them).
It did not get across! Interesting. Procedurally I still object to calling people’s arguments “crazy”, but selfishly I guess I’m glad they were not my arguments? At a meta level though I’m still concerned that LessWrong culture is too quick to write off views as “crazy”. Even the “coordination is delusional”-type views that Katja highlights in her post do not seem “crazy” to me, more like misguided or scarred or something, in a way that warrants a closer look but not being called “crazy”.
Seems plausible that LessWrong culture is too quick to write off views as “crazy”, though I have a bunch of conflicting feeling here. Might be worth going into at some point.
I do think there is something pretty qualitatively different about calling a paraphrase or an ITT of my own opinions “crazy” than calling someone’s actual opinion crazy. In general, my sense is that for reacting to paraphrases it’s less bad for the social dynamics to give an honest impression and more important to give a blunt, evocative reaction, but I’ll still try to clarify more in the future when I am referring to the meat of my interlocutor’s opinion vs. their representation of my opinion.
That particular statement was very poorly received, with a 139-karma retort from John Wentworth arguing,
What exactly is the model by which some AI organization demonstrating AI capabilities will lead to world governments jointly preventing scary AI from being built, in a world which does not actually ban gain-of-function research?
I’m not sure what’s going on here
So, wait, what’s actually the answer to this question? I read that entire comment thread and didn’t find one. The question seems to me to be a good one!
“What exactly” seems like a bit of a weird type of question. For example, consider nukes: it was hard to predict what exactly the model was by which governments would not blow everyone up after the use of nukes in Japan. But also: while the resulting equilibrium is not great, we haven’t died in a nuclear WWIII so far.
As in my comment here, if you have a model that simultaneously both explains the fact that governments are funding GoF research right now, and predicts that governments would nevertheless react helpfully to AGI, I’m very interested to hear it. It seems to me that defunding GoF is a dramatically easier problem in practically every way.
The only responses I can think of right now are (1) “Basically nobody in or near government is working hard to defund GoF but people in or near government will be working hard to spur on a helpful response to AGI” (really? if so, what’s upstream of that supposed difference?) or (2) “It’s all very random—who happens to be in what position of power and when, etc.—and GoF is just one example, so we shouldn’t generalize too far from it” (OK maybe, but if so, then can we pile up more examples into a reference class to get a base rate or something? and what are the interventions to improve the odds, and can we also try those same interventions on GoF?)
I think it’s worth updating on the fact that the US government has already launched a massive, disruptive, costly, unprecedented policy of denying AI-training chips to China. I’m not aware of any similar-magnitude measure happening in the GoF domain.
IMO that should end the debate about whether the government will treat AI dev the way it has GoF—it already has moved it to a different reference class.
Some wild speculation on upstream attributes of advanced AI’s reference class that might explain the difference in the USG’s approach:
a perception of new AI as geoeconomically disruptive; that new AI has more obvious natsec-relevant use-cases than GoF; that powerful AI is more culturally salient than powerful bio (“evil robots are scarier than evil germs”).
Not all of these are cause for optimism re: a global ASI ban, but (by selection) they point to governments treating AI “seriously”.
One big difference is GoF currently does not seem that dangerous to governments. If you look on it from a perspective not focusing on the layer of individual humans as agents, but instead states, corporations, memplexes and similar creatures as the agents, GoF maybe does not look that scary? Sure, there was covid, but while it was clearly really bad for humans, it mostly made governments/states relatively stronger.
Taking this difference into account, my model was and still is that governments will react to AI.
This does not imply reacting in a helpful way, but I think whether the reaction will be helpful, harmful or just random is actually one of the higher-variance parameters, and a point of leverage. (And the common-on-LW stance that governments are stupid and evil and you should mostly ignore them is unhelpful in both understanding and influencing the situation.)
Personally I haven’t thought about how strong the analogy to GoF is, but another thing that feels worth noting is that there may be a bunch of other cases where the analogy is similarly strong and where major government efforts aimed at risk-reduction have occurred. And my rough sense is that that’s indeed the case, e.g. some of the examples here.
In general, at least for important questions worth spending time on, it seems very weird to say “You think X will happen, but we should be very confident it won’t because in analogous case Y it didn’t”, without also either (a) checking for other analogous cases or other lines of argument or (b) providing an argument for why this one case is far more relevant evidence than any other available evidence. I do think it totally makes sense to flag the analogous case and to update in light of it, but stopping there and walking away feeling confident in the answer seems very weird.
I haven’t read any of the relevant threads in detail, so perhaps the arguments made are stronger than I imply here, but my guess is they weren’t. And it seems to me that it’s unfortunately decently common for AI risk discussions on LessWrong to involve this pattern I’m sketching here.
(To be clear, all I’m arguing here is that these arguments often seem weak, not that their conclusions are false.)
(This comment is raising an additional point to Jan’s, not disagreeing.)
Update: Oh, I just saw Steve Byrnes also wrote the following in this thread, which I totally agree with:
“[Maybe one could argue] “It’s all very random—who happens to be in what position of power and when, etc.—and GoF is just one example, so we shouldn’t generalize too far from it” (OK maybe, but if so, then can we pile up more examples into a reference class to get a base rate or something? and what are the interventions to improve the odds, and can we also try those same interventions on GoF?)”
“What exactly” seems like a bit of a weird type of question. For example, consider nukes: it was hard to predict what exactly the model was by which governments would not blow everyone up after the use of nukes in Japan. But also: while the resulting equilibrium is not great, we haven’t died in a nuclear WWIII so far.
This would be useful if the main problem was misuse, and while this problem is arguably serious, there is another problem, called the alignment problem, that doesn’t care who uses AGI, only that it exists.
Biotech is probably the best example of technology being slowed down in the manner required, and suffice it to say it only happened because eugenics and anything related to it became taboo after WW2. I obviously don’t want a WW3 to slow down AI progress, but the main criticism remains: the examples of tech that were slowed down in the manner required for alignment required massive death tolls, a la a pivotal act.
The analogy I had in mind is not so much in the exact nature of the problem, but in the fact that it’s hard to make explicit, precise models of such situations in advance. In the case of nukes, consider that the smartest minds of the time, like von Neumann or Feynman, spent a decent amount of time thinking about the problems, had clever explicit models, and were wrong—in von Neumann’s case to the extent that if the US had followed his advice, they would have launched nuclear armageddon.
I think it’s uncharitable to psychoanalyze why people upvoted John’s comment; his object-level point about GoF seems good and merits an upvote IMO. Really, I don’t know what to make of GoF. It’s not just that governments have failed to ban it, they haven’t even stopped funding it, or in the USA case they stopped funding it and then restarted I think. My mental models can’t explain that. Anyone on the street can immediately understand why GoF is dangerous. GoF is a threat to politicians and national security. GoF has no upsides that stand up to scrutiny, and has no politically-powerful advocates AFAIK. And we’re just getting over a pandemic which consumed an extraordinary amount of money, comfort, lives, and attention for the past couple years, and which was either a direct consequence of GoF research, or at the very least the kind of thing that GoF research could have led to. And yet, here we are, with governments funding GoF research right now. Again, I can’t explain this, and pending a detailed model that can, the best I can do right now is say “Gee I guess I should just be way more cynical about pretty much everything.”
Anyway, back to your post, if Option 1 is unilateral pivotal act and Option 2 is government-supported pivotal outcome, then one ought to try to weigh the pros and cons (particularly, probability of failure) of both options; your post was only about why Option 1 might fail but IIRC didn’t say anything about whether Option 2 might fail too. Maybe every option is doomed and the task is to choose the slightly-less-doomed option, right?
I’m not an expert, and maybe you have a gears-y model in which it’s natural & expected that governments are funding GoF right now and also simultaneously in which it’s natural & expected that the government-sanctioned-EMP-thing story you told in your post is likely to actually happen. (Or at least, less likely to fail than Option 1.) If so, I would be very interested for you to share that model!
I also have found that almost everyone I talk to outside the field of AI has found it obvious that AI could kill us all. They also find it obvious that AI is about to surpass us, and are generally not surprised by my claims of a coming discontinuity; in contrast, almost anyone working in AI thinks I’m crazy. I suspect that people think I’m claiming I can do it, when in fact I’m trying to tell them they are about to do it. It’s really frustrating! Also, the majority of opinion in the world doesn’t come from AI researchers.
That said, I cannot state this strongly enough: THE COMING DISCONTINUITY WILL NOT WAIT BEHIND REGULATION.
I know of multiple groups who already know what they need to in order to figure it out! Regulation will not stop them unless it is broad enough to somehow catch every single person who has tried to participate in creating it, and that is not going to happen, no matter how much the public wishes for it. I don’t believe any form of pivotal act could save humanity. Anything that attempts to use control to prevent control will simply cause a cascade of escalatory retaliations, starting with whatever form of attack is used to try to stop AI progress, escalating from accelerationists, escalating from attempted shutdown, possibly an international war aided by AI happening in parallel, and ending with the AI executing the last one.
Your attempts to slow the trickle of sand into the gravity well of increasing thermal efficiency will utterly fail. There are already enough GPUs in the world, and it only takes one. We must solve alignment so hard that it causes the foom; nothing else could possibly save us.
The good news is, alignment is capabilities in a deep way. Solving alignment at full strength would suddenly stabilize AI in a way that makes it much stronger at a micro level, and would simultaneously allow for things like “hey, can you get the carbon out of the air please?” without worrying about damaging those inside protected boundaries.
I’m not sure what’s going on here, but it seems to me like the idea of coordinating with “outsiders” or placing trust or hope in the judgement of “outsiders” has been a bit of a taboo here, and that arguments that outsiders are dumb or wrong or can’t be trusted will reliably get a lot of cheering in the form of Karma.
No, you’re misunderstanding John Wentworth’s comment and then applying that straw man to the rest of LessWrong based on the comment’s upvote total. It’s not that laypeople can’t understand the dangers inherent in engineered viruses, and that this is what leads to governments continuing to finance and leak them. You can probably convince your Uber driver that lab leaks are bad, too. It’s a lack of ability to translate that understanding into positive regulatory and legal outcomes, instead of completely net-negative ones.
Probably this opinion of LWers is shaped by their experience communicating with outsiders. Almost none of my attempts to communicate AI x-risk to outsiders, from family members to friends to random acquaintances, have really been understood. Your experience (random people at social events walking away from a conversation with you thinking “AI x-risk is indeed a thing!” and starting to worry about it even slightly afterwards) is highly surprising to me. Maybe there is a huge bias in this regard in the Bay Area, where even normal people generally understand and appreciate the power of technology more than in other places, or have had some similar encounters before, or it’s just in the zeitgeist of the place. (My experience is outside the US, primarily with Russians and some Europeans.)
All that being said, ChatGPT (if people have experienced it first-hand) and especially GPT-4 could potentially make communication of the AI x-risk case much easier.
I’ve had a >50% hit rate for “this person now takes AI x-risk seriously after one conversation” from people at totally non-EA parties (subculturally alternative/hippie-ish, in not particularly tech-y parts of the UK). I think it’s mostly about having a good pitch (but not throwing it at them until there is some rapport; ask them about their stuff first), being open to their world, modeling their psychology, and being able to respond to their first few objections clearly and concisely in a way they can frame within their existing world-model.
Edit: Since I’ve been asked in DM:
My usual pitch has been something like this. I expect Critch’s version is very useful for the “but why would it be a threat” thing, but I have not tested it as much myself.
I think being open and curious about them + being very obviously knowledgeable and clear thinking on AI x-risk is basically all of it, with the bonus being having a few core concepts to convey. Truth-seek with them, people can detect when you’re pushing something in epistemically unsound ways, but tend to love it if you’re going into the conversation totally willing to update but very knowledgeable.
Katja, many thanks for writing this, and Oliver, thanks for this comment pointing out that everyday people are in fact worried about AI x-risk. Since around 2017 when I left MIRI to rejoin academia, I have been trying continually to point out that everyday people are able to easily understand the case for AI x-risk, and that it’s incorrect to assume the existence of AI x-risk can only be understood by a very small and select group of people. My arguments have often been basically the same as yours here: in my case, informal conversations with Uber drivers, random academics, and people at random public social events. Plus, the argument is very simple: If things are smarter than us, they can outsmart us and cause us trouble. It’s always seemed strange to say there’s an “inferential gap” of substance here.
However, for some reason, the idea that people outside the LessWrong community might recognize the existence of AI x-risk — and therefore be worth coordinating with on the issue — has felt not only poorly received on LessWrong, but also fraught to even suggest. For instance, I tried to point it out in this previous post:
“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments
I wrote the following, targeting multiple LessWrong-adjacent groups in the EA/rationality communities who thought “pivotal acts” with AGI were the only sensible way to reduce AI x-risk:
That particular statement was very poorly received, with a 139-karma retort from John Wentworth arguing,
I’m not sure what’s going on here, but it seems to me like the idea of coordinating with “outsiders” or placing trust or hope in judgement of “outsiders” has been a bit of taboo here, and that arguments that outsiders are dumb or wrong or can’t be trusted will reliably get a lot of cheering in the form of Karma.
Thankfully, it now also seems to me that perhaps the core LessWrong team has started to think that ideas from outsiders matter more to the LessWrong community’s epistemics and/or ability to get things done than previously represented, such as by including material written outside LessWrong in the 2021 LessWrong review posted just a few weeks ago, for the first time:
https://www.lesswrong.com/posts/qCc7tm29Guhz6mtf7/the-lesswrong-2021-review-intellectual-circle-expansion
I consider this a move in a positive direction, but I am wondering if I can draw the LessWrong team’s attention to a more serious trend here. @Oliver, @Raemon, @Ruby, and @Ben Pace, and others engaged in curating and fostering intellectual progress on LessWrong:
Could it be that the LessWrong community, or the EA community, or the rationality community, has systematically discounted the opinions and/or agency of people outside that community, in a way that has lead the community to plan more drastic actions in the world than would otherwise be reasonable if outsiders of that community could also be counted upon to take reasonable actions?
This is a leading question, and my gut and deliberate reasoning have both been screaming “yes” at me for about 5 or 6 years straight, but I am trying to get you guys to take a fresh look at this hypothesis now, in question-form. Thanks in any case for considering it.
The question feels leading enough that I don’t really know how to respond. Many of these sentences sound pretty crazy to me, so I feel like I primarily want to express frustration and confusion that you assign those sentences to me or “most of the LessWrong community”.
I think John Wentworth’s question is indeed the obvious question to ask. It does really seem like our prior should be that the world will not react particularly sanely here.
I also think it’s really not true that coordination has been “fraught to even suggest”. I think it’s been suggested all the time, and certain coordination plans seem more promising than others. Like, even Eliezer was for a long time apparently thinking that Deepmind having a monopoly on AGI development was great and something to be protected, which very much involves coordinating with people outside of the LessWrong community.
The same is true for whether “outsiders might recognize the existence of AI x-risk”. Of course outsiders might recognize the existence of AI X-risk. I don’t think this is uncontroversial or disputed. The question is what happens next. Many people seem to start working on AI capabilities research as the next step, which really doesn’t improve things.
I don’t think your summary of how your statement was received is accurate. Your overall post has ~100 karma, so was received quite positively, and while John responded to this particular statement and was upvoted, I don’t think this really reflects much of a negative judgement that specific statement.
John’s question is indeed the most important question to ask about this kind of plan, and it seems correct for it to be upvoted, even if people agree with the literal sentence it is responding to (your original sentence itself was weak enough that I would be surprised if almost anyone on LW disagreed with its literal meaning, and if there is disagreement, it is with the broader implied statement of the help being useful enough to actually be worth it to have as a main-line plan and to forsake other plans that are more pivotal-act shaped).
I don’t know man, I have always put a huge emphasis on reading external writing, learning from others, and doing practical concrete things in the external world. Including external material has been more of a UI question, and I’ve been interested in it for a long while, it just didn’t reach the correct priority level (and also, I think it isn’t really working this year for UI reasons, given that basically no nominated posts are externally linked posts, and it was correct for us to not try without putting in substantially more effort to make it work).
I think if anything I updated over the last few years that rederiving stuff for yourself is more important and trying to coordinate with the external world has less hope, given the extreme way the world was actively hostile to cooperation as well as epistemically highly corrupting during the pandemic. I also think the FTX situation made me think it’s substantially more likely that we will get fucked over again in the future when trying to coordinate with parties that have different norms, and don’t seem to care about honesty and integrity very much. I also think RLHF turning out to be the key to ChatGPT and via that OpenAI getting something like product-market fit and probably making OpenAI $10+ billion dollars in-expectation, showing that actually the “alignment ” team at OpenAI had among the worst consequences of any of the teams in the org, was an update in an opposite direction to me. I think these events also made me less hopeful about the existing large LessWrong/EA/Longtermist/Rationality community as a coherent entity that can internally coordinate, but I think that overall results in a narrowing of my circle of coordination, not a widening.
I have models here, but I guess I feel like your comment is in some ways putting words in my mouth in a way that feels bad to me, and I am interested in explaining my models, but I don’t this comment thread is the right context.
I think there is a real question about whether both me and others in the community have a healthy relationship to the rest of the world. I think the answer is pretty messy. I really don’t think it’s ideal, and indeed think it’s probably quite bad, and I have a ton of different ways I would criticize what is currently happening. But I also really don’t think that the current primary problem with the way the community relates to the rest of the world is underestimating the sanity of the rest of the world. I think mostly I expect us to continue to overestimate the sanity and integrity of most of the world, then get fucked over like we got fucked over by OpenAI or FTX. I think there are ways to relating to the rest of the world that would be much better, but a naive update in the direction of “just trust other people more” would likely make things worse.
Again, I think the question you are raising is crucial, and I have giant warning flags about a bunch of the things that are going on (the foremost one is that it sure really is a time to reflect on your relation to the world when a very prominent member of your community just stole 8 billion dollars of innocent people’s money and committed the largest fraud since Enron), so I do think there are good and important conversations to be had here.
I very much agree with the sentiment of the second paragraph.
Regarding the first paragraph, my own take is that (many) EAs and rationalists might be wise to trust themselves and their allies less.[1]
The main update of the FTX fiasco (and other events I’ll describe later) I’d make is that perhaps many/most EAs and rationalists aren’t very good at character judgment. They probably trust other EAs and rationalists too readily because they are part of the same tribe and automatically assume that agreeing with noble ideas in the abstract translates to noble behavior in practice.
(To clarify, you personally seem to be good at character judgment, so this message is not directed at you. (I base that mostly on your comments I read about the SBF situation, big kudos for that, btw!)
It seems like a non-trivial fraction of people that joined the EA and rationalist community very early turned out to be of questionable character, and this wasn’t noticed for years by large parts of the community. I have in mind people like Anissimov, Helm, Dill, SBF, Geoff Anders, arguably Vassar—these are just the known ones. Most of them were not just part of the movement, they were allowed to occupy highly influential positions. I don’t know what the base rate for such people is in other movements—it’s plausibly even higher—but as a whole our movements don’t seem to be fantastic at spotting sketchy people quickly. (FWIW, my personal experiences with a sketchy, early EA (not on the above list) inspired this post.)
My own takeaway is that perhaps EAs and rationalists aren’t that much better in terms of integrity than the outside world and—given that we probably have to coordinate with some people to get anything done—I’m now more willing to coordinate with “outsiders” than I was, say, eight years ago.
Though I would be hesitant to spread this message; the kinds of people who should trust themselves and their character judgment less are more likely the ones who will not take this message to heart, and vice versa.
Thanks, Oliver. The biggest update for me here — which made your entire comment worth reading — was that you said this:
I’m surprised that you think that, but I have updated, taking your statement at face value, that you in fact do. By contrast, my experience around a bunch of common acquaintances of ours has been much the same as Katja’s, like this:
In fact, I think I may have even heard the word “delusional” specifically applied to people working on AI governance (though not by you) for thinking that coordination on AI regulation is possible / valuable / worth pursuing in service of existential safety.
As for the rest of your narrative of what’s been happening in the world, to me it seems like a random mix of statements that are clearly correct (e.g., trying to coordinate with people who don’t care about honesty or integrity will get you screwed) and other statements that seem, as you say,
and I agree that for the purpose of syncing world models,
Anyway, cheers for giving me some insight into your thinking here.
Oliver, see also this comment; I tried to @ you on it, but I don’t think LessWrong has that functionality?
Critch, I agree it’s easy for most people to understand the case for AI being risky. I think the core argument for concern—that it seems plausibly unsafe to build something far smarter than us—is simple and intuitive, and personally, that simple argument in fact motivates a plurality of my concern. That said:
I think it often takes weirder, less intuitive arguments to address many common objections—e.g., that this seems unlikely to happen within our lifetimes, that intelligence far superior to ours doesn’t even seem possible, that we’re safe because software can’t affect physical reality, that this risk doesn’t seem more pressing than other risks, that alignment seems easy to solve if we just x, etc.
It’s also remarkably easy to convince many people that aliens visit Earth on a regular basis, that the theory of evolution via natural selection is bunk, that lottery tickets are worth buying, etc. So while I definitely think some who engage with these arguments come away having good reason to believe the threat is likely, for values of “good” and “believe” and “likely” at least roughly similar to those common around here, I suspect most update something more like their professed belief-in-belief than their real expectations—and that even many who do update their real expectations do so via symmetric arguments that leave them with poor models of the threat.
These factors make me nervous about strategies that rely heavily on convincing everyday people, or people in government, to care about AI risk, for reasons I don’t think are well described as “systematically discounting their opinions/agency.” Personally, I’ve engaged a lot with people working in various corners of politics and government, and a decent amount with academics, and I respect and admire many of them, including in ways I rarely admire rationalists or EA’s.
(For example, by my lights, the best ops teams in government are much more competent than the best ops teams around here; the best policy wonks, lawyers, and economists are genuinely really quite smart, and have domain expertise few R/EA’s have without which it’s hard to cause many sorts of plausibly-relevant societal change; perhaps most spicily, I think academics affiliated with the Santa Fe Institute have probably made around as much progress on the alignment problem so far as alignment researchers, without even trying to, and despite being (imo) deeply epistemically confused in a variety of relevant ways).
But there are also a number of respects in which I think rationalists and EA’s tend to far outperform any other group I’m aware of—for example, in having beliefs that actually reflect their expectations, trying seriously to make sure those beliefs are true, being open to changing their mind, thinking probabilistically, “actually trying” to achieve their goals as a behavior distinct from “trying their best,” etc. My bullishness about these traits is why e.g. I live and work around here, and read this website.
And on the whole, I am bullish about this culture. But it’s mostly the relative scarcity of these and similar traits in particular, not my overall level of enthusiasm or respect for other groups, that causes me to worry they wouldn’t take helpful actions if persuaded of AI risk.
My impression is that it’s unusually difficult to figure out how to take actions that reduce AI risk without substantial epistemic skill of a sort people sometimes have around here, but only rarely have elsewhere. On my models, this is mostly because:
There are many more ways to make the situation worse than better;
A number of key considerations are super weird and/or terrifying, such that it’s unusually hard to reason well about them;
It seems easier for people to grok the potential importance of transformative AI, than the potential danger.
My strong prior is that, to accomplish large-scale societal change, you nearly always need to collaborate with people who disagree with you, even about critical points. And I’m sympathetic to the view that this is true here, too; I think some of it probably is. But I think the above features make this more fraught than usual, in a way that makes it easy for people who grok the (simpler) core argument for concern, but not some of the (typically more complex) ancillary considerations, to accidentally end up making the situation even worse.
Here are some examples of (what seem to me like) this happening:
The closest thing I’m aware of to an official US government position on AI risk is described in the 2016 and 2017 National Science and Technology Council reports. I haven’t read all of them, but the parts I have read struck me as a strange mix of claims like “maybe this will be a big deal, like mobile phones were,” and “maybe this will be a big deal, in the sense that life on Earth will cease to exist.” And like, I can definitely imagine explanations for this that don’t much involve the authors misjudging the situation—maybe their aim was more to survey experts than describe their own views, or maybe they were intentionally underplaying the threat for fear of starting an arms race, etc. But I think my lead hypothesis is more that the authors just didn’t actually, viscerally consider that the sentences they were writing might be true, in the sense of describing a reality they might soon inhabit.
I think rationalists and EA’s tend to make this sort of mistake less often, since the “taking beliefs seriously”-style epistemic orientation common around here has the effect of making it easier for people to viscerally grasp that trend lines on graphs and so forth might actually reflect reality. (Like, one frame on EA as a whole, is “an exercise in avoiding the ‘learning about the death of a million feels like a statistic, not a tragedy’ error”). And this makes me at least somewhat more confident they won’t do dumb things upon becoming worried about AI risk, since without this epistemic skill, I think it’s easier to make critical errors like overestimating how much time we have, or underestimating the magnitude or strangeness of the threat.
As I understand it, OpenAI is named what it is because, at least at first, its founders literally hoped to make AGI open source. (Elon Musk: “I think the best defense against the misuse of AI is to empower as many people as possible to have AI. If everyone has AI powers, then there’s not any one person or a small set of individuals who can have AI superpower.”)
By my lights, there are unfortunately a lot of examples of rationalists and EA’s making big mistakes while attempting to reduce AI risk. But it’s at least… hard for me to imagine most of them making this one? Maybe I’m being insufficiently charitable here, but from my perspective, this just fails a really basic “wait, but then what happens next?” sanity check, that I think should have occurred to them more or less immediately, and that I suspect would have occurred to most rationalists and EA’s.
For me, the most striking aspect of the AI Impacts poll was that all those ML researchers who reported thinking ML had a substantial chance of killing everyone still research ML. I’m not sure why they do this; I’d guess some of them are convinced for some reason or another that working on it still makes sense, even given that. But my perhaps-uncharitable guess is that most of them actually don’t—that they don’t even have arguments which feel compelling to them that justify their actions, but that they for some reason press on anyway. This too strikes me as a sort of error R/EA’s are less likely to make.
(When Bostrom asked Geoffrey Hinton why he still worked on AI, if he thought governments would likely use it to terrorize people, he replied, “I could give you the usual arguments, but the truth is that the prospect of discovery is too sweet”).
Sam Altman recently suggested, on the topic of whether to slow down AI, that “either we figure out how to make AGI go well or we wait for the asteroid to hit.”
Maybe he was joking, or meant “asteroid” as a stand-in for all potentially civilization-ending threats, or something? But that’s not my guess, because his follow-up comment is about how we need AGI to colonize space, which makes me suspect he actually considers asteroid risk in particular a relevant consideration for deciding when to deploy advanced AI. Which if true, strikes me as… well, more confused than any comment in this thread strikes me. And it seems like the kind of error that might, for example, cause someone to start an org with the hope of reducing existential risk, that mostly just ends up exacerbating it.
Obviously our social network doesn’t have a monopoly on good reasoning, intelligence, or competence, and lord knows it has plenty of its own pathologies. But as I understand it, most of the reason the rationality project exists is to help people reason more clearly about the strange, horrifying problem of AI risk. And I do think it has succeeded to some degree, such that empirically, people with less exposure to this epistemic environment far more often take actions which seem terribly harmful to me.
This is a candidate for the most surprising sentence in the whole comments section! I’d be interested in knowing more about why you believe this. One sort of thing I’d be quite interested in is things you’ve seen government ops teams do fast (even if they’re small things, accomplishments that would surprise many of us in this thread that they could be done so quickly).
Recruitment—in my experience often a weeks-long process from start to finish, well-oiled and systematic, using all the tips from the handbook on organizational behaviour for selection, often with feedback given too. By comparison, some tech companies can take several months to hire, with lots of ad hoc decision-making, no processes around biases or conflicts of interest, and no feedback.
Happy to give more examples if you want by DM.
I should say my sample size is tiny here—I know one gov dept in depth, one tech company in depth and a handful of other tech companies and gov depts not fully from the inside but just from talking with friends that work there, etc.
This is an important optimistic update, because it implies alignment might be quite a bit easier than we think, given that even under unfavorable circumstances, reasonable progress still gets made.
I think that this isn’t an error in rationality; instead, very different goals drive EAs/LWers compared to AI researchers. Taking a gamble with a low chance of high utility and a high chance of death can be pretty rational, assuming you only care about yourself. And this is the default, absent additional assumptions.
From an altruistic perspective, it’s insane to take this risk, especially if you care about the future.
Thus, differing goals are at play.
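To make the “differing goals” point concrete, here is a toy expected-value sketch in Python. Every probability and payoff in it is a made-up illustrative assumption, not an estimate anyone in this thread has endorsed; it only shows how the same gamble can look positive under selfish accounting and deeply negative under altruistic accounting.

```python
# Toy expected-value comparison; all numbers are illustrative assumptions.
p_good, p_doom = 0.1, 0.9   # assumed chance AGI goes well vs. kills everyone

# Selfish researcher: huge personal upside if it goes well, and the downside
# is capped at losing one lifetime.
selfish_ev = p_good * 1_000 + p_doom * (-1)

# Altruistic / future-caring view: both outcomes are scaled by all the lives
# at stake, so at these probabilities the downside dominates.
altruistic_ev = p_good * 1_000_000 + p_doom * (-1_000_000)

print(f"Selfish EV:    {selfish_ev:+,.1f}")     # positive  -> gamble looks fine
print(f"Altruistic EV: {altruistic_ev:+,.1f}")  # large negative -> looks insane
```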
I object to this hyperbolic and unfair accusation. The entire AI Governance field is founded on this idea; this idea is not only fine to suggest, but completely uncontroversial accepted wisdom. That is, if by “this idea” you really mean literally what you said—“people outside the LW community might recognize the existence of AI x-risk and be worth coordinating with on the issue.” Come on.
I am frustrated by what appears to me to be constant straw-manning of those who disagree with you on these matters. Just because people disagree with you doesn’t mean there’s a sinister bias at play. I mean, there are usually all sorts of sinister biases at play on all sides of every dispute, but the way to cut through them isn’t to go around slinging insults at each other about who might be biased; it’s to stay on the object level and sort through the arguments.
This makes sense to me if you feel my comment is meant as a description of you or people-like-you. It is not, and quite the opposite. As I see it, you are not a representative member of the LessWrong community, or at least, not a representative source of the problem I’m trying to point at. For one thing, you are willing to work for OpenAI, which many (dozens of) LessWrong-adjacent people I’ve personally met would consider a betrayal of allegiance to “the community”. Needless to say, the field of AI governance as it exists is not uncontroversially accepted by the people I am reacting to with the above complaint. In fact, I had you in mind as a person I wanted to defend by writing the complaint, because you’re willing to engage and work full-time (seemingly) in good faith with people who do not share many of the most centrally held views of “the community” in question, be it LessWrong, Effective Altruism, or the rationality community.
If it felt otherwise to you, I apologize.
It would help if you specified which subset of “the community” you’re arguing against. I had a similar reaction to your comment as Daniel did, since in my circles (AI safety researchers in Berkeley), governance tends to be well-respected, and I’d be shocked to encounter the sentiment that working for OpenAI is a “betrayal of allegiance to ‘the community’”.
To be clear, I do think most people who have historically worked on “alignment” at OpenAI have probably caused great harm! And I do think I am broadly in favor of stronger community norms against working at AI capability companies, even in so called “safety positions”. So I do think there is something to the sentiment that Critch is describing.
Agreed! But the words he chose were hyperbolic and unfair. Even an angrier more radical version of Habryka would still endorse “the idea that people outside the LessWrong community might recognize the existence of AI risk.”
Separately from my other reply explaining that you are not the source of what I’m complaining about here, I thought I’d add more color to explain why I think my assessment here is not “hyperbolic”. Specifically, regarding your claim that reducing AI x-risk through coordination is “not only fine to suggest, but completely uncontroversial accepted wisdom”, please see the OP. Perhaps you have not witnessed such conversations yourself, but I have been party to many of these:
In other words, I’ve seen people in AI governance being called or treated as “delusional” by loads of people (1-2 dozen?) core to the LessWrong community (not you). I wouldn’t say by a majority, but by an influential minority to say the least, and by more people than would be fair to call “just institution X” for any X, or “just person Y and their friends” for any Y. The pattern is strong enough that for me, pointing to governance as an approach to existential safety on LessWrong indeed feels fraught, because influential people (online or offline) will respond to the idea as “delusional” as Katja puts it. Being called delusional is stressful, and hence “fraught”.
@Oliver, the same goes for your way of referring to sentences you disagree with as “crazy”, such as here.
Generally speaking, on the LessWrong blog itself I’ve observed too many instances of people using insults in response to dissenting views on the epistemic health of the LessWrong community, and receiving applause and karma for doing so, for me to think that there’s not a pattern or problem here.
That’s not to say I think LessWrong has this problem worse than other online communities (i.e., using insults or treating people as ‘crazy’ or ‘delusional’ for dissenting or questioning the status quo); only that I think it’s a problem worth addressing, and a problem I see strongly at play on the topic of coordination and governance.
Just to clarify, the statements that I described as crazy were not statements you professed, but statements that you said I or “the LessWrong community” believe. I am not sure whether that got across (since, like, in that context it doesn’t really make sense to say I described sentences I disagree with as crazy; I don’t think you believe those sentences either, which is why you are criticizing them).
It did not get across! Interesting. Procedurally I still object to calling people’s arguments “crazy”, but selfishly I guess I’m glad they were not my arguments? At a meta level, though, I’m still concerned that LessWrong culture is too quick to write off views as “crazy”. Even the “coordination is delusional”-type views that Katja highlights in her post do not seem “crazy” to me, more like misguided or scarred or something, in a way that warrants a closer look but not being called “crazy”.
Oops, yeah, sorry about that not coming across.
Seems plausible that LessWrong culture is too quick to write off views as “crazy”, though I have a bunch of conflicting feeling here. Might be worth going into at some point.
I do think there is something pretty qualitatively different about calling a paraphrase or an ITT of my own opinions “crazy” versus calling someone’s actual opinion crazy. In general, my sense is that when reacting to paraphrases it’s less bad for the social dynamics to give an honest impression and more important to give a blunt, evocative reaction, but I’ll still try to clarify more in the future when I am referring to the meat of my interlocutor’s opinion vs. their representation of my opinion.
So, wait, what’s actually the answer to this question? I read that entire comment thread and didn’t find one. The question seems to me to be a good one!
The GoF analogy is quite weak.
“What exactly” seems like a bit of a weird type of question. For example, consider nukes: it was hard to predict what exactly the model was by which governments would not blow everyone up after the use of nukes in Japan. But also: while the resulting equilibrium is not great, we haven’t died in a nuclear WWIII so far.
As in my comment here, if you have a model that simultaneously both explains the fact that governments are funding GoF research right now, and predicts that governments would nevertheless react helpfully to AGI, I’m very interested to hear it. It seems to me that defunding GoF is a dramatically easier problem in practically every way.
The only responses I can think of right now are (1) “Basically nobody in or near government is working hard to defund GoF but people in or near government will be working hard to spur on a helpful response to AGI” (really? if so, what’s upstream of that supposed difference?) or (2) “It’s all very random—who happens to be in what position of power and when, etc.—and GoF is just one example, so we shouldn’t generalize too far from it” (OK maybe, but if so, then can we pile up more examples into a reference class to get a base rate or something? and what are the interventions to improve the odds, and can we also try those same interventions on GoF?)
I think it’s worth updating on the fact that the US government has already launched a massive, disruptive, costly, unprecedented policy of denying AI-training chips to China. I’m not aware of any similar-magnitude measure happening in the GoF domain.
IMO that should end the debate about whether the government will treat AI dev the way it has GoF—it already has moved it to a different reference class.
Some wild speculation on upstream attributes of advanced AI’s reference class that might explain the difference in the USG’s approach:
a perception of new AI as geoeconomically disruptive; new AI having more obvious natsec-relevant use-cases than GoF; and powerful AI being more culturally salient than powerful bio (“evil robots are scarier than evil germs”).
Not all of these are cause for optimism re: a global ASI ban, but (by selection) they point to governments treating AI “seriously”.
One big difference is that GoF currently does not seem that dangerous to governments. If you look at it from a perspective focused not on the layer of individual humans as agents, but instead on states, corporations, memplexes, and similar creatures as the agents, GoF maybe does not look that scary? Sure, there was covid, but while it was clearly really bad for humans, it mostly made governments/states relatively stronger.
Taking this difference into account, my model was, and still is, that governments will react to AI.
This does not imply reacting in a helpful way, but I think whether the reaction will be helpful, harmful, or just random is actually one of the higher-variance parameters, and a point of leverage. (And the common-on-LW stance that governments are stupid and evil and you should mostly ignore them is unhelpful both for understanding and for influencing the situation.)
Personally I haven’t thought about how strong the analogy to GoF is, but another thing that feels worth noting is that there may be a bunch of other cases where the analogy is similarly strong and where major government efforts aimed at risk-reduction have occurred. And my rough sense is that that’s indeed the case, e.g. some of the examples here.
In general, at least for important questions worth spending time on, it seems very weird to say “You think X will happen, but we should be very confident it won’t because in analogous case Y it didn’t”, without also either (a) checking for other analogous cases or other lines of argument or (b) providing an argument for why this one case is far more relevant evidence than any other available evidence. I do think it totally makes sense to flag the analogous case and to update in light of it, but stopping there and walking away feeling confident in the answer seems very weird.
I haven’t read any of the relevant threads in detail, so perhaps the arguments made were stronger than I imply here, but my guess is they weren’t. And it seems to me that it’s unfortunately decently common for AI risk discussions on LessWrong to involve this pattern I’m sketching here.
(To be clear, all I’m arguing here is that these arguments often seem weak, not that their conclusions are false.)
(This comment is raising an additional point to Jan’s, not disagreeing.)
Update: Oh, I just saw Steve Byrnes also wrote the following in this thread, which I totally agree with:
This would be useful if the main problem were misuse; while that problem is arguably serious, there is another problem, the alignment problem, that doesn’t care who uses AGI, only that it exists.
Biotech is probably the best example of technology being slowed down in the manner required, and suffice it to say it only happened because eugenics and anything related to it became taboo after WW2. I obviously don’t want a WW3 to slow down AI progress, but the main criticism remains: the examples of tech that were slowed down in the manner required for alignment required massive death tolls, a la a pivotal act.
The analogy I had in mind is not so much in the exact nature of the problem, but in the aspect that it’s hard to make explicit, precise models of such situations in advance. In the case of nukes, consider the fact that the smartest minds of the time, like von Neumann or Feynman, spent a decent amount of time thinking about the problems, had clever explicit models, and were wrong—in von Neumann’s case to the extent that if the US had followed his advice, they would have launched a nuclear armageddon.
I think it’s uncharitable to psychoanalyze why people upvoted John’s comment; his object-level point about GoF seems good and merits an upvote IMO. Really, I don’t know what to make of GoF. It’s not just that governments have failed to ban it, they haven’t even stopped funding it, or in the USA’s case, they stopped funding it and then restarted it, I think. My mental models can’t explain that. Anyone on the street can immediately understand why GoF is dangerous. GoF is a threat to politicians and national security. GoF has no upsides that stand up to scrutiny, and has no politically-powerful advocates AFAIK. And we’re just getting over a pandemic which consumed an extraordinary amount of money, comfort, lives, and attention for the past couple of years, and which was either a direct consequence of GoF research, or at the very least the kind of thing that GoF research could have led to. And yet, here we are, with governments funding GoF research right now. Again, I can’t explain this, and pending a detailed model that can, the best I can do right now is say “Gee, I guess I should just be way more cynical about pretty much everything.”
Anyway, back to your post, if Option 1 is unilateral pivotal act and Option 2 is government-supported pivotal outcome, then one ought to try to weigh the pros and cons (particularly, probability of failure) of both options; your post was only about why Option 1 might fail but IIRC didn’t say anything about whether Option 2 might fail too. Maybe every option is doomed and the task is to choose the slightly-less-doomed option, right?
I’m not an expert, and maybe you have a gears-y model in which it’s natural & expected that governments are funding GoF right now and also simultaneously in which it’s natural & expected that the government-sanctioned-EMP-thing story you told in your post is likely to actually happen. (Or at least, less likely to fail than Option 1.) If so, I would be very interested for you to share that model!
I also have found that almost everyone I talk to outside the field of AI finds it obvious that AI could kill us all. They also find it obvious that AI is about to surpass us, and are generally not surprised by my claims of a coming discontinuity; in contrast, almost anyone working in AI thinks I’m crazy. I suspect that people think I’m claiming I can do it, when in fact I’m trying to tell them they are about to do it. It’s really frustrating! Also, the majority of opinion in the world doesn’t come from AI researchers.
That said. I cannot state this hard enough: THE COMING DISCONTINUITY WILL NOT WAIT BEHIND REGULATION.
I know of multiple groups who already know what they need to in order to figure it out! Regulation will not stop them unless it is broad enough to somehow catch every single person who has tried to participate in creating it, and that is not going to happen, no matter how much the public wishes for it. I don’t believe any form of pivotal act could save humanity. Anything that attempts to use control to prevent control will simply cause a cascade of escalatory retaliations: starting with whatever form of attack is used to try to stop AI progress, escalating from accelerationists, escalating from attempted shutdown, possibly an international war aided by AI happening in parallel, and ending with the AI executing the last one.
Your attempts to slow the trickle of sand into the gravity well of increasing thermal efficiency will utterly fail. There are already enough GPUs in the world, and it only takes one. We must solve alignment so hard that it causes the foom; nothing else could possibly save us.
The good news is, alignment is capabilities in a deep way. Solving alignment at full strength would suddenly stabilize AI in a way that makes it much stronger at a micro level, and would simultaneously allow for things like “hey, can you get the carbon out of the air please?” without worry about damaging those inside protected boundaries.
No, you’re misunderstanding John Wentworth’s comment and then applying that straw man to the rest of LessWrong based on the comment’s upvote total. It’s not that laypeople can’t understand the dangers inherent in engineered viruses, and that this leads to governments continuing to finance and leak them. You can probably convince your Uber driver that lab leaks are bad, too. It’s a lack of ability to translate that understanding into positive regulatory and legal outcomes, instead of completely net-negative ones.
Probably this opinion of LWers is shaped by their experience communicating with outsiders. Almost all of my attempts to communicate AI x-risk to outsiders, from family members to friends to random acquaintances, have definitely not been understood. Your experience (random people at social events walking away from a conversation with you thinking “AI x-risk is indeed a thing!”, and starting to worry about it even slightly afterwards) is highly surprising to me. Maybe there is a huge bias in this regard in the Bay Area, where even normal people generally understand and appreciate the power of technology more than in other places, or have had some similar encounters before, or it’s just in the zeitgeist of the place. (My experience is outside the US, primarily with Russians and some Europeans.)
All that being said, ChatGPT (if people have experienced it first-hand) and especially GPT-4 could potentially make communication of the AI x-risk case much easier.
I’ve had a >50% hit rate for “this person now takes AI x-risk seriously after one conversation” from people at totally non-EA parties (subculturally alternative/hippie-ish, in not particularly tech-y parts of the UK). I think it’s mostly about having a good pitch (but not throwing it at them until there is some rapport; ask them about their stuff first), being open to their world, modeling their psychology, and being able to respond to their first few objections clearly and concisely in a way they can frame within their existing world-model.
Edit: Since I’ve been asked in DM:
My usual pitch has been something like this. I expect Critch’s version is very useful for the “but why would it be a threat” thing, but I have not tested it as much myself.
I think being open and curious about them, plus being very obviously knowledgeable and clear-thinking on AI x-risk, is basically all of it, with the bonus being having a few core concepts to convey. Truth-seek with them; people can detect when you’re pushing something in epistemically unsound ways, but tend to love it if you go into the conversation totally willing to update but very knowledgeable.