Why does MIRI believe that an “AI Pause” would contribute anything of substance to the goal of protecting the human race? It seems to me that an AI pause would:
Drive capabilities research further underground, especially in military contexts
Force safety researchers to operate on weaker models, which could hamper their ability to conduct effective research
Create a hardware overhang which would significantly increase the chance of a sudden catastrophic jump in capability that we are not prepared to handle
Create widespread backlash against the AI Safety community among interest groups that would like to see AI development continued
Be politically contentious, creating further points of tension between nations that could spark real conflict; at worst, you would be handing the reins of the future to foreign countries, especially ones that don’t care about international agreements, which are precisely the countries you would least want in control of AGI.
In any case, I think you are going to have an extremely difficult time with your messaging. I think this strategy will not succeed and will most likely, like many other AI safety efforts, actively harm your cause.
Every movement thinks they just need people to “get it”. Including, and especially, lunatics. If you behave like lunatics, people will treat you as such. This is especially true when there is a severe lack of evidence for your conclusions. Classical AI alignment theory does not apply to LLM-derived AI systems, and I have not seen anything substantial to replace it. I find no compelling evidence to suggest even a 1% chance of x-risk from LLM-based systems. Anthropogenic climate change has mountains of evidence to support it, and yet a significant chunk of the population does not believe in it.
You are not telling people what they want to hear. Public concerns about AI revolve around copyright infringement, job displacement, the shift of power between labor and capital, AI impersonation, data privacy, and plain low-quality AI slop taking up space online and assaulting their eyeballs. The message every news outlet has been publishing is: “AI is not AGI and it’s not going to kill us all, but it might take your job in a few years.” That is, I think, the consensus opinion. Reframing some of your arguments in these terms might make them a lot more palatable, at least to people in the mainstream who already lean anti-AI. As it stands, even though the majority of Americans have a negative opinion of AI, they are very unlikely to support the kind of radical policies you propose, and lawmakers, who have an economic interest in the success of AI product companies, will be even less convinced.
I’m sorry if this takes on an insolent tone, but surely you guys understand why everyone else plays the game, right? They’re not doing it for fun; they’re doing it because that’s the best and only way to get anyone to agree with their political ideas. If it takes time, then you had better start right now. If a shortcut existed, everyone would take it, and then it would cease to be a shortcut. You have not found a trick to expedite the process; you have stumbled into a trap for fanatics. People will tune you out among the hundreds of other groups that also believe the world will end and that their radical actions are necessary to save it. Doomsday cults are a dime a dozen. Behaving like them will produce the same results: ridicule.
There’s a dramatic difference between this message and the standard fanatic message: a big chunk of it is both true and intuitively so.
The idea that a genuine smarter-than-humans-in-every-way AGI is dangerous is quite intuitive. How many people would say that, if we were visited by a more capable alien species, it would be totally safe for us?
The reason people don’t intuitively see AI as dangerous is that they imagine it won’t become fully agentic and genuinely outclass humans in all relevant ways. Convincing them otherwise is a complex argument, but continued progress will make that argument for us (unless it’s all underground, which is a real risk as you say).
Now, that’s not the part of their message that MIRI tends to emphasize. I think they should emphasize it more, and I think they probably will.
That message actually benefits from not being mixed with any of the complex risks from sub-sapient tool AI that you mention. Doing what you suggest and leaning on existing fears has dramatic downsides (although it still might be wise on careful analysis; I haven’t seen one that’s convincing).
I agree with you that technical alignment of LLM-based AGI is quite achievable. I think we have plans for it that are underappreciated; take a look at my publications, where I make that case. But I think you’re overlooking some major danger points if you somehow arrive at a risk estimate below 1%.
LLMs aren’t the end of powerful AI; they’re the start. Your use of “LLM-derived AI” suggests that you’re not even thinking about the real AGI that will follow them. Even agents built out of LLMs have novel alignment risks that the LLMs themselves do not.
People do foolish things. For evidence, see history.
Complex projects are hard to get right on the first try.
Even if we solve technical intent alignment, we may very well kill ourselves in an escalating conflict powered by AGIs controlled by conflicting factions, terrorists, or idiots. Or combinations of those three.
Those last two are major components of why Eliezer and Nate are pessimistic. I wish they were a bigger part of their message. I think Yudkowsky is quite wrong that technical alignment is very difficult, but I’m afraid he’s right that obtaining a good outcome is going to be difficult.
Finally and separately: you appear not to understand that MIRI leadership thinks AGI spells doom no matter who accomplishes it. They might be right or wrong, but that is what they think, and they do have reasons. So handing the lead to bad actors isn’t really a downside for them (it is to me, given my relative optimism, but this isn’t my plan).
No message is intuitively obvious; the inferential distance between the AI safety community and the general public is wide. Even if many people do broadly dislike AI, they will tend to put apocalyptic predictions of the future, especially ones that don’t have as much hard evidence behind them as climate change (which is already very divisive!), in the same pile as all the others. I am sure many people will be convinced, especially if they were already predisposed to it, but such a radical message will alienate many potential supporters.
I think the suggestion that contact with non-human intelligence is inherently dangerous is not actually widely intuitive. A large portion of people across the world believe they regularly commune with a non-human intelligence (a god or gods) that they consider benevolent. I also think this is a case of generalizing from fictional evidence: mentioning “aliens” conjures up stories like The War of the Worlds. So while this is definitely a valid concern, it will be far from a universally understood one.
I mainly think that using existing risks to convince people of their message would help because it would lower the inferential distance between them and their audience. Most people are not thinking about dangerous, superhuman AI, and will not be until it’s (potentially) too late. Forming coalitions is a powerful tool in politics, and I think throwing that tool out the window is a mistake.
The reason I say LLM-derived AI is that I do think that, to some extent, LLMs actually are the be-all and end-all. Not language models in particular, but the idea of using neural networks to model vast quantities of data, yielding a model of the universe. That is what an LLM is, and it has proven wildly successful. I agree that agents derived from them will not behave like current-day LLMs, but they will be more like them than different. Major, classical misalignment risks would stem from something like a reinforcement learning optimizer instead.
I am aware of the argument about dangerous AI in the hands of ne’er-do-wells, but such people already exist and, in many cases, are already able, with great effort, to obtain the means of harming vast numbers of people. Gwern Branwen has covered this; there are a few terrorist vectors that would require relatively minuscule effort but would yield a tremendous expected terror output. I think that, in part, being a madman hampers one’s ability to rationally plan the greatest terror attack one’s means could allow, and also that the efforts dedicated to suppressing such individuals vastly exceed the efforts of those trying to destroy the world. In practice, I think there would be many friendly AGI systems protecting the earth from the minority turned to rogue purposes.
I also agree with your other points, but they are weak compared to the rock-solid reasoning of misalignment theory. They apply to many other historical situations, and yet we have ultimately survived; more people do sensible things than foolish things, and we do often get complex projects right the first time, as long as their theoretical underpinnings are well understood. I think proto-AGI is almost as well understood as it needs to be, and that Anthropic is something like 80% of the way to cracking the code.
I am afraid I did forget, in my original post, that MIRI believes it is of no consequence who ends up holding AGI. It simply struck me as so obvious that it matters who controls it that I didn’t think anyone could disagree.
In any case, I plan to write a longer post in opposition to the PauseAI movement, which MIRI is a part of, in collaboration with some friends who will help me edit it so that it doesn’t sound quite like the comment I left yesterday.
This comment doesn’t seem to be responding to the contents of the post at all, nor does it seem to understand very basic elements of the worldview it’s trying to argue against (e.g. “which are the countries you would probably least want to be in control of AGI”; no, it doesn’t matter which country ends up building an ASI, because the end result is the same).
It also tries to leverage arguments that depend on assumptions not shared by MIRI (such as that research on stronger models is likely to produce enough useful output to avert x-risk, or that x-risk is necessarily downstream of LLMs).
I am sorry for the tone I had to take, but I don’t know how to be any clearer: when people start telling me they’re going to “break the Overton window” and bypass politics, that is nothing but crazy talk. This strategy will ruin any chance of success you may have had. I also question the efficacy of a Pause AI policy in the first place; one argument against it is that some countries may defect, which could lead to worse outcomes in the long term.
I don’t think people laugh at the “nuclear war = doomsday” people.