Current Attitudes Toward AI Provide Little Data Relevant to Attitudes Toward AGI

Epistemic status: Sudden public attitude shift seems quite possible, but I haven’t seen it much in discussion, so I thought I’d float the idea again. This is somewhat dashed off since the goal is just to toss out a few possibilities and questions.
In Current AIs Provide Nearly No Data Relevant to AGI Alignment, Thane Ruthenis argues that current AI is almost irrelevant to the project of aligning AGIs. Current AI is simply not what we’re talking about when we worry about alignment, he says. And I halfway agree.[1]
By a similar token, we haven’t yet seen the thing we’re worried about, so attitudes now provide limited data about attitudes toward the real deal. It looks to me like people are rarely really thinking of superhuman intelligence with human-like goals and agency, and when they are, they usually find it highly concerning. We’ve tried to get people to think about powerful AGI, but I think that’s largely failed, at least among people familiar with AI and ML.
People are typically bad at imagining hypothetical scenarios, and notoriously shortsighted. Those who refuse to worry about AGI existential risk appear to be some combination of not really imagining it and assuming it won’t happen soon. Seeing dramatic evidence of actual AGI, with agency, competence, and strange intelligence, might change public beliefs quickly.
Leopold Aschenbrenner noted in Nobody’s on the ball on AGI alignment that the public’s attitude toward COVID turned on a dime, so a similar attitude shift is possible on AGI and alignment as well. This sudden change happened in response to evidence, but was made more rapid by nonlinear dynamics in the public discourse: a few people got very concerned and told others rather forcefully that they should also be very concerned, citing evidence; this spread rapidly. The same could happen for AGI.
As Connor Leahy put it (approximately, maybe):[2] when we talk to AI and ML people about AGI x-risk, they scoff. When we tell regular people that we’re building machines smarter than us, they often say something like “You fucking what?!” I think this happens because the AI/ML people think of existing and previous AI systems, while the public thinks of AIs from fiction, which actually have the properties of agency and goal-directedness we’re worried about. If and when these people see evidence that such sci-fi AI has been achieved, they won’t need to change their beliefs, only their sense of urgency.
That leaves the “expert” doubters. When I look closely at public statements, I think most people who say they don’t believe in AGI x-risk simply don’t believe real, full AGI will happen soon enough to be worth thinking about now. If that is visibly proven false (before it’s too late), it could create a massive change in public opinion.
People are prone to see faces in the clouds, see ghosts, and attribute intelligence and intention to their pets. Many people who talk extensively with LLMs become convinced they have personalities and consciousness. So you’d think people would over-attribute agency, and therefore danger, to current AIs.
We can be impressed by the intelligence of an LLM without worrying about it taking over. Current LLMs are clearly non-agentic in the sense of not having the capacity to affect the real world. And they don’t sit and think to themselves when we’re not around, so we don’t wonder what they’re thinking about or planning. o1, with its hidden train of thought, stretches that a bit, but it does summarize its reasoning and it doesn’t think for long; it still seems to be thinking about only and exactly what we asked it to. And LLMs don’t have persistent goals to motivate agency. It’s hard to believe that something like that could be an existential threat, and the relative ease of adding agency and continuous learning is not at all obvious.
If any of those conditions change, emotional reactions might change. If the thing you’re talking to has not only intelligence but also agency, goals, and persistence (“real AGI”), the average person might think about it very differently. Could people fail to notice if they don’t look closely or think long? Sure. Some of them certainly will. But it only takes a few to take it seriously and tell their friends how creeped out they are by the whole thing. That’s how the panic around COVID spread and changed average attitudes dramatically within weeks.
Addenda: possible effects and causes of public freakout
The above was my main point here: we might see dramatic shifts in public opinion if evidence of real AGI arrives while public opinion can still make a difference. You can reach your own conclusions on what might cause such a shift and what effects it might have.
I can’t resist exploring the logic a little more. If you find this all credible, it leaves two questions: will this happen before it’s too late, and will it actually be helpful if the public goes from blissfully ignoring the whole thing to freaking out about it?
Effects
Here I’ll indulge in some speculation: I think a public freakout could be very helpful. It could be harnessed to insist that the government take control of all AGI projects and use them responsibly. This seems to me like a least-bad scenario. It seems overwhelmingly likely to me that government will take over AGI before AGI takes over government, at least in the slow-takeoff scenarios resulting from LLM-based[3] AGI on shorter timelines.
There are other scenarios in which a public freakout is bad. It could cause a severe slowdown in AGI progress in the US. That could make the race with China close, encouraging corner-cutting on safety and quite possibly causing doom from misaligned AGI. Or it could even let China decisively win the race for AGI.[4]
It’s worth noting that the possibility of rapid attitude shift applies to people in government as well as the public.
Causes
Finally: will it happen before it’s too late? It probably will if language model agents are the route to first AGI, which seems fairly likely. Language model agents are creepily human-like even while they’re still too stupid and amnesic to be dangerous.
I think people would recognize the danger if we had parahuman AGI that isn’t yet smart enough to be a real threat but has the agency and persistence that current AI lacks. People would recognize it as a parahuman entity, and therefore as interesting and dangerous, the way humans are.
This is a weak argument for actually advancing language model agent progress: if that route reaches AGI first, it might be the easiest sort to align and interpret, and if it doesn’t, progress on that route could still get people to start taking AGI x-risk seriously. An ideal scenario would be a dead end at semi-competent, agentic LLM agents that are too slow and error-prone to succeed at takeover, but which cause major damage (hopefully just by spending millions of their users’ money) or are deployed in unsuccessful mischief, à la the ChaosGPT joke/demonstration.
Notable job loss is another possible cause of public freakout.
Conclusion
Humans are unpredictable and shortsighted. Opinions don’t change, until they do. And humans in groups may be even more mercurial and shortsighted. We should take our best guesses and plan accordingly.
[1] I agree with Ruthenis that current AI provides little insight into aligning the real, dangerous AGI that seems inevitable. But I do think it provides nontrivial relevant data. If AGI is built on, or is even closely related to, current AI (e.g., if language model agents reach real AGI), then current AI has something valuable to say about aligning AGI. It isn’t the full story, though, since full AGI will have very different properties.
Following this reasoning, I’d agree that attitudes toward current AI do provide some evidence about attitudes toward real AGI, but not much.
[2] I’m not finding Connor’s original quote, so this is my vivid but possibly flawed memory of it. If I’m wrong about his intended statement, I’d substitute my own claim: when I tell non-AI people that we’re building AI smarter than us, they usually think it sounds dangerous as fuck. Educated people often think of the current AI concerns they’ve heard about in the news, like deepfakes and bias, but people who haven’t thought about AI much at all often understand that my x-risk concerns are about sci-fi, fully agentic AI entities, and just say “yeah, holy shit”.
[3] Technically this should probably be “foundation model-based AGI”. I continue to use “LLM” even when multimodal capabilities are trained into the foundation model, because it’s shorter and because language continues to be the foundation of these models’ intelligence. Language condenses the conceptual aspect of human cognition very well; I think that’s key to understanding the a priori surprising result that simply predicting next words gives rise to substantial human-like intelligence.
[4] Would Xi Jinping be a disastrous emperor-for-eternity? I certainly don’t know. The excellent 80,000 Hours interview with Sihao Huang clarified (among many other China/AI issues) one reason we don’t know what Xi is thinking: he plays his cards close to his chest. He may be a reasonably well-intentioned human being who’s willing to break a lot of eggs to make a really big omelette. Or he could be the sort of sociopath and sadist that Stalin and Putin seem to be. I’d rather have someone really trustworthy in charge, but how much risk of misalignment would I take to put the US government in charge over China’s? I don’t know. I’d love sources for real insight on his and the CCP’s true character; it might be important.