I think one big mistake the AI safety movement is currently making is not paying attention to the concerns the wider population has about AI right now. People do not believe that a misaligned AGI will kill them; they are worried about job displacement, and about tyrannical actors using AGI to consolidate power. They’re worried about AI impersonation and the proliferation of misinformation, or just plain shoddy computer-generated content.
Much like the difference between local environmental movements and the movement to stop climate change, focusing on far-off, global-scale issues makes people care less. It’s easy to deny climate change when it’s something that will happen decades from now; people want answers to the problems they face today. I also think there’s an element of people’s innate anti-scam defenses going off: the more serious, catastrophic, and consequential a prediction is, the more evidence they demand before accepting that it is real. The prior one should assign to apocalyptic events is quite low. “They said coffee would end the world, so AGI isn’t a threat” doesn’t actually follow, but every failed doomsday prediction does contribute Bayesian evidence toward the unreliability of doomsday predictions in general.
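The Bayesian point can be made concrete with a toy update. All of the numbers below are hypothetical, chosen only to illustrate the mechanism: a listener who has watched many doomsday warnings fail assigns a higher probability to warnings being issued even when no real threat exists, which weakens the evidential force of the next warning.

```python
def posterior_doom(prior, p_warn_given_doom, p_warn_given_no_doom):
    """P(doom | a warning was issued), by Bayes' rule."""
    num = p_warn_given_doom * prior
    return num / (num + p_warn_given_no_doom * (1 - prior))

prior = 0.01  # low base rate for apocalyptic events (illustrative assumption)

# Naive listener: assumes doom warnings are rare unless the threat is real.
naive = posterior_doom(prior, 0.9, 0.05)

# Burned listener: coffee scares, Y2K, etc. taught them that warnings
# get issued all the time regardless of whether the threat is real.
burned = posterior_doom(prior, 0.9, 0.5)

print(round(naive, 3), round(burned, 3))  # the same warning moves them very differently
```

The warning itself is identical in both cases; only the listener’s estimate of how often false alarms get raised differs, and that alone collapses the posterior.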
On the topic of evidence, I think it is also a problem that the AI safety community has been extremely short on messaging for the past three or so years. People are simply not convinced that an AGI would spell doom for them. The consensus appears to be that LLMs do not represent a significant threat no matter how advanced they become: it’s “not real AI”, it’s “just a glorified autocomplete”. Traditional AI safety arguments carry little weight because they describe a type of AI that does not actually exist. LLMs and the AI systems derived from them do not possess utility functions; they do understand human commands and obey them, and they exhibit a comprehensive understanding of social norms, which they follow. LLMs are trained on human data, so they behave like humans. I have yet to see a convincing argument, other than simple rejection, for why RLHF or related practices like constitutional AI do not constitute a successful form of AI alignment. All of the “evidence” for misalignment is shaky at best and outright fabricated at worst. This missing argument is the key problem for AI safety: it strikes outsiders as delusional.