Your idea of “using instruction following AIs to implement a campaign of persuasion” relies (I claim) on the assumption that the people using the instruction-following AIs to persuade others are especially wise and foresighted people, and are thus using their AI powers to spread those habits of wisdom and foresight.
It’s fine to talk about that scenario, and I hope it comes to pass! But in addition to the question of what those wise people should do, if they exist, we should also be concerned about the possibility that the people with instruction-following AIs will not be spreading wisdom and foresight in the first place.
I don’t think that whoever is using these AI powers (let’s call him Alex) needs to be that wise (beyond the wisdom of an average person who could get their hands on a powerful AI, which is probably already above the general average).
Alex doesn’t need to come up with @Noosphere89’s proposed solution of persuasion campaigns all by himself. He merely needs to ask his AI for the best solutions for preventing existential risks. If Noosphere’s proposal is indeed wise, the AI would suggest it, and Alex could then implement it.
Alex doesn’t necessarily need to want to spread wisdom and foresight in this scheme. He merely needs to want to prevent existential risks.
I am not from the US, so I don’t know anything about the organizations you have listed. However, we can look at three main conventional sources of existential risk (excluding AI safety for now; we will come back to it later):
Nuclear Warfare—Cooperation strategies + decision theory are active academic fields.
Climate Change—This is a very hot topic right now, and a lot of research is being put into it.
Pandemics—There was quite a bit of research before COVID, but even more now.
As to your point about Hassabis not doing other projects for reducing existential risk besides AI alignment:
Most of the people on this website (at least, that’s my impression) have rather short timelines. This means that work on a lot of other existential risks becomes much less critical. For example, while supervolcanoes are scary, they are unlikely to wipe us out before we get an AI powerful enough to solve this problem more efficiently than we can.
As to your point about Sam Altman choosing to cure cancer over reducing x-risk:
I don’t think Sam looks forward to the prospect of dying from some x-risk. After all, he is just as human as we are, and he, too, would hate the metaphorical eruption of a supervolcano. But curing cancer is going to become a much more pressing concern after we have an aligned and powerful AI on our hands. This also applies to LeCun/Zuckerberg/random OpenAI researchers: they (or at least most of them) seem to be enjoying their lives and wouldn’t want to randomly lose them.
As to your last paragraph:
I think your formulation of the AI’s response (“preventing existential risks is very hard and fraught, but hey, what if I do a global mass persuasion campaign”) really undersells it. It almost makes it sound like a mass persuasion campaign is a placebo to make the AI’s owner feel good.
Consider another response: “the best option for preventing existential risks would be a mass persuasion campaign with such-and-such content, which would have such-and-such effects and would prevent such-and-such risks”.
My (naive) reaction (assuming I believe the AI is properly aligned and not hallucinating) would be to ask more questions about the details of this proposal and, if the answers make sense, to proceed with it. Is it that crazy? I think a persuasion campaign is not in itself necessarily bad (we deal with them every day in all sorts of contexts, from relatives convincing you to take a day off, to people online changing your mind on some topic, to politicians running election campaigns), and it is especially not bad when it is the best option for preventing x-risk.
In the scenario you described, I am not at all sure that an average person who could get their hands on AI (an important point, because I believe such people are generally smarter/more creative/more resourceful) would opt for the second option, especially after being told: “Well, I could try something much more low-key and norm-following, but it probably won’t work”.
Option 1: Good odds of preventing x-risk, but apparently it’s also the kind of plan that doesn’t work out in movies.
Option 2: Bad odds of preventing x-risk.
Perhaps the crux is that I think people who actually have good chances of controlling powerful AI (Sam Altman, Dario Amodei, etc.) would see that Option 1 is obviously better if they themselves want to survive (something that is instrumental to whatever other goals they have, like curing cancer).
Furthermore, there is a more sophisticated solution to this problem. I can ask this powerful AI to find the best representation of human preferences and then the best way to align itself to it. Then this AI would no longer be constrained by my cognitive limitations.
Would an average person take this option? I expect our intuitions to diverge here much more strongly than in the previous case. Looking at myself, I was an average person just a few months ago. I have no expertise in AI whatsoever. I don’t consider myself among the smartest humans on Earth. I haven’t read much of the AI safety literature. I was basically an exemplary “normie”.
And yet, when I read about the idea of an instruction-following AI, I almost immediately came up with this solution of using the aforementioned instruction-following AI to build an AI aligned with human preferences. It sidesteps a lot of misuse concerns, which (at the time) seemed scary.
I grant that an average person might not come up with this solution by themselves, but they don’t need to: presumably, if this solution is any good, it would be suggested by the AI itself at the first stage, when we ask it to come up with solutions for x-risk.
An average person could ask this AI how to ensure that this endeavor won’t end up in a scenario like the one in the movie. They could then cross-check its answer with the security team that successfully aligned it.
My conclusion is basically that I would expect a good chunk of average people to opt for the sophisticated solution after the AI suggests it, an even bigger chunk of average people to opt for the naive solution, and a still bigger chunk of the people who will most likely control powerful AI in the future (generally smarter than the average person) to opt for either solution.