[Question] How not to write the Cookbook of Doom?
I would like pointers to prior art on the following question: how do we communicate about potential risks from AI in a net-positive way, taking into account the downside of giving people ideas? I can easily see how describing the steps by which an AI takeover could happen might influence someone who is malicious or who simply cares less about safety. This directly relates to the research I intend to pursue: mapping the specific capabilities AI systems have or could acquire to the risk factors associated with them.
No knowledge of prior art, but what do you mean by negative things that give people ideas? I was under the impression that most of the examples people talk about involve things that are, for the time being, limited to superhuman capabilities (self-replicating nanotech and so on). Or are you asking about something other than extinction risk, like chatbots manipulating people or something along those lines? Could you clarify?
As soon as someone managed to turn ChatGPT into an agent (AutoGPT), someone else created an agent, ChaosGPT, with the explicit goal of destroying humankind. This is the kind of person who might benefit from what I intend to produce: an overview of the AI capabilities required to end the world, how far along we are in obtaining them, and so on. I want this information to be used to prevent an existential catastrophe, not to precipitate one.
No answer for you yet, but I’m trying to achieve something similar in my book. I want to avoid infohazards while still communicating the threat, and also provide hope. A tricky thing to navigate for sure, especially with diverse audiences.
That’s the fun part: you can’t. It’s not even an AI problem; it’s a “we as a society are at this point with a lot of tech” problem. You can see it playing out with guns, to a certain extent, but guns are a minuscule threat compared to CRISPR and AI. Guns are just one low rung on the “destroys things” stairway to heaven. Rock < knife < arrow < muzzle loader < automatic weapon < grenade < missile < nuke < CRISPR, in current threat level, and AI is sitting beside each of them, ready to give them a nudge into the hands of anyone with basic typing skills and a bit of dedication to engineering.
The only way I can see is to present the info to a limited audience of vetted individuals, but who, right now, has both the ethical maturity and the power to do the right thing?
I hear you; thank you for your comment.
I guess I don’t have a clear model of the size of the pool of people who:
have malicious intent;
access LessWrong and other spaces tightly linked to this;
don’t yet have the kind of ideas that my research could provide them with.