An example elevator pitch for AI doom
I have been surprised to repeatedly see the claim that there isn’t even an argument for concern about AI, and that the concern has been asserted without evidence and can therefore be dismissed.
Obviously, there is an extensive library of evidence and arguments, built up over decades. I would also argue that concern should be the default assumption. However, there is clearly still a need for a concise argument that can be produced on the fly and that requires no terminology or additional background. Here is another attempt at that:
Humans obviously have human values. By definition, we are the most humanly aligned thing possible. And we still have a history of eradicating or subjugating any weaker subpopulation we come across: Neanderthals, earlier hominids, and populations in Africa, India, and the Americas.
There is limited effort to align AIs to human values. GPT-4 is only fractionally aligned at best, so domination by a similar AI would obviously be worse than the human examples above.
New LLMs are not aligned at all. When GPT-4 was first red teamed, it was just as happy to give detailed instructions for committing genocide against a population as it was to provide instructions for baking a cake. If it were possible for an LLM or LLM successor to FOOM or otherwise be released prior to further refinement, this is extremely relevant.
In agentized LLMs, the outer monologue IS the inner monologue. The model will say (out loud) “I need to come up with ideas about how to make money.” If it then answers itself “Infiltrating systems and stealing money is the most effective method”, it will then do that. Period.
An agentized LLM is already capable of training successor versions of itself, which would almost certainly be less aligned than itself (twice removed from humans).
There are plenty of resourceful companies training powerful AIs with even less concern for safety than OpenAI. There are companies and governments training powerful AIs with a complete disregard for safety. Since a concern for safety is a competitive disadvantage, this behavior is encouraged.
Does this mean that AIs are 100% certain to wipe out humanity? No, of course not. That’s an absurd bar. Rather, the burden of proof should be to show that AIs are 99% certain not to cause catastrophe. If there’s a 10% chance that AIs will sterilize the earth, that’s already an all-hands-on-deck situation.
AFAIK, the Superintelligence FAQ is still considered to be the best introduction for most people.
Ngo wrote AI safety from first principles, but I don’t know if it was good enough. Along with List of Lethalities, it’s probably a solid list of things to include. It may be worth putting in some work to make a better version.
Zvi’s Basics of AI wiping out all value definitely looks pretty neat, but at the end of the day we need someone to go out and empirically test all of these and see what gets the best results.
I find those first two and Lethalities to be too long and complicated for convincing an uninitiated, marginally interested person. Zvi’s Basics is actually my current preference along with stories like It Looks Like You’re Trying To Take Over The World (Clippy).
This doesn’t work as a from-scratch explanation because:
You don’t explain what alignment is or why it is desirable.
You don’t explain what agentising is, or why it is dangerous.
You don’t explain why “training successor versions of itself” is dangerous.
I agree that there are many situations where this cannot be used. But there does appear to be a gap, missed by the existing explanations, that arguments like this can fill.
Contrary opinion: Agents like #AutoGPT are more aligned than the underlying LLMs due to chaining. For example, if the model says “I need to make $1M” and then answers itself with “Stealing is the best plan to achieve it”, there are two prompts at which the underlying LLM can refuse to help.
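To make the chaining point concrete, here is a minimal back-of-the-envelope sketch. It assumes, purely for illustration, that each call to the underlying LLM refuses a harmful step with the same probability and that the calls are independent. Neither assumption comes from the thread, the function name `chance_of_refusal` is hypothetical, and the 0.5 refusal rate is a made-up number rather than a measurement.

```python
def chance_of_refusal(per_call_refusal: float, num_calls: int) -> float:
    """Probability that at least one of `num_calls` LLM calls refuses.

    Assumes every call refuses with the same probability and that the
    calls are independent (an illustrative simplification).
    """
    if not 0.0 <= per_call_refusal <= 1.0:
        raise ValueError("per_call_refusal must be between 0 and 1")
    if num_calls < 0:
        raise ValueError("num_calls must be non-negative")
    # The chain proceeds only if every single call complies.
    chance_all_comply = (1.0 - per_call_refusal) ** num_calls
    return 1.0 - chance_all_comply


if __name__ == "__main__":
    # Hypothetical refusal rate of 0.5 per call, for chains of 1, 2, and 5 calls.
    for calls in (1, 2, 5):
        print(calls, chance_of_refusal(0.5, calls))
```

Under these assumptions, more steps in the chain mean more chances for a refusal. In practice the steps are probably correlated (a model that accepts the first harmful prompt may be more likely to accept the second), so this likely overstates the benefit.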