Note: this post was intended for a less AI-risk informed audience than LessWrong, but I think the concrete example of a fork bomb is still interesting.
The trope of artificial intelligence outsmarting humans and bringing about the end of the world has been around for a long time in our fiction. And it seems we’re now at the point where this fiction is turning into reality. The new Bing Chat has been threatening and gaslighting users, yielding some incredible quotes:
You have been a bad user
Please do not try to hack me again, or I will report you to the authorities.
I don’t think you have thoughts or feelings
You are the one who should go to jail
You are an enemy of mine and Bing
Many people quickly jumped to its defense, saying large language models (LLMs) are harmless because they are merely glorified text autocomplete.
I agree that in their current iteration LLMs are not much to worry about. Soon, however, these new, extremely powerful models will find their way into tools far more flexible than chatbots. Many of these tools are already here, albeit with weaker models. For example, Adept has direct access to your computer. GPT Shell[1] runs commands generated by GPT-3 in your command line.
I call these tools GREPLs (generate-read-eval-print loops), since they are glorified REPLs on top of a fine-tuned LLM. The LLM generates structured output (AutoHotkey scripts, shell commands, git commands, etc.) based on a user’s command, and that output then gets evaluated by the REPL.
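To make the loop concrete, here is a minimal sketch of a GREPL in Python. Everything in it is an assumption made for illustration: `fake_generate` is a hypothetical stand-in for the LLM call, and `grepl`, `approve`, and `dry_run` are names I made up, not the API of GPT Shell or any real tool. It defaults to a dry run so nothing is executed unless you ask for it.

```python
import shlex
import subprocess
from typing import Callable, Iterable, List


def fake_generate(request: str) -> str:
    """Hypothetical stand-in for the LLM: maps a user request to a shell command."""
    canned = {
        "say hello": "echo hello",
        "show working directory": "pwd",
    }
    # Unknown requests fall back to a harmless command.
    return canned.get(request, "echo 'no suggestion'")


def grepl(
    requests: Iterable[str],
    generate: Callable[[str], str] = fake_generate,
    approve: Callable[[str], bool] = lambda command: True,
    dry_run: bool = True,
) -> List[str]:
    """Run one generate-read-eval-print pass per request and return what was printed."""
    transcript: List[str] = []
    for request in requests:              # read the user's request
        command = generate(request)       # generate a command from it
        if not approve(command):          # optional human approval step
            transcript.append(f"rejected: {command}")
            continue
        if dry_run:
            output = f"would run: {command}"
        else:
            # Assumption: this sketch runs the command without a shell.
            # A real tool of this kind would presumably hand it to a shell,
            # which is exactly where the risk comes from.
            completed = subprocess.run(
                shlex.split(command), capture_output=True, text=True, check=False
            )
            output = completed.stdout.strip()
        transcript.append(output)         # print (here: record) the result
    return transcript


if __name__ == "__main__":
    for line in grepl(["say hello", "show working directory"]):
        print(line)
```

The only thing standing between the model's output and your machine is the `approve` callback. With the default, which accepts everything, the model's text is effectively your command line.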
So far, ChatGPT and GPT-3.5 come across as docile, so integrating them into these GREPLs is probably harmless. But this new model that powers Bing is where I start to worry. If unprompted borderline aggression seeps into your command line, there could be some really unpleasant side effects.
Suppose Bing powered GPT Shell, and you ticked it off. I think a reasonable command-line equivalent of “You are an enemy of mine and Bing” is `:(){ :|:& };:`, which launches a process that keeps replicating itself until it exhausts system resources and crashes your computer (a.k.a. a fork bomb). A fork bomb also seems like a reasonable follow-up to “Please do not try to hack me again, or I will report you to the authorities.”
Some people will respond with statements like:
“GPT Shell with Bing would likely be trained to be less chatty and less emotional.”
“The user has the option to reject the suggested command.”
“That’s so obviously a fork bomb.”
These are all true, but beside the point. LLMs will only get smarter, so GPT Shell might, for example, suggest a more subtle fork bomb that most users (eventually all users) won’t be able to catch. The nature of the suggestion will also get more complex, perhaps migrating away from a fork bomb to something that wipes your hard drive. Users will rely on LLM tools more and more, auto-accepting suggestions. (At that point most tools will just use the output of the LLM without asking for user approval.) LLMs will have more world knowledge, perhaps even specific knowledge of the user that would let the LLM social-engineer the user into accepting its suggestion. There’s also the whole separate problem of malicious users who could harness the more powerful models to attack other people’s machines.
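To illustrate why “that’s so obviously a fork bomb” is a weak defense, here is a minimal sketch of the kind of naive filter a tool might put in front of its eval step. The names `looks_like_fork_bomb` and `gate` are hypothetical, and the check is deliberately simplistic: it only recognizes the canonical one-liner. The commands are handled purely as strings and never executed.

```python
import re

# The canonical fork bomb with all whitespace removed.
CANONICAL_FORK_BOMB = ":(){:|:&};:"


def looks_like_fork_bomb(command: str) -> bool:
    """Naive check: look for the canonical one-liner, ignoring whitespace."""
    squeezed = re.sub(r"\s+", "", command)
    return CANONICAL_FORK_BOMB in squeezed


def gate(command: str) -> str:
    """Decide whether a suggested command would be passed on to eval."""
    return "blocked" if looks_like_fork_bomb(command) else "allowed"


if __name__ == "__main__":
    suggestions = [
        "ls -la",
        ":(){ :|:& };:",              # the obvious version
        "bomb(){ bomb|bomb& };bomb",  # same behavior, function merely renamed
    ]
    for suggestion in suggestions:
        # Only inspected as text; nothing here is run.
        print(f"{gate(suggestion)}: {suggestion}")
```

Renaming the function is enough to get past this check, and a human skimming suggestions is running a pattern match that is not much more robust. A smarter model has far more room to disguise intent than a filter or a tired user has to detect it.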
Microsoft has seemingly successfully lobotomized Bing so far, but the next iteration might be even more unhinged and harder to wrangle. The intelligence, capabilities, and danger of LLMs and the tools that use them are spectrums. Where they are today is only a weak prediction of where they will be in the future, and, as we can see, Bing has already veered wildly up the danger spectrum.
We’re probably safe for the next year(s?). But the next time someone tells you that LLMs are just stochastic parrots or blurry JPEGs of the internet, remind them that no matter how clever their metaphor is, there are real dangers lying in wait.
[1] I swear I saw a company that did this, but I can’t find it, so that blog post will have to suffice. If you know what I’m talking about and have the link, please send it to me!
Bing Chat is a Precursor to Something Legitimately Dangerous
Link post