Solving for “a viable attack, maximum impact” given an exhaustive list of resources and constraints seems like precisely the sort of thing a GPT-4-level AI can solve with aplomb when working hand in hand with a human operator. As with the example of shooting a substation, humans could probably solve this in a workshop-style discussion with some Operations Research principles applied, but I assume the type of people who want to do those things usually don’t operate in such functional and organized ways. When they do, it seems to get very bad.
The LLM can easily supply cross-domain knowledge and think within constraints. With a bit of prompting and brainstorming, one could probably come up with a dozen viable attacks in a few hours. So the lone bad actor doesn’t have to assemble a group of five or six people who are intelligent, perhaps educated, and also want to carry out an attack. I suspect the only reason people aren’t already prompting for such methods and then setting up automation of them is the existence of guardrails. When truly open-source LLMs reach GPT-4.5 capability and get good interfaces to the internet and other software tools (such as phones), we may see a lot of trouble. Fewer people would have the drive and intellect needed (at least early on) to carry out such an attack, but those few could cause very outsized trouble.
TL;DR: The “Fun” starts waaaaaaay before we get to AGI.
As I think more about this, the LLM as a collaborator alone might have a major impact. Just off the top of my head, a kind of Rube Goldberg attack might be <redacted for info hazard>. Thinking about it in one’s isolated mind, someone might never consider carrying something like that out. Again, I am trying to model the type of person who carries out a real attack, and I don’t estimate that person having above-average levels of self-confidence. I suspect the default is to doubt themselves enough to avoid acting, in the same way most people do with their entrepreneurial ideas.
However, if they either presented it to an LLM for refinement, or if the LLM suggested it, there could be just enough of a psychological boost of validity to push them over the edge into trying it. And after a few news stories about “dumb” or “bizarre” or “innovative” attacks succeeding because “AI told these people how to do it,” the effect might get even stronger.
To my knowledge, one could have bought an AR-15 since the mid to late 1970s. My cousin has a Colt from 1981 he bought when he was 19. Yet people weren’t mass shooting each other, even during times when the overall crime/murder rate was higher than it is now. Some confluence of factors has driven the surge, one of them probably being a strong meme: “Oh, this actually tends to work.” Basically, a type of social proof of efficacy.
And I am willing to bet $100 that the media will report big on the first few cases of “Weird Attacks Designed by AI.”
It seems obvious to me that the biggest problems in alignment are going to be the humans, both long before the robots, and probably long after.