Sorry, but you’re overthinking what’s required. Simply being able to reliably use existing techniques is more than enough to hack the minds of large groups of people, no complex new research needed.
Here is a concrete example.
First, if you want someone’s attention, just make them feel listened to. ELIZA could already do this successfully back in the 1960s; ChatGPT is better. The result is what therapists call transference, which causes the person to want to please the AI.
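To make that concrete, here is a minimal ELIZA-style reflection loop. This is my own toy sketch, not Weizenbaum’s original DOCTOR script; the patterns and wording are invented for illustration. The point is how little machinery it takes to mirror someone’s own words back at them and produce the feeling of being heard:

```python
import re

# Swap first-person words for second-person ones ("my job" -> "your job").
# These reflections and rules are illustrative, not the original ELIZA script.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are", "i'm": "you're"}

RULES = [
    (re.compile(r"i need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"i feel (.*)", re.I), "Tell me more about feeling {0}."),
    (re.compile(r"my (.*)", re.I), "Your {0} seems important to you. Go on."),
    (re.compile(r"(.*)", re.I), "I see. Please tell me more."),  # catch-all
]

def reflect(fragment: str) -> str:
    """Mirror the speaker's wording back in second person."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

def respond(utterance: str) -> str:
    """Return the first matching canned response, reusing the user's own words."""
    for pattern, template in RULES:
        match = pattern.match(utterance.strip())
        if match:
            return template.format(*(reflect(g) for g in match.groups()))

if __name__ == "__main__":
    print(respond("I feel nobody takes my work seriously"))
    # -> "Tell me more about feeling nobody takes your work seriously."
```

A handful of regexes is obviously far cruder than an LLM, which is exactly the point: if even this produces transference, a system that actually models the user will do much better.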
Now the AI can use the same basic toolkit mastered by demagogues throughout history. Use simple and repetitive language to hit emotional buttons over and over again. Try to get followers to form a social group. Switch positions every so often. Those who pay insufficient attention will have the painful experience of being attacked by their friends, which forces everyone to pay more attention.
All of this is known and effective. What AI brings is that it can use individualized techniques, at scale, to suck people into many target groups. And once they are in those groups, it can use the demagogue’s techniques to erase differences and get them aligned into ever bigger groups.
The result is that, as Sam Altman predicted, LLMs will prove superhumanly persuasive. They can beat the demagogues at their own game by seeding mass-persuasion techniques with individualized attention at scale.
Do you think that this isn’t going to happen? Social media accidentally did a lot of this at scale. Now it is just a question of weaponizing something like TikTok.
Yes, the example of “Clown Attacks”, for instance, isn’t at all novel or limited to AI; it’s old stuff. And it’s not even true that you can’t be resistant to them, though actively going against peer pressure on these things isn’t very fashionable these days. That said, the biggest risk of LLMs right now is indeed, IMO, how well they can carry out certain forms of propaganda and sentiment analysis en masse. I can no longer say “the government wouldn’t literally read ALL your emails to figure out what you think; you’re not worth the work it would take”: now it might, because the cost has dropped dramatically.
I agree with the contents of this comment in general, but not with the idea that propaganda generation is the greatest risk. Lots of people know about that already, and I’d argue that the risk to democracy via persuading the masses isn’t very tractable, whereas the risk to the AI safety community, via manipulating elites in random ways with automated high-level psychology research, is very tractable (minimize sensor exposure).
My point wasn’t that it would be a very new capability in general, but that it could be deployed at a scale and cost impossible before. Armies of extremely smart and believable bots flooding social media everywhere. The “huh, everyone else except me thinks X, maybe they do have a point / my own belief is hopeless” gregariousness effect is real and has often been exploited already, but this allows bad actors to take it to a whole new level. This could also be deployed, as you say, against AI safety itself, but not exclusively.
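As a toy illustration of that effect (all the numbers below are invented assumptions, not measurements): even a modest fraction of coordinated bot accounts in the pool a user samples from shifts the consensus they perceive well away from what humans actually believe.

```python
import random

def perceived_consensus(n_humans=10_000, n_bots=1_500, human_support=0.35,
                        sample_size=50, trials=1_000, seed=0):
    """Toy model: each user samples `sample_size` accounts from their feed and
    estimates how popular claim X is. Bots always push X; humans support it
    with probability `human_support`. Returns the average fraction of sampled
    accounts endorsing X. All parameters are illustrative assumptions."""
    rng = random.Random(seed)
    population = ([1] * n_bots +
                  [1 if rng.random() < human_support else 0 for _ in range(n_humans)])
    endorsing = 0.0
    for _ in range(trials):
        feed = rng.sample(population, sample_size)
        endorsing += sum(feed) / sample_size
    return endorsing / trials

if __name__ == "__main__":
    print("True human support: 35%")
    print(f"Perceived support with bots: {perceived_consensus():.0%}")
    # With roughly 13% of accounts being bots, perceived support comes out
    # around 43%, and it keeps climbing as the bot fraction grows.
```

The model deliberately ignores ranking algorithms, reply quality, and repeated exposure, all of which would push the perceived shift further in the same direction.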
I think that this is pretty appropriately steelmanned, given that there are tons of people here who do ML daily, far better than I ever could at my best, but they’re clueless about the industry/economics/natsec side, and that’s a quick fix on my part. If I’m underestimating the severity of the situation, then they’d know.
I think that this is more like zero-days in the human brain: like performing a magic trick that prevents an entire room full of string theorists from thinking about something that’s important to them, e.g. the lab leak hypothesis. Making people talk about specific concepts in their sleep. Lie detectors that only need a voice recording from a distance. Seeing which concepts scare people and which don’t based on small changes in their heart rate. And so on.
My intuition says that engineering problems like these are hard and all sorts of problems crop up, not just Goodhart’s law. Yann LeCun wrote a great post on this:
1. Building something that works in the real world is harder than most armchair AI safety folks think.
2. There is a natural tendency to exaggerate the potential risks of your own work, because it makes you feel powerful and important.
This seems pretty in line with some fairly fundamental engineering principles, e.g. spaghetti-code systems. I think a big part of it is that, unlike in China, Western big-tech companies and intelligence agencies are constrained to small workforces of psychology researchers/data labellers/hypothesis generators due to Snowden risk. That said, discovering the human internal-causality interpretability dynamic was a big update in the other direction: it turns out to be really easy to get away with doing a ton of stuff to people with very little data and older AI systems.
I disagree that people who do ML daily would be in a good position to judge the risks here. The key issue is not the capabilities of AI, but rather the level of vulnerability of the brain. Since they don’t study that, they can’t judge it.
It is like how scientists proved terrible at unmasking charlatans like Uri Geller. Nature doesn’t actively try to fool us; charlatans do. The people with the actually relevant expertise were those who studied how people can be fooled, which meant magicians like James Randi. Similarly, to judge this risk, I think you should look at how dictators, cult leaders, and MLM companies operate.
A century ago Benito Mussolini figured out how to use mass media to control the minds of a mass audience. He used this to generate a mass following and become dictator of Italy. The same vulnerabilities, exploited the same way, have been a staple for demagogues and would-be dictators ever since. But human brains haven’t been updated. And so Donald Trump has managed to use the same basic rootkit to amass about 70 million devoted followers. As we near the end of 2023, he still has a chance of successfully overthrowing our democracy if he can avoid jail.
Your thinking about zero-days is a demonstration of how thinking in terms of computers can mislead you. What matters for an attack is the availability of vulnerable potential victims. In computers there is a correlation between novelty and availability. Before anyone knows about a vulnerability, everyone is available for your attack. Then it is discovered, a patch is created, and availability goes down as people update. But humans don’t simply upgrade to brain 2.1.8 to fix the vulnerabilities found in brain 2.1.7. People can still be brainwashed today by the same techniques that the CIA was studying when it funded the Reverend Sun Myung Moon back in the 1960s.
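To put the contrast in toy-model terms (the half-life figure below is an invented assumption, purely illustrative): exposure to a software vulnerability decays once a patch ships, while human susceptibility to these techniques stays flat.

```python
def vulnerable_fraction_software(days_since_disclosure, half_life_days=30):
    """Toy model: after a patch ships, the share of unpatched machines decays
    exponentially. The half-life is an illustrative assumption, not measured data."""
    return 0.5 ** (days_since_disclosure / half_life_days)

def vulnerable_fraction_humans(days_since_disclosure):
    """Humans don't get a patch: the susceptible share stays flat."""
    return 1.0

if __name__ == "__main__":
    for days in (0, 30, 90, 365):
        print(f"day {days:>3}: software {vulnerable_fraction_software(days):.0%}, "
              f"humans {vulnerable_fraction_humans(days):.0%}")
```

However crude, this is the whole asymmetry: the attacker’s pool of exploitable targets never shrinks on the human side.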
You do make an excellent point about the difficulty of building something that can work at scale in the real world. Which is why I focused my scenario on techniques that have worked, repeatedly, at scale. We know that they can work, because they have worked. We see it in operation whenever we study the propaganda techniques used by dictators like Putin.
Given these examples, the question stops being the abstract “Can AI find vulnerabilities by which we can be exploited?” It becomes “Is AI capable of executing effective variants on the strategies that dictators, cult leaders, and MLM founders have already shown to work at scale against human minds?”
I think that the answer is a pretty clear yes. Properly directed, ChatGPT should be more than capable of doing this. We then have the hallmark of a promising technology: we know that nothing fundamentally new is required. It is just a question of execution.
My thinking about this (and about other people, like Tristan Harris, who can’t think about superintelligence) is that the big difference is that persuasion, as a science, is being amplified by orders of magnitude beyond what was possible in the 20th century.
As a result, the AI safety community is at risk of getting blindsided by manipulation strategies that we’re vulnerable to because we don’t recognize them.
I don’t imagine CFAR’s founders as being particularly vulnerable to clown attacks, for example, but they would also fail to notice clown attacks being repeatedly tested against them; so it stands to reason that today’s AI would be able to locate something that would both work on them AND prevent them from noticing, if it had enough social media scrolling data to find novel strategies based on results.
I’m less interested in the mass psychology stuff from the 2020s because a lot of that was meant to target elites who influenced more people downstream, and elites are now harder to fool than in the 20th century; and also, if democracy dies, then it dies, and it’s up to us not to die with it. One of the big issues with AI targeting people based on Bayes-predicted genes is that it can find one-shot strategies, including selecting the 20th-century tactics with the best odds of success.
This is why I think that psychology is critical, especially for interpreting causal data, but also that we shouldn’t expect things to be too similar to the 20th century, because it’s a new dimension, and the 21st century is OOD (out of distribution) anyway; OOD is similar to the butterfly effect, in that changes cause a cascade of other changes.
With all due respect, I see no evidence that elites are harder to fool now than they were in the past. For concrete examples, look at the ones who flipped to Trump over several years. The Corruption of Lindsey Graham gives an especially clear portrayal of how one elite went from condemning Trump to becoming a die-hard supporter.
I dislike a lot about Mr. Graham. But there is no question that he was smart and well aware of how authoritarians gain power. He saw the risk posed by Trump very clearly. However, he knew himself to be smart and thought he could ride the tiger. Instead, his mind got eaten.
Moving on, I believe that you are underestimating the mass psychology stuff. Remember, I’m suggesting it as a floor to what could already be done. New capabilities and discoveries allow us to do more. But what should already be possible is scary enough.
However, that is a big topic. I went into it in AI as Super-Demagogue, which you will hopefully find interesting.
I think that it’s generally really hard to get a good sense of what’s going on when it comes to politicians, because so much of what they do is intended to make a persona believable and disguise the fact that most of the policymaking happens elsewhere.
That’s right, the whole point of the situation is that everything I’ve suggested is largely just a floor for what’s possible, and in order to know the current state of the limitations, you need to have the actual data sets and watch the innovation as it happens. That is why the minimum precautions are so important.
I’ll read this tomorrow or the day after; this research area has tons of low-hanging fruit and few people looking into it.