This post is fun but I think it’s worth pointing out that basically nothing in it is true.
- “Clown attacks” are not a common or particularly effective form of persuasion.
- They are certainly not a zero-day exploit; the tactic of having a low-status person say X, so that people won’t want to believe X, has been available to humans for our entire evolutionary history.
- Zero-day exploits in general are not a thing you have to worry about; the analogy doesn’t apply to humans because we’re far more robust than software. A zero-day exploit on an operating system can give you total control of it; a “zero-day exploit” like junk food can make you consume 5% more calories per day than you otherwise would.
- AI companies have not devoted significant effort to human thought steering, unless you mean “try to drive engagement on a social media website”; they are too busy working on AI.
- AI companies are not going to try to weaponize “human thought steering” against AI safety.
- Reading the Sequences wouldn’t protect you from mind control if it did exist.
- Attempts at manipulation certainly do exist, but they will mostly be mass manipulation aimed at driving engagement and selling you things based on your browser history, rather than a nefarious actor targeting AI safety in particular.
The “just five percent more calories” example reveals nicely how meaningless this heuristic is. The vast majority of people alive today are the effective mental subjects of some religion, political party, national identity, or combination of the three, no magical backdoor access necessary; the confirmed tools and techniques are sufficient to ruin lives or convince people to act completely counter to their own interests. And between where these actors are now and total control, there are intermediate stages of effectiveness that political lobbying can ratchet up through.
The premise of the above post is not that AI companies are going to try to weaponize “human thought steering” against AI safety. The premise is that AI companies are going to develop technology that can be used to manipulate people’s affinities and politics, intelligence agencies will pilfer it or ask for it, and then it will be weaponized, to a far greater degree of effectiveness than they have been able to afford historically. I’m ambivalent about the included story in particular being carried out, but if you care about anything (such as AI safety), it’s probably necessary that you keep your utility function intact.
Yes they are: clown attacks are an incredibly powerful and flexible form of Overton window manipulation. They can even become a self-fulfilling prophecy by selectively sorting domains of thought among winners and losers in real life, e.g. “only losers think about the lab-leak hypothesis.”
It’s a zero-day exploit because it’s a flaw in the human brain that modern systems are extremely capable of exploiting to steer people’s thinking without their knowledge (in this case, by denying them certain lines of cognition). You’re right that it’s not new enough to count days, like a zero-day in computers, but the ability to exploit it this powerfully (orders of magnitude more effectively than ever before) is less than a decade old.
Like LLMs, the human mind is sloppy and slimy; clown attacks are an example of something that multi-armed bandit algorithms can repeatedly try until something works (the results always have to be measurable, though).
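To make that mechanism concrete, here is a minimal epsilon-greedy multi-armed bandit sketch in Python. Everything in it is hypothetical: the “arms” stand in for candidate framings of a message, and the hidden click-through rates are invented for the demo. It illustrates the try-measure-repeat loop I’m describing, not any system a company actually runs.

```python
# Hypothetical demo: an epsilon-greedy bandit choosing among candidate
# message framings, keeping whichever one measurably "works" best.
import random

TRUE_CTR = [0.02, 0.05, 0.11, 0.04]  # hidden effectiveness of each framing (made up)
EPSILON = 0.1                        # fraction of trials spent exploring

pulls = [0] * len(TRUE_CTR)
successes = [0] * len(TRUE_CTR)

def estimate(arm: int) -> float:
    """Observed success rate; optimistic default so untried arms get sampled."""
    return successes[arm] / pulls[arm] if pulls[arm] else 1.0

for _ in range(10_000):
    if random.random() < EPSILON:
        arm = random.randrange(len(TRUE_CTR))          # explore a random framing
    else:
        arm = max(range(len(TRUE_CTR)), key=estimate)  # exploit the best so far
    # The measurable result the loop depends on: did this framing land?
    success = random.random() < TRUE_CTR[arm]
    pulls[arm] += 1
    successes[arm] += int(success)

print("trials per framing:", pulls)  # concentrates on the most effective arm
```

The point of the toy model is the comment inside the loop: without a measurable signal per attempt, the algorithm has nothing to optimize, which is why I said the results always have to be measurable.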
I’m thinking of the big 5 tech companies (Facebook, Amazon, Apple, Google, Microsoft) and intelligence agencies like the NSA and Chinese agencies. I am NOT thinking about e.g. OpenAI here.
I made the case that these actors have historically unprecedented amounts of power, and since AI is the key to their kingdom, trying to establish an AI pause does indeed come with that risk.
I might be wrong about The Sequences hardening people, but I think these systems are strongly based on human behavior data, and if most of the people in the data haven’t read The Sequences, then people who have read them are further out of distribution (OOD) than they would otherwise be, and therefore less predictable.
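Here is a toy sketch of why “further OOD” can mean “less predictable,” under entirely invented assumptions: a least-squares predictor is fit to one population’s behavior data, then evaluated both on data like its training set and on a shifted population whose feature distribution and behavior pattern both differ. All numbers are made up for illustration.

```python
# Hypothetical demo: a model fit on "typical" behavior data predicts a
# shifted (out-of-distribution) population much less accurately.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0, 0.5])  # invented "typical" behavior pattern

# Fit on the majority distribution.
x_train = rng.normal(0.0, 1.0, size=(5000, 3))
y_train = x_train @ w_true + rng.normal(0.0, 0.5, size=5000)
w_hat, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)

def mse(x, y):
    return float(np.mean((x @ w_hat - y) ** 2))

# In-distribution test set: same population as training.
x_in = rng.normal(0.0, 1.0, size=(1000, 3))
y_in = x_in @ w_true + rng.normal(0.0, 0.5, size=1000)

# OOD test set: both the features and the behavior pattern are shifted.
w_shifted = w_true + np.array([0.0, 1.0, -1.0])
x_ood = rng.normal(2.0, 1.5, size=(1000, 3))
y_ood = x_ood @ w_shifted + rng.normal(0.0, 0.5, size=1000)

print("in-distribution MSE:    ", mse(x_in, y_in))    # small
print("out-of-distribution MSE:", mse(x_ood, y_ood))  # much larger
```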
I agree that profit-driven manipulation might still be primary, and it was probably how human manipulation capabilities first emerged and were originally fine-tuned, in the early-to-mid 2010s. But since these are historically unprecedented degrees of power over humans, and due to international information warfare, e.g. between the US and China (which is my area of expertise), I doubt that manipulation capabilities remained exclusively profit-driven. It’s possible that >90% of people at each of these tech companies haven’t worked on these systems at all, and that of the minority who have, 95% work only on profit-based systems. But I also think there are some people who work on geopolitics-prioritizing manipulation too, e.g. via revolving-door employment with intelligence agencies.
“Clown attack” is a phenomenal term, for a probably real and serious thing. You should be very proud of it.
I think the people at Facebook/Meta and the NSA have probably already coined a term for it, likely an even better one, since they have access to the actual data required to run these attacks. But we’ll never know what their term is, or even whether they have one.