The basic partial counterpoint, as I see it, builds off of what Wei Dai has been writing about for a long time (see also 1, 2, etc.), which is most crisply summarized in his comment on Christiano’s “What failure looks like” post:
Here are some additional scenarios that don’t fit into this story or aren’t made very salient by it.
AI-powered memetic warfare makes all humans effectively insane.
I’m not sure how much I buy the specific details of trevor’s theorizing about Clown Attacks, given that he routinely makes strong (and a priori unlikely) factual conspiracy claims without supplying any empirical evidence for them, but large parts of lc’s comment on his most popular post ring importantly true to me:
The vast majority of people alive today are the effective mental subjects of some religion, political party, national identity, or combination of the three, no magical backdoor access necessary; the confirmed tools and techniques are sufficient to ruin lives or convince people to do things completely counter to their own interests. And there are intermediate stages of effectiveness that political lobbying can ratchet up along, between the ones they’re at now and total control.
[...]
The premise of the above post is not that AI companies are going to try to weaponize “human thought steering” against AI safety. The premise of the above post is that AI companies are going to develop technology that can be used to manipulate people’s affinities and politics, Intel agencies will pilfer it or ask for it, and then it’s going to be weaponized, to a degree of much greater effectiveness than they have been able to afford historically. I’m ambivalent about the included story in particular being carried out, but if you care about anything (such as AI safety), it’s probably necessary that you keep your utility function intact.
I already live in a world that feels to me qualitatively more insane and inadequate than it was even a mere 15 years ago. There could be selection effects, of course: a more interconnected world with a greater and faster distribution of information would allow a higher percentage of misdeeds and bad events to be reported on than in the past, even if the “real” quantity had remained unchanged. But even correcting for those, I attribute a substantial part (in fact, a majority) of the negative change to technological improvements that have allowed people to be more connected with one another and to consume more and more maladaptive memes generated by misaligned processes (through social media and software and the Internet more broadly).
I find it rather unlikely that the continued rise of LLMs will reverse this trend; instead, I expect it to only become more and more amplified and accelerated, as sheer lunacy expands to cover more and more public discourse (as an illustration, consider the familiar example of e/acc, a phenomenon that would have been mostly inconceivable even just a decade ago). So when you say the following:
So our main job now is to empower future common-sense decision-making.
And:
[these proposals] are robust to the inevitable scramble for power that will follow those “holy shit” moments
I realize that I am not at all optimistic about the continued prevalence and stability of “common sense” as time goes by, particularly in the context of politicized discourse, in which reasonable equilibria about decision-making were already getting more and more fragile even before LLMs came around the corner, ready to light the powder keg...
I basically agree with this comment. The basic reason I am much more pessimistic about sane AI governance than a lot of LWers is precisely that I expect LLMs to be more persuasive than humans, and there’s very strong evidence for this.
Here’s a pre-registered RCT on this topic. While I find the sample size a little low (I’d like it to be more in the realm of 1000-2000 randomly selected participants), this is the only type of study that can establish that the effects are causal without relying much on your priors. So the fact that it shows large persuasion effects from LLMs is really strong evidence for the belief that AI systems are better than humans at persuading people when given access to personal data and interaction:
https://arxiv.org/abs/2403.14380
More generally, it provides evidence for Sam Altman’s thesis that superhumanly persuasive AI will come long before AI that’s superhuman in every other field.