> I’ve thought a bit about ideas like this, and talked to much smarter people than myself about such ideas—and they usually dismiss them, which I take as a strong signal this may be a misguided idea.
I honestly don’t know whether slowing down AI progress in these ways is/isn’t a good idea. It seems plausibly good to me. I do think I disagree about whether the dismissal of these ideas by the “much smarter people” is a strong signal.
Why I disagree about the strong signal thing:
I had to push through some fear as I wrote the sentence about it seeming plausibly good to me, because as I wrote it I imagined a bunch of potential conflict between e.g. AI safety folks and AI folks. For example, a few months ago I asked a safety researcher within a major AI lab what they thought of e.g. people saying they thought it was bad to do/speed AI research. They gave me an expression I interpreted as fear, swallowed, and said something like: gosh, it was hard to think about because it might lead to conflict between them and their colleagues at the AI lab.
At least one person well-versed in AI safety, who I personally think is also non-stupid about policy, has told me privately that they think it’s probably helpful if people-at-large decide to try to slow AI or to talk about AI being scary, but that it seems disadvantageous (for a number of good, altruistic reasons) for them personally to do it.
Basically, it seems plausible to me that:
1. There’s a “silence of elites” on the topic of “maybe we should try by legal and non-violent means to slow AI progress, e.g. by noticing aloud that maybe it is anti-social for people to be hurtling toward the ability to kill everyone,”
2. Lots of people (such as Trevor1 in the parent comment) interpret this silence as evidence that the strategy is misguided,
3. But it is actually mostly evidence that many elites are in local contexts where their personal ability to do the specific work they are trying to do would be harmed by them saying such things aloud, plus social contagion/mimicry of such views.
I am *not* sure the above 1-3 is the case. I have also talked with folks who’ve thought a lot about safety and who honestly think that existential risk is lower if we have AI soon (before humanity can harm itself in other ways), for example. But I think the above is plausible enough that I’d recommend being pretty careful not to interpret elite silence as necessarily meaning there’s a solid case against “slow down AI,” and pushing instead for inside-view arguments that actually make sense to you, or to others who you think are good thinkers and who are not politically entangled with AI or AI safety.
When I try it myself on an inside view, I see things pointing in multiple directions. Would love to see LW try to hash it out.
All of this makes sense, and I do agree that it’s worth consideration (I quadruple upvoted the check mark on your comment). Mainly via in-person conversations, since the absolute worst-case scenario with in-person conversations is that new people learn a ton of really good information about the nitty-gritty problems with mass public outreach, such as international affairs. I don’t know if there’s a knowable upper bound on how wayward/compromised/radicalized this discussion could get if it takes place predominantly on the internet.
I’d also like to clarify that I’m not “interpreting this silence as evidence”: I’ve talked to AI policy people (and am one myself), and I understand the details of why we reflexively shoot down the idea of mass public outreach. It all boils down to ludicrously powerful, territorial, invisible people with vested interests in AI, and zero awareness of what AGI is or why it might be important (for the time being).
> I have also talked with folks who’ve thought a lot about safety and who honestly think that existential risk is lower if we have AI soon (before humanity can harm itself in other ways), for example.
It seems hard to make the numbers come out that way. E.g. suppose human-level AGI in 2030 would cause a 60% chance of existential disaster and a 40% chance of existential disaster becoming impossible, and human-level AGI in 2050 would cause a 50% chance of existential disaster and a 50% chance of existential disaster becoming impossible. Then to be indifferent about AI timelines, conditional on human-level AGI in 2050, you’d have to expect a 1⁄5 probability of existential disaster from other causes in the 2030-2050 period. (That way, with human-level AGI in 2050, you’d have a 1⁄2 * 4⁄5 = 40% chance of surviving, just like with human-level AGI in 2030.) I don’t really know of non-AI risks in the ballpark of 10% per decade.
(My guess at MIRI people’s model is more like 99% chance of existential disaster from human-level AGI in 2030 and 90% in 2050, in which case indifference would require a 90% chance of some other existential disaster in 2030-2050, to cut 10% chance of survival down to 1%.)
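To spell out the indifference arithmetic in both paragraphs above, here’s a minimal sketch (the code and the function name `required_other_risk` are my own illustration, not from the comments): delaying AGI is only a wash if the interim non-AI risk p_other satisfies (1 − p_other) × (1 − p_late) = (1 − p_early).

```python
# Minimal sketch of the indifference condition described above.
# If AGI arriving early carries existential risk p_early, and AGI arriving
# later carries risk p_late, then delay is only neutral if the chance of a
# non-AI existential disaster in the interim, p_other, satisfies
#     (1 - p_other) * (1 - p_late) == (1 - p_early)

def required_other_risk(p_early: float, p_late: float) -> float:
    """Interim non-AI existential risk that makes early vs. late AGI a wash."""
    return 1 - (1 - p_early) / (1 - p_late)

# Scenario from the comment: 60% risk with AGI in 2030, 50% with AGI in 2050.
print(required_other_risk(0.60, 0.50))  # ~0.2, i.e. roughly 10% per decade over 2030-2050

# Guessed MIRI-ish numbers: 99% risk in 2030, 90% in 2050.
print(required_other_risk(0.99, 0.90))  # ~0.9
```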