I’ve thought a bit about ideas like this, and talked to much smarter people than myself about such ideas—and they usually dismiss them, which I take as a strong signal this may be a misguided idea.
I think the Machiavellian model of politics is largely correct—and it just is the case that if you look closely at any great change in policy you see, beneath the idealized narrative, a small coterie of very smart ideologues engaging in Machiavellian politics.
To the extent overt political power is necessary for EA causes to succeed, Machiavellian politics will be necessary and good. However, this sort of duplicitous regulatory judo you advocate strikes me as something that could backfire: by politicizing AI in this way, those working on the actually important AI safety research become very tempting targets for the mechanisms you hope to summon. We see hints of this already.
To the extent it is possible to get people with correct understanding of the actually important problem in positions of bureaucratic and moral authority, this seems really, really good. Machiavellian politics will be required to do this. Such people may indeed need to lie about their motivations. And perhaps they may find it necessary to manipulate the population in the way you describe.
However, if you don’t have such people actually in charge and use judo mind tricks to manipulate existing authorities to bring AI further into the culture war, you are summoning a beast you, by definition, lack the power to tame.
I suspect it would backfire horribly: it would incentivize safety-washing of various kinds in the existing organizations best positioned to shape regulation, make new alignment orgs like Conjecture and Redwood very difficult to start, and, worst of all, make overtly caring about the actual problem very politically difficult.
> I’ve thought a bit about ideas like this, and talked to much smarter people than myself about such ideas—and they usually dismiss them, which I take as a strong signal this may be a misguided idea.
I honestly don’t know whether slowing down AI progress in these ways is or isn’t a good idea. It seems plausibly good to me. I do think I disagree about whether the dismissal of these ideas by “much smarter people” is a strong signal.
Why I disagree about the strong signal thing:
I had to push through some fear as I wrote the sentence about it seeming plausibly good to me, because as I wrote it I imagined a bunch of potential conflict between e.g. AI safety folks and AI folks. For example, a few months ago I asked a safety researcher within a major AI lab what they thought of e.g. people saying they thought it was bad to do or speed up AI research. They gave me an expression I interpreted as fear, swallowed, and said something like: gosh, it was hard to think about, because it might lead to conflict between them and their colleagues at the AI lab.
At least one person well-versed in AI safety, who I personally think is also not stupid about policy, has told me privately that they think it’s probably helpful if people-at-large decide to try to slow AI or to talk about AI being scary, but that it seems disadvantageous (for a number of good, altruistic reasons) for them personally to do it.
Basically, it seems plausible to me that:
1. There’s a “silence of elites” on the topic of “maybe we should try by legal and non-violent means to slow AI progress, e.g. by noticing aloud that maybe it is anti-social for people to be hurtling toward the ability to kill everyone,”
2. Lots of people (such as Trevor1 in the parent comment) interpret this silence as evidence that the strategy is misguided,
3. But it is actually mostly evidence that many elites are in local contexts where their personal ability to do the specific work they are trying to do would be harmed by them saying such things aloud, plus social contagion/mimicry of such views.
I am *not* sure the above 1-3 is the case. I have also talked with folks who’ve thought a lot about safety and who honestly think that existential risk is lower if we have AI soon (before humanity can harm itself in other ways), for example. But I think the above is plausible enough that I’d recommend being pretty careful not to interpret elite silence as necessarily meaning there’s a solid case against “slow down AI,” and pushing instead for inside-view arguments that actually make sense to you, or to others who you think are good thinkers and who are not politically entangled with AI or AI safety.
When I try it myself on an inside view, I see things pointing in multiple directions. Would love to see LW try to hash it out.
All of this makes sense, and I do agree that it’s worth consideration (I quadruple upvoted the check mark on your comment). Mainly via in-person conversations, since the absolute worst-case scenario with in-person conversations is that new people learn a ton of really good information about the nitty-gritty problems with mass public outreach, such as international affairs. I don’t know if there’s a knowable upper bound on how wayward/compromised/radicalized this discussion could get if it takes place predominantly on the internet.
I’d also like to clarify that I’m not “interpreting this silence as evidence”: I’ve talked to AI policy people (and I am one), and I understand the details of why we reflexively shoot down the idea of mass public outreach. It all boils down to ludicrously powerful, territorial, invisible people with vested interests in AI and, for the time being, zero awareness of what AGI is or why it might be important.
> I have also talked with folks who’ve thought a lot about safety and who honestly think that existential risk is lower if we have AI soon (before humanity can harm itself in other ways), for example.
It seems hard to make the numbers come out that way. E.g. suppose human-level AGI in 2030 would cause a 60% chance of existential disaster and a 40% chance of existential disaster becoming impossible, and human-level AGI in 2050 would cause a 50% chance of existential disaster and a 50% chance of existential disaster becoming impossible. Then to be indifferent about AI timelines, conditional on human-level AGI in 2050, you’d have to expect a 1⁄5 probability of existential disaster from other causes in the 2030-2050 period. (That way, with human-level AGI in 2050, you’d have a 1⁄2 * 4⁄5 = 40% chance of surviving, just like with human-level AGI in 2030.) I don’t really know of non-AI risks in the ballpark of 10% per decade.
(My guess at MIRI people’s model is more like 99% chance of existential disaster from human-level AGI in 2030 and 90% in 2050, in which case indifference would require a 90% chance of some other existential disaster in 2030-2050, to cut 10% chance of survival down to 1%.)
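To make the arithmetic in the two paragraphs above easy to check, here is a minimal sketch of the indifference calculation in Python. The numbers are the illustrative ones from these comments (plus the rough “MIRI-like” guess), not empirical estimates, and the function is just my own framing of the calculation rather than anything from the thread.

```python
# Minimal sketch of the timeline-indifference calculation, using the
# illustrative numbers from the comments above (not empirical estimates).

def p_survival(p_disaster_given_agi, p_interim_disaster=0.0):
    """Chance of no existential disaster: survive the interim period,
    then survive the arrival of human-level AGI."""
    return (1 - p_interim_disaster) * (1 - p_disaster_given_agi)

# AGI in 2030 (60% disaster) vs. AGI in 2050 (50% disaster):
p_2030 = p_survival(0.60)            # 0.40
# indifference requires a 1/5 chance of some other disaster during 2030-2050
p_2050 = p_survival(0.50, 0.20)      # 0.80 * 0.50 = 0.40
print(round(p_2030, 2), round(p_2050, 2))   # 0.4 0.4 -> indifferent; ~10% non-AI risk per decade

# A MIRI-like model: 99% disaster in 2030 vs. 90% in 2050:
p_2030_miri = p_survival(0.99)       # 0.01
p_2050_miri = p_survival(0.90, 0.90) # 0.10 * 0.10 = 0.01
print(round(p_2030_miri, 2), round(p_2050_miri, 2))  # 0.01 0.01 -> needs 90% non-AI risk
```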