On “lab leaders would choose to stop if given the coordination-guaranteed button” vs “big ol’ global mood shift”, I think the mood shift is way more likely (relatively) for two reasons.
One of these was already argued about in roughly this direction, and I want to capture the way I see it more crisply; the other I didn’t see mentioned, and it might be a helpful model to consider.
1. The “inverse scaling law” for human intelligence vs. rationality. “AI arguments are pretty hard” for “folks like scott and sam and elon and dario” because it’s very easy for intelligent people to wade into the topic overconfidently and tie themselves into knots of rationalization (amplified by incentives, “commitment and consistency” as Nate mentioned re: SBF, etc.). For most people, by contrast (and this, afaict, was a big part of Eliezer’s update on communicating AGI Ruin to a general audience), it’s a straightforward “looks very dangerous, let’s not do this.”
2. The “agency bias” (?): lab leaders et al. think they can and should fix things. Not just point out problems, but save the day with positive action. (“I’m not going to oppose Big Oil, I’m going to build Tesla.”) “I’m the smart, careful one, I have a plan (to make the current thing actually-ok to be doing; to salvage it; to do the different, probably-wrong thing instead; etc.)” Most people don’t give themselves that “hero license” and even oppose others having it, which is one of those stances that is almost always wrong but, in the AI case, actually right.
So getting a vast number of humans to “big ol’ global mood shift” into “let’s stop those hubristically-agentic people from getting everyone killed, which is obviously bad” seems more likely to me than getting the small number of the latter into “our plans suck actually, including mine and any I could still come up with, so we should stop.”