It’ll be interesting to see if OpenAI will keep going with their compute commitments now that the two main superalignment leads have left.
I’m interested in what people think are the strongest arguments against this view. Here are a few counterarguments that I’m aware of:
1. Empirically, the AI-focused scaling labs seem to care quite a lot about safety, and make credible commitments to safety. If anything, they seem to be “ahead of the curve” compared to larger tech companies or governments.
2. Government/intergovernmental agencies, and to a lesser degree larger companies, are bureaucratic and sclerotic, and generally less competent.
3. The AGI safety issues that EAs worry about the most are abstract and speculative, so having a “normal” safety culture isn’t as helpful as buying into the more abstract arguments, which you might expect to be easier for newer companies to do.
4. Scaling labs share “my” values. So AI doom aside, all else equal, you might still want scaling labs to “win” over democratically elected governments/populist control.
We should expect the incentives and culture of AI-focused companies to make them uniquely terrible for producing safe AGI.
From a “safety from catastrophic risk” perspective, I suspect an “AI-focused company” (e.g. Anthropic, OpenAI, Mistral) is abstractly pretty close to the worst possible organizational structure for getting us towards AGI. I have two distinct but related reasons:
Incentives
Culture
From an incentives perspective, consider realistic alternative organizational structures to “AI-focused company” that nonetheless have enough firepower to host multibillion-dollar scientific/engineering projects:
As part of an intergovernmental effort (e.g. CERN’s Large Hadron Collider, the ISS)
As part of a governmental effort of a single country (e.g. Apollo Program, Manhattan Project, China’s Tiangong)
As part of a larger company (e.g. Google DeepMind, Meta AI)
In each of those cases, I claim that there are stronger (though still not ideal) organizational incentives to slow down, pause/stop, or roll back deployment if there is sufficient evidence or reason to believe that further development can result in major catastrophe. In contrast, an AI-focused company has every incentive to go ahead on AI when the case for pausing is uncertain, and minimal incentive to stop or even take things slowly.
From a culture perspective, I claim that without knowing any details of the specific companies, you should expect AI-focused companies to be more likely than the plausible alternatives above to have the following cultural elements:
Ideological AGI Vision: AI-focused companies may have a large contingent of “true believers” who are ideologically motivated to make AGI at all costs.
No Pre-existing Safety Culture: AI-focused companies may have minimal or no strong “safety” culture where people deeply understand, have experience in, and are motivated by a desire to avoid catastrophic outcomes.
The first one should be self-explanatory. The second one is a bit more complicated, but basically I think it’s hard to have a safety-focused culture just by “wanting it” hard enough in the abstract, or by talking a big game. Instead, institutions tend to have a (relatively) safer and more robust culture if they have previously suffered the (large) costs of not focusing enough on safety.
For example, engineers who aren’t software engineers understand fairly deep down that their mistakes can kill people, and that their predecessors’ fuck-ups have indeed killed people (think bridges collapsing, airplanes falling, medicines not working, etc.). Software engineers rarely have such experience.
Similarly, governmental institutions have institutional memories of major historical fuckups, in a way that new startups very much don’t.
I can see some arguments in your direction but would tentatively guess the opposite.
(not a lawyer)
My layman’s understanding is that managerial employees are excluded from that ruling, unfortunately. Which I think applies to William_S if I read his comment correctly. (See pg. 11, the “Excluded” section, in the pdf linked in your link.)
“This is more of a comment than a question” as they say
Yeah that’s fair! I agree that they would lose the bet as stated.
Rebuttal here!
Anyway, if the message someone received from Hanson’s writings on medicine was “yay Hanson”, and Scott’s response was “boo Hanson,” then I agree people should wait for Hanson’s rebuttal before being like “boo Hanson.”
But if the message that people received was “medicine doesn’t work” (and it appears that many people did receive it), then Scott’s writings should be a useful update, independent of whether Hanson’s-writings-as-intended were actually trying to deliver that message.
People might appreciate this short (<3 minutes) video interviewing me about my April 1 startup, Open Asteroid Impact:
Alas I think doing this will be prohibitively expensive/technologically infeasible. We did some BOTECs at the launch party and even just getting rid of leap seconds was too expensive for us.
That’s one of many reasons why I’m trying to raise 7 trillion dollars.
Open Asteroid Impact strongly disagrees with this line of thinking. Our theory of change relies on many asteroids filled with precious minerals hitting Earth, as mining in space (even LEO) is prohibitively expensive compared to on-ground mining.
While your claims may be true for small asteroids, we strongly believe that scale is all you need. Over time, sufficiently large, and sufficiently many, asteroids can solve the problem of specific asteroids not successfully impacting Earth.
rare earth metals? More like common space metals, amirite?
[April Fools’ Day] Introducing Open Asteroid Impact
In 2015 or so, when my friend and I independently came across a lot of rationalist concepts, we learned that we were each interested in this sort of LW-shaped thing. He offered for us to try the AI box game. I played the game as Gatekeeper and won with ease. So at least my anecdotes don’t make me particularly worried.
That said, these days I wouldn’t publicly offer to play the game against an unlimited pool of strangers. When my friend and I played against each other, there was an implicit set of norms in play that explicitly don’t apply to the game as stated (“the AI has no ethical constraints”).
I do not particularly relish the thought of giving a stranger with a ton of free time and something to prove the license to be (e.g.) as mean to me as possible over text for two hours straight (while having days or even weeks to prepare ahead of time). I might lose, too. I can think of at least 3 different attack vectors[1] that might get me to decide that the -EV of losing the game is not as bad as the -EV of having to stay online and attentive in such a situation for almost 2 more hours.
That said, I’m also not convinced that in the literal boxing example (a weakly superhuman AI is in a server farm somewhere, and I’m the sole gatekeeper responsible for deciding whether to let it out), I’d necessarily let it out. Even after accounting for the greater cognitive capabilities and thoroughness of a superhuman AI. This is because I expect my willingness to hold out in an actual potential end-of-world scenario is much higher than my willingness to hold out for $25 and some internet points.
[1] In the spirit of the game, I will not publicly say what they are, but I can tell people over DMs if they’re interested. I expect most people to agree that they (a) are within the explicit rules of the game, (b) plausibly would cause reasonable people to fold, and (c) are not super analogous to actual end-of-world scenarios.
Yep.
Yeah, this came up a number of times during covid forecasting in 2020. E.g., you might expect the correlational effect of having a lockdown during times of expected high mortality load to outweigh any causal advantages of lockdowns on mortality.
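As a minimal sketch of that confounding story (the simulation setup and all the numbers here are illustrative assumptions, not from the original comment): even if lockdowns causally reduce mortality, the naive lockdown-vs-no-lockdown comparison can come out with the wrong sign, because lockdowns tend to happen exactly when expected mortality load is high.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # simulated region-weeks (illustrative)

# Confounder: expected mortality load drives both the decision to
# lock down and the realized mortality.
expected_load = rng.normal(0.0, 1.0, n)
lockdown = (expected_load + rng.normal(0.0, 0.5, n) > 0.5).astype(float)

# Assume the true causal effect of a lockdown on mortality is -0.3,
# while mortality is dominated by the underlying load.
mortality = expected_load - 0.3 * lockdown + rng.normal(0.0, 0.5, n)

# Naive comparison: lockdown regions look *worse* (wrong sign).
naive = mortality[lockdown == 1].mean() - mortality[lockdown == 0].mean()
print(f"naive lockdown 'effect': {naive:+.2f}")

# Adjusting for the confounder via OLS recovers the causal sign.
X = np.column_stack([np.ones(n), lockdown, expected_load])
beta, *_ = np.linalg.lstsq(X, mortality, rcond=None)
print(f"adjusted lockdown effect: {beta[1]:+.2f}")  # ~ -0.30
```

The point is just the sign flip: the naive difference comes out positive even though the assumed causal effect is negative, and conditioning on the confounder recovers it.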
I agree it’s not a large commitment in some absolute sense. I think it’d still be instructive to see whether they’re able to hit this (not very high) bar.