The problem is that many actors will be able to unilaterally end the world. The solution is to decrease the number of decisions that would end the world if done wrong (and influence those decisions).
Why would the ‘number of decisions that would end the world if done wrong’ necessarily be above zero?
Also, wouldn’t labs be incentivized to trumpet whatever is most beneficial for their goals and downplay whatever is most deleterious? How would the true state of affairs inside a lab be monitored, rather than just what the lab communicates?
This frame makes the nonobvious/uncertain assertion that “many actors will be able to unilaterally end the world.” I’m not interested in arguing here whether, by default, many actors would be able to unilaterally build an AI that ends the world. Insofar as that assertion is true, one kind of solution, or way of orienting to the problem, is to see it as a problem of unilateralism.
In response to your last paragraph: yeah, preventing a lab from building dangerous AI requires hard action like effectively monitoring it or controlling a necessary input to dangerous AI.
Thanks for the answer. Your frame is interesting.
I’m not quite sure if ‘hard action like effectively monitoring it or controlling a necessary input to dangerous AI’ is realistic to implement everywhere there’s a concentration of researchers.
How do you envision this being realizable outside of extreme scenarios such as a world dictatorship?
I largely agree! Maybe we can get a stable policy regime of tracking hardware and auditing all large training runs with model evals that can identify unsafe systems. Maybe the US government can do intermediate stuff like tracking hardware and restricting training compute.
But mostly this frame is about raising questions, suggesting orientations, or helping you notice if something appears.
(By the way, I roughly endorse this frame less than the others, which is why it’s at the end.)
Wouldn’t this need to be done worldwide, near simultaneously, to be effective?
I’m not sure doing it in one country will move the needle much.
To some extent, yes, it would need US + Europe (including UK) + China. A strong treaty is necessary for some goals.
I’d guess that the US alone could buy a year.
One Western government doing something often causes other Western governments to do it.
(Edit in response to reply: I don’t think we have important disagreements here, so ending the conversation.)
I’m still skeptical it’ll be anywhere near that easy. India, Japan, Korea, and some other countries are also coming onto the scene and will likely need to be included in any future deal.
Plus, even when they’re at the negotiating table, the parties are incentivized to stall as long as possible to cut the best possible deal, because they won’t all be worried to the same degree.
And there’s nothing to stop them from walking away at any time. Even the folks in Europe may want to hold out if they feel they can extract maximum concessions from both the U.S. and China.