Thanks for the post! I see two important difficulties with your proposal.
First, you say (quoting your comment below):
It’s not in the US self-interest to disempower itself and all its current power centers by allowing a US company to build uncontrollable AGI... The reason that the self-interest hasn’t yet played out is that US and Chinese leaders still haven’t fully understood the game theory payout matrix.
The trouble here is that it is in the US’s (and China’s) self-interest, as some leaders see it, to take some chance of out-of-control AGI if the alternative is the other side taking over. And either country can create safety standards for consumer products while secretly pursuing AGI for military or other purposes. That changes the payout matrix dramatically.
I think your argument could work if
a) both sides could trust that the other was applying its safety standards universally, but that takes international cooperation rather than simple self-interest; or
b) it were common knowledge that AGI was highly likely to be uncontrollable, but now we’re back to the same debate about existential risk from AI that we were in before your proposal.
Second (and less centrally), as others have pointed out, your definition of tool AI as (in part) ‘AI that we can control’ begs the question. Certainly for some kinds of tool AI such as AlphaFold, it’s easy to show that we can control them; they only operate over a very narrow domain. But for broader sorts of tools like assistants to help us manage our daily tasks, which people clearly want and for which there are strong economic incentives, it’s not obvious what level of risk to expect, and again we’re back to the same debates we were already having.
A world with good safety standards for AI is certainly preferable to a world without them, and I think there’s value in advocating for them and in pointing out the risks in the just-scale-fast position. But I think this proposal fails to address some critical challenges of escaping the current domestic and international race dynamics.
Yes, this seems right to me. The OP says:

The key point I will make is that, from a game-theoretic point of view, this race is not an arms race but a suicide race. In an arms race, the winner ends up better off than the loser, whereas in a suicide race, both parties lose massively if either one crosses the finish line.
But from a game-theoretic perspective, it can still make sense for the US to aggressively pursue AGI even if one believes there’s a substantial risk of an AGI takeover in the event of a race, especially if the US acts in its own self-interest. Even in this simple model, the optimal strategy would depend on how likely AGI takeover is, how bad China getting controllable AGI first would be from the US’s point of view, and how likely China is to also not race if the US does not race. In particular, if the US is highly confident that China will aggressively pursue AGI even if the US chooses not to race, then the optimal strategy for the US could be to race even if AGI takeover is highly likely.
So really I think some key cruxes here are:
How likely is AGI (or its descendants) to take over?
How likely is China to aggressively pursue AGI if the US chooses not to race?
And vice versa for China. But the OP doesn’t really make any headway on those.
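To make that dependence concrete, here’s a minimal toy model in Python. This is only a sketch: the payoff values, the coin-flip tiebreak when both sides race, and the single takeover probability are all made-up simplifying assumptions of mine, not estimates from the OP or anywhere else.

```python
# Crude one-shot model of the race, purely to show how the optimal US
# strategy turns on the two cruxes above. All numbers are illustrative
# assumptions, not estimates of anything.

TAKEOVER = -100.0   # uncontrollable AGI: both sides lose massively
WIN = 10.0          # your side gets controllable AGI first
LOSE = -10.0        # the rival gets controllable AGI first
STATUS_QUO = 0.0    # nobody builds AGI (for now)

def expected_us_payoff(us_races, p_china_races, p_takeover):
    """Expected US payoff.

    p_china_races: P(China races | the US's choice)       (crux 2)
    p_takeover:    P(uncontrollable AGI | someone races)   (crux 1)
    """
    if us_races:
        # If both race, assume a coin flip decides who finishes first.
        controlled = (p_china_races * 0.5 * (WIN + LOSE)
                      + (1 - p_china_races) * WIN)
        return p_takeover * TAKEOVER + (1 - p_takeover) * controlled
    # The US abstains, so the outcome hinges on China's choice alone.
    if_china_races = p_takeover * TAKEOVER + (1 - p_takeover) * LOSE
    return p_china_races * if_china_races + (1 - p_china_races) * STATUS_QUO

for p_china in (0.0, 0.5, 0.95, 1.0):
    for p_t in (0.1, 0.5, 0.9):
        race = expected_us_payoff(True, p_china, p_t)
        abstain = expected_us_payoff(False, p_china, p_t)
        better = "race" if race > abstain else "don't race"
        print(f"P(China races)={p_china:.2f}  P(takeover)={p_t:.1f}  ->  {better}")
```

Under these (made-up) numbers, the US rationally tolerates a higher takeover risk the more certain it is that China will race regardless; at P(China races) = 1 it races even at a 90% takeover probability. Different payoff assumptions move the crossover points around, which is exactly why the two cruxes above do the real work.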
Additionally, I think there are a bunch of complicating details that also end up mattering, for example:
To what extent can two rival countries cooperate while simultaneously competing? The US and the Soviets did cooperate on multiple occasions while engaged in intense geopolitical competition. That could matter if one thinks racing is bad because it makes cooperation harder (as opposed to being bad because it brings AGI faster).
How (if at all) does the magnitude of the leader’s lead over the follower change the probability of AGI takeover (i.e., does the leader need “room to manoeuvre” to develop AGI safely)?
Is the likelihood of AGI takeover lower if AGI is developed in one country rather than another (all else equal)?
Is some sort of coordination more likely in worlds where there’s a larger gap between racing nations (e.g., because the leader has more leverage over the follower, or because a close follower is less willing to accept a deal)?
And beyond that, constructs like “the US” and “China” are obviously simplifications, and the details of who actually makes and influences decisions could end up mattering a lot.
It seems to me all these things could matter when determining the optimal US strategy, but I don’t see them addressed in the OP.