I was focusing on what the [containable, uncontainable] continuum of possibilities means.
But ok, looking at this table,
Scenario 4: A & B rush maximally.
P(A wins) = P(B wins) = 0.01 if rogue utility is large, 0.5 if rogue utility is small.
Equilibrium? Yes. Note this is the historical outcome for most prior weapons technologies (chemical and biological weapons being exceptions).

Scenario 8: A declares a toothy ban, B joins the ban.
P(A wins) = P(B wins) = 0.5.
Equilibrium? Yes, but unstable. It’s less and less stable the more parties there are. If it’s A...J, then at any moment, at least one party may be “line toeing”. This is unstable if there is a marginal gain from line toeing—the party with the slightly stronger, barely legal AGI has more GDP, which eventually means they begin to break away from the pack. This breaks down to scenario 2, then 3 in your model, then settles on 4.
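The rush-maximally scenario can be sketched as a toy calculation. Mapping “rogue utility large” to a 98% loss-of-control probability is my assumption, chosen only to reproduce the 0.01 figure:

```python
def rush_rush_payoffs(p_rogue: float) -> tuple[float, float, float]:
    """Scenario 4 (both parties rush maximally): if a rogue AI emerges
    (probability p_rogue), neither human party wins; otherwise the race
    is a coin flip. Returns (P(A wins), P(B wins), P(rogue wins))."""
    p_human_win = (1.0 - p_rogue) / 2.0
    return p_human_win, p_human_win, p_rogue

# Large rogue utility: containment almost certainly fails while rushing.
print(rush_rush_payoffs(0.98))  # ≈ (0.01, 0.01, 0.98)
# Small rogue utility: the race is just a coin flip.
print(rush_rush_payoffs(0.0))   # (0.5, 0.5, 0.0)
```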
A historical example I can think of would be the treaties on weapons after WW1 (https://en.wikipedia.org/wiki/Arms_control); they began to fail through line toeing:
The United States developed better technology to get better performance from their ships while still working within the weight limits, the United Kingdom exploited a loop-hole in the terms, the Italians misrepresented the weight of their vessels, and when up against the limits, Japan left the treaty. The nations which violated the terms of the treaty did not suffer great consequences for their actions. Within little more than a decade, the treaty was abandoned.
It seems reasonable to assume this is a likely outcome for AI treaties.
For this not to be the actual outcome, something has to have changed from the historical examples—which included plenty of nuclear blackmail threats—to the present day. What has changed? Do we have a rational reason to think it will go any differently?
Note also that this line-toeing behavior is happening right now with China and Nvidia.
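The instability claim can be made quantitative: if each of N signatories independently line-toes with some small probability, the chance that at least one is defecting grows quickly with N (the 5% per-party figure below is illustrative, not from the discussion):

```python
def p_any_line_toeing(n_parties: int, p_each: float) -> float:
    """Chance that at least one of n treaty signatories is line-toeing,
    assuming each defects independently with probability p_each."""
    return 1.0 - (1.0 - p_each) ** n_parties

# Ten signatories (A...J), each with an assumed 5% chance of line-toeing:
print(round(p_any_line_toeing(10, 0.05), 3))  # 0.401
```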
Rogue utility is the other parameter we need to add to this table to make it complete.
I was focusing on what the [containable, uncontainable] continuum of possibilities means.
Ahh, you mean, what’s expected utility of having a controlled AGI of power X vs. expected disutility of having a rogue AGI of the same power? And how does the expected payoff of different international strategies change as X gets larger?
Hm. Let’s consider just A’s viewpoint, and strategies of {steady progress, accelerate, ban only domestically, ban internationally}.
Steady progress is always viable up to the capability level where AGI becomes geopolitically relevant; let’s call that level G.
Acceleration is nonviable in the range of values below G but above some threshold at which accident risk is large enough to cause public backlash. The current global culture is one of excess caution and status-quo bias, and the “desire to ban” would grow very quickly with “danger”, up to some threshold level at which it won’t even matter how useful it is, the society would never want it. Let’s call that threshold D.
A domestic-only ban is viable in the range [D, G): if it’s not a matter of national security and the public is upset, the technology gets banned. It’s also viable above G if A strongly expects that AGI will be uncontrollable but that the disaster won’t spill over into other countries: i.e., if it expects that if B rushes AGI, B will only end up shooting itself in the foot. Let’s call the capability level above which the disaster does spill over C.
At X>G, acceleration, a toothy international ban, and (in some circumstances) a domestic-only ban are viable.
At X>C, only acceleration and a toothy ban are viable. At that point, what decides between them is the geopolitical actor’s assessment of how likely they are to successfully win the race. If it’s low enough, only the ban has non-zero expected utility (since even a full-scale nuclear war likely won’t lead to extinction).
So we have 0 < D < G < C; steady progress is viable on [0, G), acceleration on [0, D) ∪ [G, +∞), a domestic ban on [D, G) and sometimes also on [G, C), and a toothy international ban on [G, +∞).
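The interval structure can be written down directly. The numeric thresholds below are placeholders satisfying 0 < D < G < C, and the boolean flag encodes the “sometimes” condition for a domestic ban on [G, C):

```python
# Placeholder thresholds satisfying 0 < D < G < C.
D, G, C = 1.0, 2.0, 3.0

def viable_strategies(x: float, disaster_contained: bool = False) -> set[str]:
    """Strategies viable for A at AGI capability level x, per the intervals above.

    disaster_contained: whether A expects a rogue-AGI disaster to stay inside
    the country that caused it (only relevant on [G, C))."""
    strategies = set()
    if x < G:
        strategies.add("steady progress")
    if x < D or x >= G:
        strategies.add("accelerate")
    if D <= x < G or (G <= x < C and disaster_contained):
        strategies.add("domestic ban")
    if x >= G:
        strategies.add("toothy international ban")
    return strategies

print(sorted(viable_strategies(0.5)))  # ['accelerate', 'steady progress']
print(sorted(viable_strategies(1.5)))  # ['domestic ban', 'steady progress']
print(sorted(viable_strategies(4.0)))  # ['accelerate', 'toothy international ban']
```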
Not sure if that’s useful.
Yes. Note this is the historical outcome for most prior weapons technologies. [chemical and biological weapons being exceptions]
Interesting how exceptions are sometimes reachable after all, isn’t it?
Yes, but unstable. It’s less and less stable the more parties there are
Yep. But as I’d said, it’s a game NatSec agencies are playing against each other all the time, and if they’re actually sold on the importance of keeping this equilibrium, I expect they’d be okay at that. In the meantime, we can ramp up other cognitive-enhancement initiatives… Or, well, at least not die for a few years longer.
Ahh, you mean, what’s expected utility of having a controlled AGI of power X vs. expected loss of having a rogue AGI of the same power? And how does the expected payoff of different international strategies change as X gets larger?
No. At a given level of human understanding of AI, there are two levels of model:
1. The strongest model humans can contain (meaning the model is mostly working for humans and is not going to deliberately betray them). This one’s utility is Utilitysafe.
2. The strongest model that the current level of resources (total compute built + total data that exists + total robotics) allows to exist. This one’s utility is Utilityrogue. Until very recently, (1) == (2): humans could not build enough compute to make a model that was remotely dangerous.
And then I realized that intelligence is meaningless. What matters is utility. Utility is the relative benefit of intelligence. It’s worthless to win “70% of tasks vs. the next best software model” by doing 1% better; humans will crush the AI if they get to have 1% more resources.
Utility is some enormous table of domains, and in each domain it’s a multiplier. I gave examples.
General intelligence, which current models are beginning to show, means that utility in one domain transfers to another domain. General learning intelligence, which we can do with scripts tacked onto current models, can have utility in all domains, but requires training data to learn new domains.
Examples:
Utility[designing living creatures]:
Evolution = 1.0 (baseline)
Humans = millions (we could design other living organisms from scratch, with all-new amino acids, in millions of times less time than evolution needs)
Controllable AI from DeepMind = 100× humans or more
Utility[chip, algorithm design]:
Humans = 1.0
Controllable AI from DeepMind = less than 1.1× humans; see https://deepmind.google/impact/optimizing-computer-systems-with-more-generalized-ai-tools/ (4% for video compression)
Utility[tank battles]:
Humans = 1.0
Tactical AI = ?. But think about it: can you win with half the tanks? Probably. With 1/10? 1/100? Probably not.
Utility[manufacturing]:
Humans = 1.0
Machine policy solver = ?
Utility[aircraft design]:
Humans = 1.0
RL solver from humans = ?
And so on. Exponential growth complicates this: even a small utility benefit would eventually allow one party to win.
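The compounding point is easy to check numerically. The growth rates below are made up; the only thing that matters is that B’s per-cycle rate is about 1% higher than A’s:

```python
# Two parties compounding their resources each cycle; B's growth rate is
# 1% higher per cycle (the rates themselves are illustrative).
a, b = 1.0, 1.0
rate_a, rate_b = 1.10, 1.10 * 1.01

for _ in range(200):
    a *= rate_a
    b *= rate_b

# A 1% per-cycle edge compounds into a ~7x resource advantage after 200 cycles.
print(round(b / a, 1))
```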
When I try to reason in a grounded way over “ok, what would the solution from humans look like? What would the solution from a narrow AI* that humans control look like? What is the best solution physics allows?”, well, it depends. On easy tasks (pathfinding, aiming a ballistic weapon) or even modestly complex tasks (manipulation, assembly, debugging), I don’t see the best solution as being very much better. For extremely complex tasks, I don’t know.
Current empirical data shows only small utility gains for now.
If the [Utilitysafe, Utilityrogue] multiplier is very large (100+), then in all long-term futures the AIs choose the future. Nature can’t abide a vacuum like that. Doomers/decelerationists can only buy a small amount of time.
If the [Utilitysafe, Utilityrogue] multiplier is small (<2.0), you must accelerate, because the [Utilitysafe] vs. [regular humans] multiplier still makes holding back an unwinnable battle due to exponential growth. Humans can survive in this future as long as at least 2/3 of the “bits”—the physical resources—are in the hands of Utilitysafe systems.
For medium values (2-100), it depends: you need to be very careful, but maybe you can keep AI under control for a little while. You need 99% of the resources to be in the hands of safe systems; escaped AIs are very much a crisis.
*Myopia and various forms of containment are strategies that lower a general AI to a narrow AI without the cost of developing a custom narrow AI.
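The three regimes can be restated as a small lookup. The 2/3 and 99% resource fractions come from the text; the exact boundary handling at 2.0 and 100 is my assumption:

```python
def containment_regime(multiplier: float) -> tuple[str, float]:
    """Map the [Utilitysafe, Utilityrogue] multiplier to a regime and to the
    minimum fraction of physical resources safe systems must hold.
    Boundary handling (strict < 2.0, inclusive <= 100) is a guess."""
    if multiplier < 2.0:
        return "accelerate", 2 / 3
    if multiplier <= 100.0:
        return "careful control, for a little while", 0.99
    return "the AIs choose the future", 1.0

print(containment_regime(1.5))    # ('accelerate', 0.6666666666666666)
print(containment_regime(30.0))   # ('careful control, for a little while', 0.99)
print(containment_regime(500.0))  # ('the AIs choose the future', 1.0)
```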
I appreciate this discussion, and want to throw in my 2 cents:
I believe that any sort of delay (such as via an attempted ban which does slow AI development down) buys some chance of improved outcomes for humanity. In fact, a lot of my hope for a good outcome lies in finding a way to accomplish this delay whilst accelerating safety research.
The timing/speed at which the defection happens matters a lot! Not just the probability of defection occurring. How much does developing AI in secret instead of openly slow down a government? None? I expect it slows it at least somewhat.
So let’s engage on that.
Imagine it’s the Cold War. You are an anti-nuclear advocate. “STOP preparing to put city-killer fusion bombs onto missiles and cramming them into submarines! You endanger human civilization itself; it is possible that we could all die!” you say.
Faction-wise, you get dismissed as a communist sympathizer. Your security clearances would be revoked and you would be fired as a federal employee in that era. “Communist” then meant what “decel” means now.
I would agree that every part of the statement above is true. Preparing to commit genocide with thermonuclear devices is a morally dubious act, and while there is debate on the probability of a nuclear winter, it is correct to say that there is a nonzero chance that a total nuclear war at the Cold War arsenal peak could have driven humanity extinct, or weakened humanity to the point that the next crisis finished it off.
But...you don’t have a choice. You are locked into a race. Trying to reframe it not as a race doesn’t change the fact it’s a race. Any international agreements not to build loaded ICBMs you can confidently predict the other parties will secretly defect on.
Historically, several countries did defect: Israel, Taiwan, and South Africa being the immediate 3 I can think of. Right now Ukraine is being brutalized for its choice to cooperate. Related to your point above, the defectors took longer to get nukes than the superpowers did, due to the need for secrecy.
I think we are in the same situation now, except the payoff matrix is far more favorable to the case for AI than it ever was for nukes. The incentives are far, far stronger.
So it seems like a grounded view, based on the facts, to predict the following outcome: AI acceleration until the singularity.
In more detail: you talk about regulations and possibly efforts going underground. This only becomes relevant if, out of all major power blocs, there is not at least one bloc building ASI at the maximum speed possible. I think at least one will, as this seems to be what is happening today; it just has to continue in one power bloc.
We have not seen a total war effort yet, but it seems maybe inevitable. (Right now AI investment is still a tiny part of the economy, at maybe $200 billion/year. A total war effort means the party puts all resources into preparing for the coming nuclear war and developing ASI. A less-than-total war effort, where most of the economy goes into AI, is also possible, and is what economics would cause to happen naturally.)
The other blocs’ choices don’t matter at that point, since the winner will take over the market for ASI services and own (or be subsumed by) the outcome.
There are other Cold War analogs as well. Many people during the era expected the war would happen during their careers. They participated in developing and deploying nukes and expected a fight to the end. Yet again, what choice did they have?
Do we actually have a choice now?
As far as I know, most datacenters are not currently secretly underground in bomb-proof bunkers. If this were known to be the case, I’d have different views of the probable events in the next few years.
Currently, as far as I know, most data centers are out in the open. They weren’t built with the knowledge that soon they would become the equivalent of nuclear ICBM silos. The current state of the world is heavily offense-favoring, so far as I know.
Do you disagree? Do you think ASI will be fully developed and utilized sufficiently to achieve complete worldwide hegemony by the user without there being any major international conflict?
Well, I mentioned this in the other post, but I will open a larger point here: anything physics permits can theoretically happen in the future, right? For complex situations like this, scientifically validated models like those of particle physics do not exist yet. All we can do is look at what historically happened in similar scenarios, and our prior should be that this round’s outcome is drawn from the same probability distribution. Agree/disagree?
So for nuclear programs, all the outcomes have happened. Superpowers have built vast campuses and secret cities, and made the plutonium and assembled the devices in aboveground facilities. Israel apparently did it underground. Iran has been successfully decelerated in its nuclear ambitions for decades.
But the general trend I see is that a superpower can’t be stopped by bombing, and another common element has recurred historically: bombing and military action often harden a belligerent’s resolve. They hardened the UK’s resolve, US resolve, Russian resolve; it goes on.
So in the hypothetical world where party A is too worried about AI dangers to build its own while party B is building it, then unless A can kill B outright, B would respond to an attack with a total war effort and would develop AI and win or die (die from nukes, die from the AI, or win the planet).
That is what I predict the outcome would be, and we can enumerate all the wars where this has historically happened if you would like. Trend-wise, the civilian and military leaders of B, consuming their internal propaganda, tend to commit to total war.
If B is a superpower (the USA or China, maybe the EU), it can kill all the other parties with its nukes, so choosing to kill B is choosing suicide for yourself.
Yes, I am imagining that there is some sort of toothy international agreement with official inspectors posted at every data center worldwide. That’s what I mean by delay. Or the delay which could come from the current leading labs slowing down and letting China catch up.
If the first lab to cross the finish line gets forcibly nationalized or assassinated by state-sponsored terrorist groups, why hurry? Why stay ahead? If we can buy six more months by not rushing quite as hard, why not buy them? What do the big labs lose by letting China catch up? Maybe this seems unlikely now, but I expect the end-game months are going to look a lot different.
See my other comment here: https://www.lesswrong.com/posts/A4nfKtD9MPFBaa5ME/we-re-all-in-this-together?commentId=JqbvwWPtXbFuDqaib
Responding to your other comment: Probably AI labs will be nationalized, yes, as models reach capability levels to be weapons in themselves.
The one here: a “toothy international agreement” to me sounds indistinguishable from the “world government and nukes are banned” world that has made rational sense since the 1950s but has not happened for 73 years.
Why would it happen this time? In your world model, are you imagining that the “world government” outcome was always a non-negligible probability and the dice roll could go this way?
Or do you think that a world that let countries threaten humanity and every living person in a city with doomsday nuclear arsenals would consider total human extinction a more serious threat than nukes, and people would come together to an agreement?
Or do you think the underlying technology or sociological structure of the world has changed in a way that allows world governments now, but didn’t then?
I genuinely don’t know how you are reaching these conclusions. Do you see my perspective? Countless forces between human groups create trends, and those trends are the history and economics we know. To expect a different result requires the underlying rules to have shifted.
A world government seems much more plausible to me in a world where the only surviving fraction of humanity is huddled in terror in the few remaining underground bunkers belonging to a single nation.
Note: I don’t advocate for this world outcome, but I do see it as a likely outcome in the worlds where strong international cooperation fails.