Thane Ruthenis comments on We’re all in this together

Thane Ruthenis 5 Dec 2023 22:44 UTC

4 points

With China|Russia|EU|USA, they will mass manufacture air defense and bury all the important elements for their AI effort. You will have to fire nukes

Mm. Let me generalize your spread of possibilities as follows:

“AGI might be containable” $\to$ “AGI is an incredibly powerful technology… but just that”
“AGI might be risky but containable” $\to$ “AGI, by itself, may be a major geopolitical actor”
“AGI might be uncontainable” $\to$ “AGI can defeat all of humanity combined”

Whether one believes that AGI is containable is entangled with how much they should expect to benefit by developing one. If a government thinks AGIs aren’t going to be that impressive, they won’t fight hard to be able to develop one, and won’t push against others trying to develop one. If a government is concerned about AGI enough to ban it domestically, it sounds like they expect accident risks to be major disasters/extinction-level, which means they’d expect solving AGI to grant whoever does it hegemony.

So in the hypothetical case where we have a superpower A taking AGI seriously enough to ban it domestically and to try to bully other nations into the same, and a defecting superpower B burying their datacenters in response, so that the only way to stop it is nukes? Then it sounds like superpower A would recognize that it’s about to lose the whole lightcone to B if it does nothing, so it’ll go ahead and actually fire the nukes.

And it should be able to credibly signal such resolve to B ahead of time, meaning the defector would expect this outcome, and so not do the bury-the-datacenters thing.

Like… Yeah, making a government recognize that AGI risk is so major it needs to ban all domestic development and try to enforce an international ban is a tall order. But once a major government is sold on the idea, it’ll also automatically be willing to risk a nuclear exchange to enforce this ban, which will be visible to the other parties.

Conversely, if the government isn’t taking AGI seriously enough to risk a nuclear exchange, it’s probably not taking it seriously enough to ban it domestically (to the point of not even engaging in secret research) either. Which invalidates the premise of the “what if one of the major actors chooses not to race” hypothetical.

So you can lose in this situation, but you have tools. You can act still. It’s not over. AI banning parties just lose automatically.

Fair. I expect it’d devolve out of anyone’s control rapidly, but I agree that it’d look like a game they’d be willing to play, for the relevant parties.

Edit:

What’s wrong with this table, what additional rows or labels do I need to add to express this more completely?

As above, I think some of the possibilities correspond to self-contradictory worlds. If a superpower A worries about AGI enough to ban it despite the race dynamics, it’s not going to sit idly by and let superpower B win the game; even on pain of a nuclear exchange. So foolishly accelerating against serious concern from other parties gets you nuked, which means “do nothing and keep incrementally improving without AGI” is the least-bad option.

Or so it should ideally work out. There’s a bunch of different dimensions along which nations can take AGI seriously or not. I’ll probably think about it later, maybe compile a table too.

Gerald Monroe 5 Dec 2023 22:55 UTC

4 points

Parent

Ok. I think this collapses even further to a 1 dimensional comparison.

I made a mistake in the table, it’s not all or nothing. Some models are containable and some are not, with the uncontainable models being more capable.

Utilitysafe = ([strongest containable model you can build]*, resources)

Utilityrogue = ( [strongest model that can be developed]**, resources)

Or in other words, there is a utility gain that is a function of how capable to model is. Utility gain is literally doing more with less.

Obviously a nearly 0 intelligence evolutionary process evolved life, using billions of years and the biosphere of a planet that required some enormous number of dice rolls at a galaxy or universe wide scale.

Utility_evolution = (random walk search, the resources of probably the Milky way galaxy)

Utility is domain specific but for example, to do twice as good as evolution, you could design life with half the resources. If you had double the utility in a tank battle, you can win with half the tanks.

And you’re asserting a belief that Utilityrogue >>> Utilitysafe.

Possibly true, possibly not.

But from the point of view of policymakers, they know a safe AI that is some amount stronger than current SOTA can be developed. And that having that model lets them fight against any rogues or models from other players.

If in numbers, actual reality is say Utilityrogue = 2*(to limit of buildable compute)Utilitysafe, we should build ASI as fast as possible.

And if the actual numbers are

Utilityrogue = 1000*(to limit of buildable compute)Utilitysafe

We shouldn’t.

Do we have any numerical way to estimate this ratio? Is there a real world experiment we could perform to estimate what this is? Right now should policymakers assume it’s a high ratio or a low ratio?

What if we don’t know and have a probability distribution. Say it’s (90 percent 2*, 10 percent 1000*).

From the perspective of the long term survival of humanity, this is “pDoom is almost 10 percent”. But what are national policymakers going to do? What do we have to do in response?

*I think we both agree there is some level of AI capability that’s safe? Conventional software has AI like behavior and it’s safe ish.

** Remember RSI doesn’t end up with infinite intelligence, it will have diminishing returns as you approach the most capable model that current compute will support.

Thane Ruthenis 6 Dec 2023 0:08 UTC

8 points

Parent

And you’re asserting a belief that Utilityrogue >>> Utilitysafe.

I… don’t think I’m asserting that? Maybe I’m misunderstanding you.

What I’m claiming is that:

Suppose we have two geopolitical superpowers, A and B.

If both A and B proceed at a measured pace. It’s unclear which of them wins the AI race. If AGI isn’t particularly dangerous, there isn’t even such a thing as “winning” it, it’s just a factor in a global power game. But the more powerful it is, the more likely it is that the party who’s better at it will grow more powerful, all the way to becoming the hegemon.
- So “both proceed steadily” isn’t an equilibrium: each will want to go just a bit faster.
If party A accelerates, while party B either proceeds steadily or bans AGI, but in a geopolitically toothless manner, party A either wins (if AGI isn’t extremely dangerous, or if A proceeds quickly-but-responsibly) or kills everyone (if AGI’s an existential threat and A is irresponsible).
- That isn’t an equilibrium either: party B won’t actualize this hypothetical.
If both A and B accelerate, neither ends up building AGI safely. It’s either constant disasters or everyone straight-up dies (depending on how powerful AGI is).
- That is an equilibrium, but of “everyone loses” kind. (AGI power only determines the extent of the loss.)
If party A bans AGI internationally, in a way it takes seriously, but party B accelerates anyway, then party A acts to stop B, all the way up to a lose/lose nuclear exchange.
- That is not an equilibrium, as going here is just a loss for B.
If party A bans AGI internationally, and party B respects the ban’s seriousness, the relative balance of power is preserved. It’s unclear who takes the planet, because it’s too far in the uncertain future and doesn’t depend on just one factor.
- That is the equilibrium it seems worth going for.

In table form, it’d be something like:

#	Scenario	P(A wins)	P(B wins)	Equilibrium?
1	A & B: proceed steadily	0.5	0.5	No (A goes to 2)
2	A: speed up, B: steady	0.99	0	No (B goes to 3)
3	A: speed up, B: speed up more	0	0.98	No (A&B iterate 2-3, until reaching 4)
4	A & B: rush maximally	0.01	0.01	Yes
5	A: toothless ban, B: steady	0	1	No (A goes to 2 or 7)
6	A: toothless ban, B: rush	0	0.01	No (A goes to 4 or 7)
7	A: toothy ban, B: rush	0	0	No (B goes to 8)
8	A: toothy ban, B: join the ban	0.5	0.5	Yes

Gerald Monroe 6 Dec 2023 18:59 UTC
4 points
0
Parent
So I thought about it overnight and I wanted to add a comment. What bothers me about this table is that nuclear brinkmanship—“stop doing that or we will kill ourselves and you”—it doesn’t seem very probable to ever happen.

I know you want this to happen, and I know you believe this is one of the few routes to survival.

But think about the outcomes if you are playing this brinkmanship game.

Action : Nuke. Outcome : death in the next few hours.

Action : back down.

Outcome pAIsafe : life (maybe under occupation) Outcome pAIrogue : life or delayed death (AI chooses) Outcome pAIweak : life.

If the above is correct, this makes n player brinkmanship games over AI get played with the expectation the other will back down. Which means… acceleration becomes the dominant strategy again. Acceleration and preparing for a nuclear war.
- Thane Ruthenis 7 Dec 2023 1:55 UTC
  2 points
  0
  Parent
  What bothers me about this table is that nuclear brinkmanship—“stop doing that or we will kill ourselves and you”—it doesn’t seem very probable to ever happen.
  I think that proves too much. By this logic, nuclear war can never happen, because “stop invading us or we will kill ourselves and you” results in a similar decision problem, no? “Die immediately” vs. “maybe we can come back from occupation via guerilla warfare”. In which case pro-AI-ban nations can just directly invade the defectors and dig out their underground data centers via conventional methods?
  Or even just precision-nuke just the data centers, because they know the attacked nation won’t retaliate with a strike on the attacker’s population centers in the fear of a retaliatory annihilatory strike? Again, a choice of “die immediately” vs. “maybe we can hold our own geopolitically without AGI after all”.
  Edit:
  Outcome pAIsafe : life (maybe under occupation) Outcome pAIrogue : life or delayed death (AI chooses) Outcome pAIweak : life.
  Also, as I’d outlined, I expect that a government whose stance on AGI is like this isn’t going to try to ban it domestically to begin with, especially if AI has so much acknowledged geopolitical importance that some other nation is willing to nuclear-war-proof its data centers. The scenario where a nation bans domestic AI and tries to bully others into doing the same is a scenario in which that nation is pretty certain that the outcomes of “AI safe” and “AI weak” aren’t gonna happen.

Gerald Monroe 6 Dec 2023 0:23 UTC

2 points

Parent

I was focusing on what the [containable, uncontainable] continuum of possibilities means.

But ok, looking at this table,

Scenario

P(A wins)

P(B wins)

Equilibrium?

A & B: rush maximally

0.01 wins if rogue utility large, 0.5 if rogue utility is small.

Yes. Note this is the historical outcome for most prior weapons technologies. [chemical and biological weapons being exceptions]

A: toothy ban, B: join the ban

0.5

Yes, but unstable. It’s less and less stable the more parties there are. If it’s A....J, then at any moment, at least one party may be “line toeing”. This is unstable if there is a marginal gain from line toeing—the party with slightly stronger, barely legal AGI has more GDP, which eventually means they begin to break away from the pack. This breaks down to 2 then 3 in your model then settles on 4.

Historical example I can think of would be treaties on weapons post ww1. It began to fail with line toeing.

https://en.wikipedia.org/wiki/Arms_control

The United States developed better technology to get better performance from their ships while still working within the weight limits, the United Kingdom exploited a loop-hole in the terms, the Italians misrepresented the weight of their vessels, and when up against the limits, Japan left the treaty. The nations which violated the terms of the treaty did not suffer great consequences for their actions. Within little more than a decade, the treaty was abandoned.

It seems reasonable to assume this is a likely outcome for AI treaties.

For this not to be the actual outcome, something has to have changed from the historical examples—which included plenty of nuclear blackmail threats—to the present day. What has changed? Do we have a rational reason to think it will go any differently?

Note also this line toeing behavior is happening right now from China and Nvidia.

Rogue utility is the other parameter we need to add to this table to make it complete.

Thane Ruthenis 6 Dec 2023 0:58 UTC
4 points
0
Parent
I was focusing on what the [containable, uncontainable] continuum of possibilities means.
Ahh, you mean, what’s expected utility of having a controlled AGI of power X vs. expected disutility of having a rogue AGI of the same power? And how does the expected payoff of different international strategies change as X gets larger?
Hm. Let’s consider just A’s viewpoint, and strategies of {steady progress, accelerate, ban only domestically, ban internationally}.
- Steady progress is always viable up to the capability level where AGI becomes geopolitically relevant; let’s call that level $G$ .
- Acceleration is nonviable in the range of values below $G$ but above some threshold at which accident risk is large enough to cause public backlash. The current global culture is one of excess caution and status-quo bias, and the “desire to ban” would grow very quickly with “danger”, up to some threshold level at which it won’t even matter how useful it is, the society would never want it. Let’s call that threshold $D$ .
- Domestic-only ban is viable in the range $[D; G)$ . If it’s not a matter of national security, and the public is upset, it gets banned. It’s also viable if A strongly expects that AGI will be uncontrollable but the disaster won’t spill over into other countries: i. e., if it expects that if B rushes AGI, it’ll only end up shooting itself in the foot. Let’s call that $C$ .
- At $X > G$ , acceleration, a toothy international ban, and (in some circumstances) a domestic-only ban are viable.
- At $X > C$ , only acceleration and a toothy ban are viable. At that point, what decides between them is the geopolitical actor’s assessment of how likely they are to successfully win the race. If it’s low enough, only the ban has non-zero expected utility (since even a full-scale nuclear war likely won’t lead to extinction).
So we have $0 < D < G < C$ , steady progress is viable at $[0, G)$ , acceleration is viable at $[0; D) \cup [G; + \infty)$ , domestic ban is viable at $[D; G)$ and sometimes also at $[G; C)$ , and a toothy international ban is viable at $[G; + \infty)$ .
Not sure if that’s useful.
Yes. Note this is the historical outcome for most prior weapons technologies. [chemical and biological weapons being exceptions]
Interesting how exceptions are sometimes reachable after all, isn’t it?
Yes, but unstable. It’s less and less stable the more parties there are
Yep. But as I’d said, it’s a game NatSec agencies are playing against each other all the time, and if they’re actually sold on the importance of keeping this equilibrium, I expect they’d be okay at that. In the meantime, we can ramp up other cognitive-enhancement initiatives… Or, well, at least not die for a few years longer.
- Gerald Monroe 6 Dec 2023 1:18 UTC
  4 points
  0
  Parent
  Ahh, you mean, what’s expected utility of having a controlled AGI of power X vs. expected loss of having an rogue AGI of the same power? And how does the expected payoff of different international strategies change as X gets larger?
  No. At a given level of human understanding of AI, there are 2 levels of model.
  1. The strongest model humans can contain (this means the model is mostly working for humans and is not going to deliberately betray). This is Utilitysafe.
  2. The strongest model that the current level of (compute built in total + total data that exists + total robotics) is able to exist. Until very recently, 1 == 2. Humans could not build enough compute to make a model that was remotely dangerous. This is Utilityrogue.
  And then I have realized that intelligence is meaningless. What matters is utility. Utility is the relative benefit of intelligence. It’s worthless to win “70% of tasks vs the next best software model” by doing 1% better, humans will crush the AI if they get to have 1% more resources.
  Utility is some enormous table of domains and it’s a multiplier. I gave examples.
  General intelligence, like current models are beginning to show, mean that utility in one domain transfer to another domain. General learning intelligence, which we can do with scripts tacked onto current models, can have utility in all domains but require training data to learn new domains.
  Examples:
  Utility[designing living creatures]:
  Humans = millions. (we could design other living organisms from scratch with all new amino acids in millions of times less time that evolution needs)
  Controllable AI from deepmind : 100* humans or more
  Evolution : 1.0 (baseline)
  
  Utility[chip, algorithm design]
  Humans = 1.0
  Controllable AI from deepmind : Less than 1.1 times humans. See https://deepmind.google/impact/optimizing-computer-systems-with-more-generalized-ai-tools/. 4% for video compression
  Utility[tank battles]
  Humans = 1.0
  Tactical AI = ?. But think about it. Can you win with half the tanks? Probably. 1/10? 1/100? Probably not.
  Utility[manufacturing]
  Humans = 1.0
  Machine policy solver = ?
  Utility [aircraft design]
  Humans = 1.0
  RL solver from humans = ?
  And so on. Exponential growth complicates this, even a small utility benefit would allow one party to win.
  When I try to reason in a grounded way over “ok, what would the solution from humans look like? What would the solution from a narrow AI* that humans control look like? What is the best solution physics allows?”.
  Well it depends. On easy tasks (pathfinding, aiming a ballistic weapon) or even modestly complex tasks (manipulation, assembly, debugging), I don’t see the best solution as being very much better. For extremely complex tasks I don’t know.
  Current empirical data shows only small utility gains for now.
  If the [Utilitysafe, Utilityrogue] multiplier is very large (100+), in all long term futures the AIs choose the future. Nature can’t abide a vacuum like that. Doomers/decelerationists can only buy a small amount of time.
  If the [Utilitysafe, Utilityrogue] multiplier is small(<2.0), you must accelerate, because the [Utilitysafe] vs [regular humans] multiplier is still an unwinnable battle due to exponential growth. Humans can survive in this future as long as at least ²⁄₃ of the “bits”—the physical resources—are in the hands of Utilitysafe systems.
  Medium term values (2-100) it depends, you need to be very careful but maybe you can keep AI under control for a little while. You need 99% of the resources to be in the hands of safe systems, escaped AI are very much a crisis.
  *myopia and various forms of container are strategies that lower a general AI to a narrow AI without the cost of developing a custom narrow AI.
  - Nathan Helm-Burger 7 Dec 2023 0:07 UTC
    4 points
    2
    Parent
    I appreciate this discussion, and want to throw in my 2 cents:
    I believe that any sort of delay (such as via an attempted ban which does slow AI development down) buys some chance of improved outcomes for humanity. In fact, a lot of my hope for a good outcome lies in finding a way to accomplish this delay whilst accelerating safety research.
    The timing/speed at which the defection happens matters a lot! Not just the probability of defection occurring. How much does developing AI in secret instead of openly slow down a government? None? I expect it slows it at least somewhat.
    - Gerald Monroe 7 Dec 2023 0:29 UTC
      4 points
      −2
      Parent
      So let’s engage on that.
      
      Imagine it’s the cold war. You are an anti nuclear advocate. “STOP preparing to put city killer fusion bombs onto missiles and cramming them into submarines! You endanger human civilization itself, it is possible that we could all die!” you say.
      
      Faction wise you get dismissed as a communist sympathizer. Your security clearances would be revoked and you would be fired as a federal employee in that era. “Communist” then means “decel” now.
      
      I would agree that every part of the statement above is true. Preparing to commit genocide with thermonuclear devices is a morally dubious act, and while there is debate on the probability of a nuclear winter, it is correct to say that the chance is nonzero that a total nuclear war at the cold war arsenal peak could have extincted humanity or weakened humanity to the point the next crisis finished it off.
      
      But...you don’t have a choice. You are locked into a race. Trying to reframe it not as a race doesn’t change the fact it’s a race. Any international agreements not to build loaded ICBMs you can confidently predict the other parties will secretly defect on.
      
      Historically several countries did defect : Israel, Taiwan, and South Africa being the immediate 3 I can think of. Right now Ukraine is being brutalized for its choice to cooperate. Related to your point above, they took longer to get nukes than the superpowers did due to the need for secrecy.
      
      I think we are in the same situation now. I think the payoff matrix in favor of AI is far more favorable to the case for AI than nukes were. The incentives are far, far stronger.
      
      So it seems like a grounded view, based on the facts, to predict the following outcome: AI acceleration until the singularity.
      
      In more details : you talk about regulations and possibly efforts going underground. This only becomes relevant if out of all major power blocks, there is not at least one bloc building ASI at the maximum speed possible. I think this is what will happen as this seems to be what is happening today, so it just has to continue in one power bloc.
      
      We have not seen a total war effort yet but it seems maybe inevitable. (Right now ai investment is still a tiny part of the economy at maybe 200 Billion/year. A total war effort means the party puts all resources into preparing for the coming nuclear war and developing ASI. A less than total war effort, but most of the economy goes into ai, is also possible and would be what economics would cause to happen naturally)
      
      The other blocs choices don’t matter at that point, since the winner will take over the market for ASI services and own (or be subsumed by) the outcome.
      
      There’s other cold war analogs as well. Many people during the era expected the war would happen during their career. They participated in developing and deploying nukes and expected a fight to the end. Yet again, what choice did they have.
      
      Do we actually have a choice now?
      - Nathan Helm-Burger 7 Dec 2023 0:51 UTC
        2 points
        0
        Parent
        As far as I know, most datacenters are not currently secretly underground in bomb-proof bunkers. If this were known to be the case, I’d have different views of the probable events in the next few years.
        Currently, as far as I know, most data centers are out in the open. They weren’t built with the knowledge that soon they would become the equivalent of nuclear ICBM silos. The current state of the world is heavily offense-favoring, so far as I know.
        Do you disagree? Do you think ASI will be fully developed and utilized sufficiently to achieve complete worldwide hegemony by the user without there being any major international conflict?
        Gerald Monroe 7 Dec 2023 1:05 UTC
        4 points
        2
        Parent
        Well, as I mentioned in the other post, but I will open a larger point here: anything physics permits can theoretically happen in the future, right? For complex situations like this, scientifically validated models like particle physics do not exist yet. All we can do is look at what historically happened in a similar scenario and our prior should be that the outcome draw is from the same probability distribution this round. Agree/disagree?
        
        So for nuclear programs, all the outcomes have happened. Superpowers have built vast campuses and secret cities, and made the plutonium and assembled the devices in aboveground facilities. Israel apparently did it underground. Iran has been successfully decelerated from their nuclear ambitions for decades.
        
        But the general trend I feel is that a superpower can’t be stopped by bombing, and another common element has happened a bunch of times historically. Bombing and military actions often harden a belligerents resolve. They hardened the UKs resolve, US resolve, Russian resolve, it goes on.
        
        So in the hypothetical world where party A is too worried about AI dangers to build their own, party B is building it, unless A can kill B, B would respond to the attack by a total war effort and would develop AI and win or die. (Die from nukes, die from the AI, or win the planet)
        
        That is what I predict would be the outcome and we can enumerate all the wars where this historically has happened if you would like. Trend wise the civilian and military leaders on B, consuming their internal propaganda, tend to commit to total war.
        
        If B is a superpower (the USA and China, maybe EU) they can kill all the other parties with their nukes, so choosing to kill B is choosing suicide for yourself.
      - Nathan Helm-Burger 7 Dec 2023 0:44 UTC
        2 points
        0
        Parent
        Yes, I am imagining that there is some sort of toothy international agreement with official inspectors posted at every data center worldwide. That’s what I mean by delay. Or the delay which could come from the current leading labs slowing down and letting China catch up.
        If the first lab to cross the finish line gets forcibly nationalized or assassinated by state-sponsored terrorist groups, why hurry? Why stay ahead? If we can buy six more months by not rushing quite as hard, why not buy them? What do the big labs lose by letting China catch up? Maybe this seems unlikely now, but I expect the end-game months are going to look a lot different.
        See my other comment here: https://www.lesswrong.com/posts/A4nfKtD9MPFBaa5ME/we-re-all-in-this-together?commentId=JqbvwWPtXbFuDqaib
        Gerald Monroe 7 Dec 2023 0:53 UTC
        2 points
        0
        Parent
        Responding to your other comment: Probably AI labs will be nationalized, yes, as models reach capabilities levels to be weapons in themselves.
        
        The one here: a “toothy international agreement” to me sounds indistinguishable from the “world government and nukes are banned” world that makes rational sense from the 1950s but has not happened for 73 years.
        
        Why would it happen this time? In your world model, are you imagining that the “world government” outcome was always a non negligible probability and the dice roll could go this way?
        
        Or do you think that a world that let countries threaten humanity and every living person in a city with doomsday nuclear arsenals would consider total human extinction a more serious threat than nukes, and people would come together to an agreement?
        
        Or do you think the underlying technology or sociological structure of the world has changed in a way that allows world governments now, but didn’t then?
        
        I genuinely don’t know how you are reaching these conclusions. Do you see my perspective? Countless forces between human groups create trends, and those trends are the history and economics we know. To expect a different result requires the underlying rules to have shifted.
        Nathan Helm-Burger 7 Dec 2023 0:56 UTC
        2 points
        0
        Parent
        A world government seems much more plausible to me in a world where the only surviving fraction of humanity is huddled in terror in the few remaining underground bunkers belonging to a single nation.
        Note: I don’t advocate for this world outcome, but I do see it as a likely outcome in the worlds where strong international cooperation fails.