To start off, I agree with pretty much all of it. It’s unlikely that any of the main players actively want the world to end, and inasmuch as they’ll bring that outcome about, it’ll be by mistake. It’s marginally more likely that some of them are risking the world in the pursuit of personal power/status/ideology, and monstrously consider everyone’s deaths an acceptable risk, but I’m not certain of that either.
That said, I agree that we could, in principle, all cooperate here. The thing often touted as a counterargument is “but China”. Except China doesn’t want to die any more than the US, and there isn’t, in principle, any reason the Chinese government can’t be convinced of the seriousness of the danger. They would believe in the reality of an asteroid on a collision course with Earth if shown the evidence, and the AGI threat is no less real.
What we all can cooperate towards is to ban AGI development, and seek other, more controllable and less unilateral ways to create superintelligences. Human cognitive enhancement, genetic engineering, uploads, whatever.
What we can all do, together, is avert the omnicide. It’s in ~no-one’s interests.
What I don’t believe is that we could cooperate on building an AGI that would bring about a utopia.
The term “AI Alignment” is a bit obfuscatory. The technical problem of AI Alignment would be better termed the control problem. An aligned AGI isn’t necessarily an AGI that does the best for humanity; an aligned AGI is one that does precisely what its designer(s) intended for it to.
And the utopias of the majority of people, or companies, or governments, across history and across the world today, would be death or a hellscape for most other people. We would not enjoy life in a world in which North Korea, or a corporate sociopath, or Trump’s US, or some bureaucracy, solved AGI alignment.
Again, I would like to emphasize that “that means we need to outrace everyone else” is not the correct takeaway from this.
Firstly, who are “we”? There is no person or group of people Humanity can trust with doing it right.
No-one who is currently racing can be trusted. A US company successfully solving alignment isn’t particularly more likely to result in a pro-humanity utopia, or a utopia for you, than the Chinese doing it first.
No-one who may start racing can be trusted. The US military nationalizing the project isn’t going to result in a good outcome, either. And if we set things up such that the AI’s value system is being decided as a matter of national policy, via a democratic process? Where people discuss it among themselves, with politicians chiming in, and maybe then vote on it, and then the result is interpreted by some bureaucratic process, and then someone types it in? Don’t make me laugh. We’d be lucky if the AI that’d result from this just kills everyone and tiles the universe with paperwork, rather than trapping everyone in some inescapable Kafkaesque nightmare.
(To be clear, it’s not because the people will want bad things. It’s because our processes for eliciting and agglomerating their preferences – any and all processes in wide use – are an abomination.)
To do it right, whatever process builds the AGI would need to be actively, desperately trying not to leave its fingerprints on the future. But I trust pretty much no-one and no thing to actually do that, instead of hijacking the future for their values. … Except myself and a few specific people, of course. But I rather doubt that you or the people of Indonesia or humanity as a whole would feel particularly enthusiastic about handing off that decision to me, yeah?
Same for any other candidate. There is nobody to whom Humanity as a whole can entrust its future; nobody to whom it should feel comfortable deferring that decision.
Secondly, that’d be giving up too early. None of the above invalidates the core argument: even if we can’t agree on a utopia, pretty much all of us would prefer to keep things as they are, keep incrementally improving our conditions and painstakingly negotiating compromises, to all of us just dying overnight. And the policy of “so we need to outrace the competitors before they build a hell” doesn’t actually lead to your utopia, because you are not going to solve alignment in those conditions either. That policy leads to death. Which, again, ~nobody wants.
So I agree that we can all cooperate on this, that the current state of affairs is a mistake, and that we can negotiate for a better outcome. Ban AGI research internationally. Keep advancing using other technologies.
We will still need to reconcile our differences later on, of course. But it can be done incrementally, a steady pace of negotiation and power balancing and cultural mingling and sanity-raising. There are routes of intelligence enhancement that are more gradual, that let us preserve this sort of incrementalism and stability while still letting Humanity keep empowering itself. Gradual intelligence augmentation, via biotechnological or cyborgism-like or upload-based means.
Humanity-as-a-whole can’t entrust its future to any given part of itself. But it can still build a future for itself. There doesn’t have to be a singular point in time at which we are deciding the whole shape of the future and are then unable to backtrack.
Bottom line: As things stand, anyone anywhere solving either AGI or AGI alignment is not, in expectation, going to lead to a good outcome for humanity. Our processes are too dysfunctional:
We can’t trust each other to let each other solve alignment in peace, and–
– we can’t survive if we let that distrust get to us and start racing each other, because then none of us solve alignment and we all die.
The best outcome we can all cooperate towards – and that is a good outcome that we can all cooperate towards – is to ban the accursed thing.
Except China doesn’t want to die any more than the US, and there isn’t, in principle, any reason the Chinese government can’t be convinced of the seriousness of the danger. They would believe in the reality of an asteroid on a collision course with Earth if shown the evidence, and the AGI threat is no less real.
The current belief held by many people is that future AI can be controlled. And I think it’s a statement of fact that if you accept architectural limitations that will lower net performance, you can build controllable/safe systems that exhibit AGI and ASI like behavior. [I think it’s fair to disagree on how much performance you have to give up, how strong a series of ‘boxes’ you use, and the outcome when the models escape]
So it’s a race to get these systems and whoever loses, loses it all.
You probably disagree with the above, but at present the Chinese and US governments appear to be acting like they are racing.
For example, China is reacting to get access to compute:
The issue with your point of view is that as long as the evidence leaves 2 positions in superposition with good probability mass on both:
[ future AGI/ASI systems can be controlled and harnessed by humans using straightforward methods | future AGI/ASI systems cannot be controlled or harnessed by humans easily ]
Then the parties have to assume that AGI/ASI can be controlled, and will provide a pivotal advantage, and have to get strapped with their own. Hence a race.
To resolve the above superposition, the race would have to continue until AGI/ASI exists and many versions of it have been tested.
1. If all the versions fail to be controllable and the system causes industrial accident after accident (see nuclear fission power), that’s one reality and in that world, heavy regulations and restrictions would make sense and likely be supported.
2. If at least one architecture turns out to be pretty controllable, then that’s the other world.
3. The third world is that the utility gain from even early ASI is so enormous that it kills everyone.
I assume you believe (3) to be a fact. But how do you propose to convince the key decision-makers without direct evidence?
When you talk about the overwhelming power of an ASI—can invent nanotechnology in days, coordinate drone strikes that depose entire governments within hours, convince people to act against their own interests—think of how that sounds to government policymakers. That sounds like a weapon you had better get immediately. Conversely, ‘weaker’ and perhaps more realistic ASI systems that need years and vast resources to do the above are more controllable.
When you talk about the overwhelming power of an ASI—can invent nanotechnology in days, coordinate drone strikes that depose entire governments within hours, convince people to act against their own interests—think of how that sounds to government policymakers. That sounds like a weapon you had better get immediately
Yeah, that’s a difficult framing problem.
Suppose there were a device such that, if built, it would cause an explosion powerful enough to crack the planet; and suppose there were an industry racing to build it, believing that it’s possible to harness it as a revolutionary energy source. Say, if that whole “will it ignite the atmosphere?” thing with nuclear bombs weren’t possible to rule out in advance of testing one, that’d about fit the bill.
It seems plausible that if that were literally our problem, it’d be possible to convince governments to ban the pursuit of this entire technology. Especially if they didn’t manage to classify it start to finish; if we could leverage public pressure.
The problem is in the flavour/aesthetic. “Creating a really smart thing” is pretty difficult to equate with “accidentally setting off a planet-shattering explosion” in most people’s minds. Nevertheless, it should be theoretically possible to pick a way to convey the message of AI Risk that’d activate all the same heuristics in people’s minds as “nuclear accident risk”. The crux of political messaging is that you don’t actually necessarily need to delve into the concrete scenarios, or put them front and center – you just need to pick a message that resonates with people at some abstraction level.
Ok I have tried to table out the outcomes in this situation. This is from a viewpoint of a “power bloc”, for example if the UK bans AI research but their close allies the USA secretly defect, it would be the same as the UK choosing to secretly defect.
Note that also for the upper part of this table, the !(others accelerate) outcome, all countries in the world who have the ability to access the necessary chips and access to nuclear weapons must each separately choose not to accelerate. In an attempt at a worldwide ban, anyone who chooses to accelerate is protected by their own nuclear weapons, against which there is no effective defense pre-AGI. So they get to independently choose to pay|!pay the international outrage and sanctions if they wish to access the “take the planet” outcomes.
This makes it a choice of (pay|!pay) ^ n, where n is the number of actually separate factions. It would be interesting to see how large n actually is. Obviously [West, China] are factions, so n is at least 2, but how many other parties are there? Is Taiwan or Israel their own parties? How long would it take Russia to obtain the chips necessary?
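The (pay|!pay) ^ n point can be made concrete with a small sketch. It enumerates every joint choice for n factions and shows that exactly one of the 2^n profiles is "everyone abstains". The per-faction probability of abstaining and the independence of the choices are assumptions made purely for illustration; nothing in the thread supplies these numbers.

```python
from itertools import product

def ban_outcomes(n):
    """Enumerate every joint choice of n factions (each abstains or accelerates).

    Returns (number of joint profiles, number of profiles where the ban holds).
    """
    profiles = list(product(("abstain", "accelerate"), repeat=n))
    holding = [p for p in profiles if all(choice == "abstain" for choice in p)]
    return len(profiles), len(holding)

def p_ban_holds(p_abstain, n):
    """Chance that all n factions abstain.

    Assumes every faction abstains independently with the same probability,
    which is an illustrative simplification, not a claim from the thread.
    """
    return p_abstain ** n

for n in (2, 4, 6):
    total, holding = ban_outcomes(n)
    # 0.9 is a hypothetical per-faction probability of honoring the ban.
    print(n, total, holding, round(p_ban_holds(0.9, n), 3))
```

Under these assumptions the chance of a ban holding shrinks geometrically with the number of genuinely separate factions, which is why the size of n matters so much here.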
Italicized is I think what AI doomers believe.
Bold is I think what e/acc believe.
| AI Containability x Other country actions | Ban AI | Ban AI publicly, defect | Accelerate AI |
| --- | --- | --- | --- |
| Easy Containability, others ban | stasis | take the planet | take the planet++ |
| Occasionally Escapes, others ban | stasis | take the planet | take the planet++ |
| Uncontainable, others ban | stasis | AI chooses | AI chooses |
| Easy Containability, others defect | government deposed | WW3 or Cold War 2 | take the planet |
| Occasionally Escapes, others defect | government deposed | WW3 or Cold War 2 | take the planet |
| Uncontainable, others defect | AI chooses | AI chooses | AI chooses |
| Easy Containability, others accelerate | government deposed | government deposed | WW3 or Cold War 2 |
| Occasionally Escapes, others accelerate | government deposed | government deposed | WW3 or Cold War 2 |
| Uncontainable, others accelerate | AI chooses | AI chooses | AI chooses |
I would like to add some color to this table, not sure how. But in general, governments are going to perceive the “deposed” scenario as an unacceptable outcome, a war as an unfavorable but, for powerful governments, winnable outcome, and obviously they would prefer the world where they can ‘take the planet’. This is where, using AGI/ASI and exponential production rates, the government manufactures whatever tool it wants in the numbers necessary to depose everyone else. Theoretically this doesn’t need to be a weapon: for example, you could offer aging treatments to citizens of other countries (and their elder relatives) if they renounce their current citizenship, and financially buy all of the assets of all the other countries.
I think this very neatly shows e/acc as a belief. If you think there is no real chance all the other countries will stop developing AI, you only have the rightmost column as a valid choice. All the outcomes are not great but the rightmost column is the least bad.
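One way to check the "rightmost column" claim is to encode the table and test for weak dominance. The sketch below does that under a loudly assumed ordinal ranking of outcomes (the `RANK` values are hypothetical; the thread only says that being deposed is unacceptable, war is winnable, and taking the planet is preferred). The row keys are shorthand for the table's row labels.

```python
ACTIONS = ("ban", "ban publicly, defect", "accelerate")

# Outcome table from the thread: (containability, others' action) -> outcome per action.
TABLE = {
    ("easy", "ban"): ("stasis", "take the planet", "take the planet++"),
    ("occasional", "ban"): ("stasis", "take the planet", "take the planet++"),
    ("uncontainable", "ban"): ("stasis", "AI chooses", "AI chooses"),
    ("easy", "defect"): ("government deposed", "WW3 or Cold War 2", "take the planet"),
    ("occasional", "defect"): ("government deposed", "WW3 or Cold War 2", "take the planet"),
    ("uncontainable", "defect"): ("AI chooses", "AI chooses", "AI chooses"),
    ("easy", "accelerate"): ("government deposed", "government deposed", "WW3 or Cold War 2"),
    ("occasional", "accelerate"): ("government deposed", "government deposed", "WW3 or Cold War 2"),
    ("uncontainable", "accelerate"): ("AI chooses", "AI chooses", "AI chooses"),
}

# ASSUMPTION: an ordinal ranking from a government's point of view (higher is better).
RANK = {
    "take the planet++": 5,
    "take the planet": 4,
    "stasis": 3,
    "WW3 or Cold War 2": 2,
    "AI chooses": 1,
    "government deposed": 0,
}

def weakly_dominates(action, rows):
    """True if `action` is never worse than any alternative on `rows`
    and strictly better than each alternative on at least one row."""
    a = ACTIONS.index(action)
    for other in range(len(ACTIONS)):
        if other == a:
            continue
        diffs = [RANK[TABLE[r][a]] - RANK[TABLE[r][other]] for r in rows]
        if min(diffs) < 0 or max(diffs) <= 0:
            return False
    return True

all_rows = list(TABLE)
others_not_banning = [r for r in TABLE if r[1] != "ban"]

print(weakly_dominates("accelerate", all_rows))
print(weakly_dominates("accelerate", others_not_banning))
```

With this ranking, acceleration weakly dominates only once the "others ban" rows are ruled out; over the full table it loses to a ban in the "Uncontainable, others ban" row. A different ranking of "stasis" against "AI chooses" would change that, so the result depends on the assumed ordering as much as on the table.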
This also seems to show why ‘doomer’ faction members have such a depressed attitude. All the outcomes are bad. The ‘stasis’ one means everyone dies from aging and it’s unstable—it ends on the first defection. All the rest leave humans at the mercy of a machine that has random alignment. Even the “aligned self modifying AGI/ASI” dream would mean the outcome is still “AI chooses”, just humans have weighted the outcome in their favor.
@Thane Ruthenis I am very very curious to see your reaction. If this is a bad visualization of the ‘board’ I’d love to make it more detailed in a grounded, reasonable way. For example I am assuming the ‘occasionally escapes’ scenario means the AGI/ASI do occasionally defect or break out, and some of the defections do cause significant human casualties, but humans do win each battle eventually. This would be consistent with your ‘unstable nuclear software’ post.
I think doomer members believe that the utility benefit of being an escaped self modifying ASI is so large that those outcomes become “AI chooses” as well. I have been lumping that into “uncontainable”
Note that also for the upper part of this table, the !(others accelerate) outcome, all countries in the world who have the ability to access the necessary chips and access to nuclear weapons must each separately choose not to accelerate. In an attempt at a worldwide ban, anyone who chooses to accelerate is protected by their own nuclear weapons
I don’t think that’s right.
Ground fact: If you take the premise that AGI presents an existential risk as a given, merely risking a nuclear exchange in order to prevent someone else from building it (the infamous “bomb the datacenters of foreign defectors” proposal) is correct. If you’re taking it seriously, and the enemy is taking it seriously, then you know that sanctions and being Greatly Concerned won’t stop them, and that their success would be your end, and that it’s not certain that if you bomb them they’ll retaliate. So you bomb them.
So if both parties are taking the promises and risks of AGI seriously, then a sufficiently big coalition choosing to ban AGI can effectively ban it for everyone else, including non-signatories. The non-signatories will know the threat of mere nuclear weapons won’t deter the others.
I mean, of course it’d still be precarious and there’d be constant attempts to push the boundaries, but the NatSec agencies have been playing that game against each other for a while, and it may be stable enough.
Conversely, if some of the parties aren’t taking the risks seriously, and they’re willing to accelerate, and are posturing about how others’ attempts to prevent or sabotage their AGI projects will be met with nuclear retaliation… Yeah, I’m calling that bluff.
If Russia did anything good the last two years, it’s making the “I’ll nuke you if you cross this red line!” look like something a clown says then never acts on.
If you think there is no real chance all the other countries will stop developing AI, you only have the rightmost column as a valid choice
I mean, that’s just the standard Prisoner’s Dilemma setup there, no? And it’s sometimes possible to make people recognize that defect/defect and cooperate/cooperate are the only stable states between two similar-enough agents, and that they should therefore all cooperate. Making people recognize this is non-trivial, yes, but it’s a problem that’s sometimes been solved.
Also, in this case, the cooperating party can, in some scenarios, force the other party to cooperate as well, which somewhat changes the calculus.
All the outcomes are bad. The ‘stasis’ one means everyone dies from aging.
Eh, I don’t agree with that either. AGI isn’t the only technology left, nor the only technology that can prevent aging, and banning AGI doesn’t have to mean banning all technology. I’d already mentioned other forms of intelligence enhancement as possibilities. Hell, the AI tools that exist today, even if frozen at the current type of architecture, can likely be leveraged to greatly accelerate other technologies. Immortality escape velocity may very well be achievable in the next 20-30 years even without AGI.
There’s a very valid concern about bureaucracies and the culture of over-caution strangling innovation, which I’m very sympathetic to. But risking blowing up the planet over it seems excessive. Maybe try some mass anti-FDA protests or something like that first?
So I’d replace “stasis” with “incremental advancement”.
For example I am assuming the ‘occasionally escapes’ scenario means the AGI/ASI do occasionally defect or break out, and some of the defections do cause significant human casualties, but humans do win each battle eventually
Yeah, that seems like one of the possible ways the world could be.
Though I’d note that I expect the level of superintelligence necessary to beat humanity to be surprisingly low. You don’t necessarily need to be at “derives nanotechnology in a few days, Basilisk-hacks people en masse”. Don’t even have to be self-modifying. Just being a bit smarter than humans + having more parallel-processing power may be enough. See the arguments here and here.
So if one of the escapees is at least that competent, and it manages to get a toehold in the human civilization (get one of the countries to shelter it, get a relatively powerful social movement on its side, distribute itself far enough that it’d take time to find and shut down all its instances), that may be enough for it to start up a power-accumulation loop that’d outpace humanity’s poorly-coordinated attempts to make it stumble. Especially if governments aren’t willing to risk nuclear exchanges over the issue.
Conversely, if some of the parties aren’t taking the risks seriously, and they’re willing to accelerate, and are posturing about how others’ attempts to prevent or sabotage their AGI projects will be met with nuclear retaliation… Yeah, I’m calling that bluff.
If Russia did anything good the last two years, it’s making the “I’ll nuke you if you cross this red line!” look like something a clown says then never acts on
It depends on the party and their geographic area and their manufacturing ability. Small countries like Israel or Taiwan, both of whom are nuclear capable, yes, you can probably destroy the key tools needed to make ICs and prevent imports of new ones.
With China|Russia|EU|USA, they will mass manufacture air defense and bury all the important elements for their AI effort. You will have to fire nukes, and they will return fire and everyone in your faction dies. So from a decisionmaker’s point of view now it is
[AI might be containable (like current computers) | AI might be risky but containable| AI might be uncontainable]. Only in the last box do we hit [AI chooses <the survival of humanity>]. While if we fire nukes now, all the outcomes are [we die].
Though I’d note that I expect the level of superintelligence necessary to beat humanity to be surprisingly low. You don’t necessarily need to be at “derives nanotechnology in a few days, Basilisk-hacks people en masse”. Don’t even have to be self-modifying. Just being a bit smarter than humans + having more parallel-processing power may be enough. See the arguments here and here.
So in that scenario, you can be 1 of 2 groups of humans:
A. [you built the ASI that just escaped]
B. [you banned it and someone else built it]
Note that in situation A, you bring in all the experts and prior AI tools you used to accomplish A. You can examine your container code (with the help of ‘trustworthy’ models you have for this) and find and patch the bugs that were exploited. You can lobotomize your local copies of the ASI and query it to predict what it’s going to do next.
And you have an ASI, presumably you have robots that can copy themselves. You can start building countermeasures.
A lot of attacks are asymmetric, I admit that. The countermeasures are much more expensive than the attack. For example if you are up against arbitrarily designed pathogens or rogue nanotechnology, there is probably no vaccine that will work. Anyone infected is a goner or would have to have their brain uploaded to save them from an LN2 frozen sample. But space suits will stop the protein based pathogens, and even diamond nanobots will have materials they can’t cut through due to too little energy stored in them.
You also have the situation that the ASI that escaped is likely attempting self-improvement, and so it may become more capable than humans’ best models.
So you can lose in this situation, but you have tools. You can act still. It’s not over. AI banning parties just lose automatically. In fact, human institutions that fail to adopt AI internally also all lose automatically.
This relates to the geopolitical decision table above because the defection risk means someone might be about to create this situation for you unless you also secretly defect. Yeah, it’s a prisoner’s dilemma, albeit one where the “cooperate” payoff seems to be very poor, and it has a dominant strategy of acceleration.
This seems like a key crux. @Thane Ruthenis is accelerating AI at a geopolitical level the dominant strategy in a game theoretic sense? If it’s not dominant, why? What’s wrong with this table, what additional rows or labels do I need to add to express this more completely?
With China|Russia|EU|USA, they will mass manufacture air defense and bury all the important elements for their AI effort. You will have to fire nukes
Mm. Let me generalize your spread of possibilities as follows:
“AGI might be containable” → “AGI is an incredibly powerful technology… but just that”
“AGI might be risky but containable” → “AGI, by itself, may be a major geopolitical actor”
“AGI might be uncontainable” → “AGI can defeat all of humanity combined”
Whether one believes that AGI is containable is entangled with how much they should expect to benefit by developing one. If a government thinks AGIs aren’t going to be that impressive, they won’t fight hard to be able to develop one, and won’t push against others trying to develop one. If a government is concerned about AGI enough to ban it domestically, it sounds like they expect accident risks to be major disasters/extinction-level, which means they’d expect solving AGI to grant whoever does it hegemony.
So in the hypothetical case where we have a superpower A taking AGI seriously enough to ban it domestically and to try to bully other nations into the same, and a defecting superpower B burying their datacenters in response, so that the only way to stop it is nukes? Then it sounds like superpower A would recognize that it’s about to lose the whole lightcone to B if it does nothing, so it’ll go ahead and actually fire the nukes.
And it should be able to credibly signal such resolve to B ahead of time, meaning the defector would expect this outcome, and so not do the bury-the-datacenters thing.
Like… Yeah, making a government recognize that AGI risk is so major it needs to ban all domestic development and try to enforce an international ban is a tall order. But once a major government is sold on the idea, it’ll also automatically be willing to risk a nuclear exchange to enforce this ban, which will be visible to the other parties.
Conversely, if the government isn’t taking AGI seriously enough to risk a nuclear exchange, it’s probably not taking it seriously enough to ban it domestically (to the point of not even engaging in secret research) either. Which invalidates the premise of the “what if one of the major actors chooses not to race” hypothetical.
So you can lose in this situation, but you have tools. You can act still. It’s not over. AI banning parties just lose automatically.
Fair. I expect it’d devolve out of anyone’s control rapidly, but I agree that it’d look like a game they’d be willing to play, for the relevant parties.
Edit:
What’s wrong with this table, what additional rows or labels do I need to add to express this more completely?
As above, I think some of the possibilities correspond to self-contradictory worlds. If a superpower A worries about AGI enough to ban it despite the race dynamics, it’s not going to sit idly by and let superpower B win the game; even on pain of a nuclear exchange. So foolishly accelerating against serious concern from other parties gets you nuked, which means “do nothing and keep incrementally improving without AGI” is the least-bad option.
Or so it should ideally work out. There’s a bunch of different dimensions along which nations can take AGI seriously or not. I’ll probably think about it later, maybe compile a table too.
Ok. I think this collapses even further to a 1 dimensional comparison.
I made a mistake in the table, it’s not all or nothing. Some models are containable and some are not, with the uncontainable models being more capable.
Utility_safe = ([strongest containable model you can build]*, resources)
Utility_rogue = ([strongest model that can be developed]**, resources)
Or in other words, there is a utility gain that is a function of how capable the model is. Utility gain is literally doing more with less.
Obviously a near-zero-intelligence evolutionary process evolved life, using billions of years and the biosphere of a planet that required some enormous number of dice rolls at a galaxy- or universe-wide scale.
Utility_evolution = (random walk search, the resources of probably the Milky Way galaxy)
Utility is domain specific, but for example, to do twice as well as evolution, you could design life with half the resources. If you had double the utility in a tank battle, you could win with half the tanks.
And you’re asserting a belief that Utility_rogue >>> Utility_safe.
Possibly true, possibly not.
But from the point of view of policymakers, they know a safe AI that is some amount stronger than current SOTA can be developed. And that having that model lets them fight against any rogues or models from other players.
If, in numbers, actual reality is say Utility_rogue = 2 * Utility_safe (at the limit of buildable compute), we should build ASI as fast as possible.
And if the actual numbers are
Utility_rogue = 1000 * Utility_safe (at the limit of buildable compute)
We shouldn’t.
Do we have any numerical way to estimate this ratio? Is there a real world experiment we could perform to estimate what this is? Right now, should policymakers assume it’s a high ratio or a low ratio?
What if we don’t know and have a probability distribution? Say it’s (90 percent 2*, 10 percent 1000*).
From the perspective of the long term survival of humanity, this is “pDoom is almost 10 percent”. But what are national policymakers going to do? What do we have to do in response?
*I think we both agree there is some level of AI capability that’s safe? Conventional software has AI-like behavior and it’s safe-ish.
** Remember RSI doesn’t end up with infinite intelligence, it will have diminishing returns as you approach the most capable model that current compute will support.
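The (90 percent 2*, 10 percent 1000*) distribution above can be worked through numerically. This is only a sketch of the arithmetic: the two ratios and their weights come from the comment, while treating the mean ratio and the tail probability as the relevant summaries is an assumption about what a policymaker would look at.

```python
def expected_ratio(dist):
    """Mean of Utility_rogue / Utility_safe over a discrete distribution.

    `dist` is a list of (probability, ratio) pairs that must sum to 1.
    """
    total = sum(p for p, _ in dist)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * ratio for p, ratio in dist)

def p_ratio_at_least(dist, threshold):
    """Probability mass on ratios at or above `threshold`."""
    return sum(p for p, ratio in dist if ratio >= threshold)

# Distribution from the comment: 90% chance the ratio is 2, 10% chance it is 1000.
dist = [(0.9, 2.0), (0.1, 1000.0)]

print(expected_ratio(dist))          # mean ratio, dominated by the 1000x branch
print(p_ratio_at_least(dist, 1000))  # mass on the "we shouldn't build it" world
```

The two summaries point in different directions: the most likely world has a low ratio, but the mean is driven almost entirely by the unlikely high-ratio branch. Which of the two a national policymaker acts on is exactly the open question raised above.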
Suppose we have two geopolitical superpowers, A and B.
If both A and B proceed at a measured pace, it’s unclear which of them wins the AI race. If AGI isn’t particularly dangerous, there isn’t even such a thing as “winning” it, it’s just a factor in a global power game. But the more powerful it is, the more likely it is that the party who’s better at it will grow more powerful, all the way to becoming the hegemon.
So “both proceed steadily” isn’t an equilibrium: each will want to go just a bit faster.
If party A accelerates, while party B either proceeds steadily or bans AGI, but in a geopolitically toothless manner, party A either wins (if AGI isn’t extremely dangerous, or if A proceeds quickly-but-responsibly) or kills everyone (if AGI’s an existential threat and A is irresponsible).
That isn’t an equilibrium either: party B won’t actualize this hypothetical.
If both A and B accelerate, neither ends up building AGI safely. It’s either constant disasters or everyone straight-up dies (depending on how powerful AGI is).
That is an equilibrium, but of the “everyone loses” kind. (AGI power only determines the extent of the loss.)
If party A bans AGI internationally, in a way it takes seriously, but party B accelerates anyway, then party A acts to stop B, all the way up to a lose/lose nuclear exchange.
That is not an equilibrium, as going here is just a loss for B.
If party A bans AGI internationally, and party B respects the ban’s seriousness, the relative balance of power is preserved. It’s unclear who takes the planet, because it’s too far in the uncertain future and doesn’t depend on just one factor.
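The equilibrium claims in this list can be checked mechanically with a tiny pure-strategy Nash search. The payoff numbers below are entirely hypothetical ordinal values chosen to match the verbal descriptions (a prisoner's-dilemma shape for steady versus accelerate, and a lose/lose escalation whenever a seriously enforced ban meets any AGI development); they are an illustration, not the author's own model.

```python
from itertools import product

# "ban" here means enforcing, or joining, a ban that is taken seriously.
STRATEGIES = ("steady", "accelerate", "ban")

# ASSUMPTION: ordinal payoff to the first player, given (own move, other's move).
# The game is treated as symmetric, so the same table serves both parties.
PAYOFF = {
    ("steady", "steady"): 2,
    ("steady", "accelerate"): 0,
    ("steady", "ban"): -1,          # development met by enforcement: escalation
    ("accelerate", "steady"): 3,
    ("accelerate", "accelerate"): 1,
    ("accelerate", "ban"): -1,      # lose/lose nuclear exchange
    ("ban", "steady"): -1,
    ("ban", "accelerate"): -1,
    ("ban", "ban"): 2,              # balance of power preserved
}

def is_pure_nash(a, b):
    """Neither party can gain by unilaterally switching strategy."""
    a_ok = all(PAYOFF[(a, b)] >= PAYOFF[(alt, b)] for alt in STRATEGIES)
    b_ok = all(PAYOFF[(b, a)] >= PAYOFF[(alt, a)] for alt in STRATEGIES)
    return a_ok and b_ok

def pure_nash_equilibria():
    return [(a, b) for a, b in product(STRATEGIES, repeat=2) if is_pure_nash(a, b)]

print(pure_nash_equilibria())
```

Under these assumed payoffs the only stable profiles are mutual acceleration and a mutually respected ban, which is the shape of the argument in the list. The conclusion rests on the escalation payoffs being worse than every alternative, which is the very point disputed in the replies.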
So I thought about it overnight and I wanted to add a comment. What bothers me about this table is that nuclear brinkmanship—“stop doing that or we will kill ourselves and you”—it doesn’t seem very probable to ever happen.
I know you want this to happen, and I know you believe this is one of the few routes to survival.
But think about the outcomes if you are playing this brinkmanship game.
Action : Nuke. Outcome : death in the next few hours.
Action : back down.
Outcome pAIsafe : life (maybe under occupation)
Outcome pAIrogue : life or delayed death (AI chooses)
Outcome pAIweak : life.
If the above is correct, this makes n player brinkmanship games over AI get played with the expectation the other will back down. Which means… acceleration becomes the dominant strategy again. Acceleration and preparing for a nuclear war.
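The brinkmanship argument is a two-step backward induction: first work out what the threatening party would do if its bluff were called, then let the other party choose with that in mind. A minimal sketch follows, with every probability and utility value invented for illustration only; the labels p_safe, p_rogue, and p_weak mirror the pAIsafe, pAIrogue, and pAIweak outcomes listed above.

```python
def a_best_response(p_safe, p_rogue, p_weak,
                    u_nuke=0.0, u_safe=0.6, u_rogue=0.3, u_weak=1.0):
    """What the banning party A does once B has accelerated.

    All utilities are hypothetical. Backing down yields a lottery over the
    three AI outcomes; nuking yields u_nuke with certainty.
    """
    u_back_down = p_safe * u_safe + p_rogue * u_rogue + p_weak * u_weak
    choice = "nuke" if u_nuke > u_back_down else "back down"
    return choice, u_back_down

def b_best_move(a_response, u_comply=1.0, u_win=2.0, u_nuked=0.0):
    """B anticipates A's response and picks the better of complying or accelerating."""
    u_accelerate = u_nuked if a_response == "nuke" else u_win
    return "accelerate" if u_accelerate > u_comply else "comply"

# Case 1: A is unsure which world it is in and treats nuclear war as certain death.
resp_unsure, _ = a_best_response(p_safe=0.4, p_rogue=0.3, p_weak=0.3)
print(resp_unsure, b_best_move(resp_unsure))

# Case 2: A is certain that a rival's AGI means losing everything, and values
# surviving a nuclear exchange slightly above that (both values are assumptions).
resp_certain, _ = a_best_response(p_safe=0.0, p_rogue=1.0, p_weak=0.0,
                                  u_nuke=0.05, u_rogue=0.0)
print(resp_certain, b_best_move(resp_certain))
```

The first case reproduces the reasoning above: A backs down, so B accelerates. The second shows how sensitive that is to A's beliefs: once backing down is valued no higher than the exchange itself, the threat becomes credible and B complies.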
What bothers me about this table is that nuclear brinkmanship—“stop doing that or we will kill ourselves and you”—it doesn’t seem very probable to ever happen.
I think that proves too much. By this logic, nuclear war can never happen, because “stop invading us or we will kill ourselves and you” results in a similar decision problem, no? “Die immediately” vs. “maybe we can come back from occupation via guerilla warfare”. In which case pro-AI-ban nations can just directly invade the defectors and dig out their underground data centers via conventional methods?
Or even just precision-nuke just the data centers, because they know the attacked nation won’t retaliate with a strike on the attacker’s population centers for fear of a retaliatory annihilatory strike? Again, a choice of “die immediately” vs. “maybe we can hold our own geopolitically without AGI after all”.
Edit:
Outcome pAIsafe : life (maybe under occupation) Outcome pAIrogue : life or delayed death (AI chooses) Outcome pAIweak : life.
Also, as I’d outlined, I expect that a government whose stance on AGI is like this isn’t going to try to ban it domestically to begin with, especially if AI has so much acknowledged geopolitical importance that some other nation is willing to nuclear-war-proof its data centers. The scenario where a nation bans domestic AI and tries to bully others into doing the same is a scenario in which that nation is pretty certain that the outcomes of “AI safe” and “AI weak” aren’t gonna happen.
I was focusing on what the [containable, uncontainable] continuum of possibilities means.
But ok, looking at this table,
| # | Scenario | P(A wins) | P(B wins) | Equilibrium? |
| --- | --- | --- | --- | --- |
| 4 | A & B: rush maximally | 0.01 wins if rogue utility large, 0.5 if rogue utility is small. | 0.01 wins if rogue utility large, 0.5 if rogue utility is small. | Yes. Note this is the historical outcome for most prior weapons technologies. [chemical and biological weapons being exceptions] |
| 8 | A: toothy ban, B: join the ban | 0.5 | 0.5 | Yes, but unstable. It’s less and less stable the more parties there are. If it’s A....J, then at any moment, at least one party may be “line toeing”. This is unstable if there is a marginal gain from line toeing: the party with slightly stronger, barely legal AGI has more GDP, which eventually means they begin to break away from the pack. This breaks down to 2 then 3 in your model then settles on 4. |
The historical example I can think of is the weapons treaties after WW1. They began to fail with line toeing.
The United States developed better technology to get better performance from their ships while still working within the weight limits, the United Kingdom exploited a loop-hole in the terms, the Italians misrepresented the weight of their vessels, and when up against the limits, Japan left the treaty. The nations which violated the terms of the treaty did not suffer great consequences for their actions. Within little more than a decade, the treaty was abandoned.
It seems reasonable to assume this is a likely outcome for AI treaties.
For this not to be the actual outcome, something has to have changed from the historical examples—which included plenty of nuclear blackmail threats—to the present day. What has changed? Do we have a rational reason to think it will go any differently?
Note also this line toeing behavior is happening right now from China and Nvidia.
Rogue utility is the other parameter we need to add to this table to make it complete.
I was focusing on what the [containable, uncontainable] continuum of possibilities means.
Ahh, you mean, what’s expected utility of having a controlled AGI of power X vs. expected disutility of having a rogue AGI of the same power? And how does the expected payoff of different international strategies change as X gets larger?
Hm. Let’s consider just A’s viewpoint, and strategies of {steady progress, accelerate, ban only domestically, ban internationally}.
Steady progress is always viable up to the capability level where AGI becomes geopolitically relevant; let’s call that level G.
Acceleration is nonviable in the range of values below G but above some threshold at which accident risk is large enough to cause public backlash. The current global culture is one of excess caution and status-quo bias, and the “desire to ban” would grow very quickly with “danger”, up to some threshold level at which it won’t even matter how useful it is, the society would never want it. Let’s call that threshold D.
Domestic-only ban is viable in the range [D;G). If it’s not a matter of national security, and the public is upset, it gets banned. It’s also viable if A strongly expects that AGI will be uncontrollable but the disaster won’t spill over into other countries: i. e., if it expects that if B rushes AGI, it’ll only end up shooting itself in the foot. Let’s call that C.
At X>G, acceleration, a toothy international ban, and (in some circumstances) a domestic-only ban are viable.
At X>C, only acceleration and a toothy ban are viable. At that point, what decides between them is the geopolitical actor’s assessment of how likely they are to successfully win the race. If it’s low enough, only the ban has non-zero expected utility (since even a full-scale nuclear war likely won’t lead to extinction).
So we have 0<D<G<C, steady progress is viable at [0;G), acceleration is viable at [0;D)∪[G;+∞), domestic ban is viable at [D;G) and sometimes also at [G;C), and a toothy international ban is viable at [G;+∞).
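The ranges above can be sketched as a small lookup, assuming capability level X and the thresholds D, G and C all sit on one numeric scale. The function name, the boolean flag standing in for "sometimes", and any concrete numbers are illustrative assumptions, not part of the argument.

```python
def viable_strategies(x, D, G, C, expects_no_spillover=False):
    """Return the strategies viable for actor A at capability level x.

    Encodes the intervals from the text:
      steady progress    -> [0;G)
      accelerate         -> [0;D) and [G;+inf)
      domestic ban       -> [D;G), and [G;C) only if A expects no spillover
      international ban  -> [G;+inf)
    The `expects_no_spillover` flag is an assumed stand-in for "sometimes".
    """
    if not (0 < D < G < C):
        raise ValueError("expected 0 < D < G < C")
    if x < 0:
        raise ValueError("capability level must be non-negative")

    strategies = set()
    if x < G:
        strategies.add("steady progress")
    if x < D or x >= G:
        strategies.add("accelerate")
    if D <= x < G or (G <= x < C and expects_no_spillover):
        strategies.add("domestic ban")
    if x >= G:
        strategies.add("international ban")
    return strategies
```

With placeholder thresholds such as D=1, G=2, C=3, a call like `viable_strategies(2.5, 1, 2, 3)` leaves only acceleration and the international ban unless the no-spillover flag is set.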
Not sure if that’s useful.
Yes. Note this is the historical outcome for most prior weapons technologies. [chemical and biological weapons being exceptions]
Interesting how exceptions are sometimes reachable after all, isn’t it?
Yes, but unstable. It’s less and less stable the more parties there are
Yep. But as I’d said, it’s a game NatSec agencies are playing against each other all the time, and if they’re actually sold on the importance of keeping this equilibrium, I expect they’d be okay at that. In the meantime, we can ramp up other cognitive-enhancement initiatives… Or, well, at least not die for a few years longer.
Ahh, you mean, what’s the expected utility of having a controlled AGI of power X vs. the expected loss of having a rogue AGI of the same power? And how does the expected payoff of different international strategies change as X gets larger?
No. At a given level of human understanding of AI, there are 2 levels of model.
1. The strongest model humans can contain (this means the model is mostly working for humans and is not going to deliberately betray them). This is Utilitysafe.
2. The strongest model that the current level of (compute built in total + total data that exists + total robotics) allows to exist. Until very recently, 1 == 2: humans could not build enough compute to make a model that was remotely dangerous. This is Utilityrogue.
And then I realized that intelligence is meaningless. What matters is utility. Utility is the relative benefit of intelligence. It’s worthless to win “70% of tasks vs the next best software model” by doing 1% better; humans will crush the AI if they get to have 1% more resources.
Utility is some enormous table of domains and it’s a multiplier. I gave examples.
General intelligence, like current models are beginning to show, means that utility in one domain transfers to another domain. General learning intelligence, which we can do with scripts tacked onto current models, can have utility in all domains but requires training data to learn new domains.
Examples:
Utility[designing living creatures]:
Humans = millions. (We could design other living organisms from scratch, with all-new amino acids, in millions of times less time than evolution needs.)
Controllable AI from DeepMind : 100 × humans or more
Tactical AI = ?. But think about it. Can you win with half the tanks? Probably. 1/10? 1/100? Probably not.
Utility[manufacturing]
Humans = 1.0
Machine policy solver = ?
Utility [aircraft design]
Humans = 1.0
RL solver from humans = ?
And so on. Exponential growth complicates this, even a small utility benefit would allow one party to win.
When I try to reason in a grounded way over “ok, what would the solution from humans look like? What would the solution from a narrow AI* that humans control look like? What is the best solution physics allows?”, the answer is that it depends. On easy tasks (pathfinding, aiming a ballistic weapon) or even modestly complex tasks (manipulation, assembly, debugging), I don’t see the best solution as being very much better. For extremely complex tasks I don’t know.
Current empirical data shows only small utility gains for now.
If the [Utilitysafe, Utilityrogue] multiplier is very large (100+), in all long term futures the AIs choose the future. Nature can’t abide a vacuum like that. Doomers/decelerationists can only buy a small amount of time.
If the [Utilitysafe, Utilityrogue] multiplier is small (<2.0), you must accelerate, because the [Utilitysafe] vs [regular humans] multiplier is still an unwinnable battle due to exponential growth. Humans can survive in this future as long as at least 2⁄3 of the “bits” (the physical resources) are in the hands of Utilitysafe systems.
For medium values (2-100), it depends: you need to be very careful, but maybe you can keep AI under control for a little while. You need 99% of the resources to be in the hands of safe systems, and escaped AIs are very much a crisis.
*myopia and various forms of container are strategies that lower a general AI to a narrow AI without the cost of developing a custom narrow AI.
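To make the three multiplier regimes above concrete, here is a minimal sketch. It assumes the multiplier is the ratio of Utilityrogue to Utilitysafe, and it treats exactly 100 as "very large" (the text says "100+" and "2-100", so the boundary is a guess). The function name and the returned labels are hypothetical.

```python
def multiplier_regime(multiplier):
    """Classify the Utilityrogue / Utilitysafe multiplier into the regimes described above.

    Returns (label, required_safe_share), where required_safe_share is the
    fraction of physical resources that must sit with safe systems, or None
    when no share is claimed to suffice (the "AIs choose the future" case).
    """
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    if multiplier < 2.0:
        # Small gap: humans survive if safe systems hold at least 2/3 of resources.
        return ("small", 2 / 3)
    if multiplier < 100:
        # Medium gap: safe systems need about 99% of resources.
        return ("medium", 0.99)
    # Very large gap: delay only buys a small amount of time.
    return ("very large", None)
```

For instance, `multiplier_regime(1.5)` falls in the "must accelerate" regime, while `multiplier_regime(500)` falls in the regime where the AIs choose the future.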
I appreciate this discussion, and want to throw in my 2 cents:
I believe that any sort of delay (such as via an attempted ban which does slow AI development down) buys some chance of improved outcomes for humanity. In fact, a lot of my hope for a good outcome lies in finding a way to accomplish this delay whilst accelerating safety research.
The timing/speed at which the defection happens matters a lot! Not just the probability of defection occurring. How much does developing AI in secret instead of openly slow down a government? None? I expect it slows it at least somewhat.
Imagine it’s the cold war. You are an anti nuclear advocate. “STOP preparing to put city killer fusion bombs onto missiles and cramming them into submarines! You endanger human civilization itself, it is possible that we could all die!” you say.
Faction wise you get dismissed as a communist sympathizer. Your security clearances would be revoked and you would be fired as a federal employee in that era. “Communist” then means “decel” now.
I would agree that every part of the statement above is true. Preparing to commit genocide with thermonuclear devices is a morally dubious act, and while there is debate on the probability of a nuclear winter, it is correct to say that the chance is nonzero that a total nuclear war at the cold war arsenal peak could have extincted humanity or weakened humanity to the point the next crisis finished it off.
But...you don’t have a choice. You are locked into a race. Trying to reframe it not as a race doesn’t change the fact it’s a race. Any international agreements not to build loaded ICBMs you can confidently predict the other parties will secretly defect on.
Historically several countries did defect : Israel, Taiwan, and South Africa being the immediate 3 I can think of. Right now Ukraine is being brutalized for its choice to cooperate. Related to your point above, they took longer to get nukes than the superpowers did due to the need for secrecy.
I think we are in the same situation now. I think the payoff matrix favors AI far more strongly than it ever favored nukes. The incentives are far, far stronger.
So it seems like a grounded view, based on the facts, to predict the following outcome: AI acceleration until the singularity.
In more detail: you talk about regulations and possibly efforts going underground. This only becomes relevant if, out of all major power blocs, there is not at least one bloc building ASI at the maximum speed possible. I think this is what will happen, as this seems to be what is happening today, so it just has to continue in one power bloc.
We have not seen a total war effort yet but it seems maybe inevitable. (Right now AI investment is still a tiny part of the economy at maybe 200 billion/year. A total war effort means the party puts all resources into preparing for the coming nuclear war and developing ASI. A less-than-total war effort, in which most of the economy still goes into AI, is also possible and is what economics would cause to happen naturally.)
The other blocs’ choices don’t matter at that point, since the winner will take over the market for ASI services and own (or be subsumed by) the outcome.
There are other cold war analogs as well. Many people during the era expected the war would happen during their career. They participated in developing and deploying nukes and expected a fight to the end. Yet again, what choice did they have?
As far as I know, most datacenters are not currently secretly underground in bomb-proof bunkers. If this were known to be the case, I’d have different views of the probable events in the next few years.
Currently, as far as I know, most data centers are out in the open. They weren’t built with the knowledge that soon they would become the equivalent of nuclear ICBM silos. The current state of the world is heavily offense-favoring, so far as I know.
Do you disagree? Do you think ASI will be fully developed and utilized sufficiently to achieve complete worldwide hegemony by the user without there being any major international conflict?
Well, as I mentioned in the other post, but I will open a larger point here: anything physics permits can theoretically happen in the future, right? For complex situations like this, scientifically validated models like particle physics do not exist yet. All we can do is look at what historically happened in a similar scenario and our prior should be that the outcome draw is from the same probability distribution this round. Agree/disagree?
So for nuclear programs, all the outcomes have happened. Superpowers have built vast campuses and secret cities, and made the plutonium and assembled the devices in aboveground facilities. Israel apparently did it underground. Iran has been successfully decelerated from their nuclear ambitions for decades.
But the general trend, I feel, is that a superpower can’t be stopped by bombing, and another common element has recurred historically: bombing and military actions often harden a belligerent’s resolve. They hardened the UK’s resolve, US resolve, Russian resolve; it goes on.
So in the hypothetical world where party A is too worried about AI dangers to build their own, party B is building it, unless A can kill B, B would respond to the attack by a total war effort and would develop AI and win or die. (Die from nukes, die from the AI, or win the planet)
That is what I predict would be the outcome and we can enumerate all the wars where this historically has happened if you would like. Trend wise the civilian and military leaders on B, consuming their internal propaganda, tend to commit to total war.
If B is a superpower (the USA and China, maybe EU) they can kill all the other parties with their nukes, so choosing to kill B is choosing suicide for yourself.
Yes, I am imagining that there is some sort of toothy international agreement with official inspectors posted at every data center worldwide. That’s what I mean by delay. Or the delay which could come from the current leading labs slowing down and letting China catch up.
If the first lab to cross the finish line gets forcibly nationalized or assassinated by state-sponsored terrorist groups, why hurry? Why stay ahead? If we can buy six more months by not rushing quite as hard, why not buy them? What do the big labs lose by letting China catch up? Maybe this seems unlikely now, but I expect the end-game months are going to look a lot different.
Responding to your other comment: Probably AI labs will be nationalized, yes, as models reach capability levels at which they are weapons in themselves.
The one here: a “toothy international agreement” to me sounds indistinguishable from the “world government and nukes are banned” world that makes rational sense from the 1950s but has not happened for 73 years.
Why would it happen this time? In your world model, are you imagining that the “world government” outcome was always a non negligible probability and the dice roll could go this way?
Or do you think that a world that let countries threaten humanity and every living person in a city with doomsday nuclear arsenals would consider total human extinction a more serious threat than nukes, and people would come together to an agreement?
Or do you think the underlying technology or sociological structure of the world has changed in a way that allows world governments now, but didn’t then?
I genuinely don’t know how you are reaching these conclusions. Do you see my perspective? Countless forces between human groups create trends, and those trends are the history and economics we know. To expect a different result requires the underlying rules to have shifted.
A world government seems much more plausible to me in a world where the only surviving fraction of humanity is huddled in terror in the few remaining underground bunkers belonging to a single nation.
Note: I don’t advocate for this world outcome, but I do see it as a likely outcome in the worlds where strong international cooperation fails.
And if we set things up such that the AI’s value system is being decided as a matter of national policy, via a democratic process? Where people discuss it among themselves, with politicians chiming in, and maybe then vote on it, and then the result is interpreted by some bureaucratic process, and then someone types it in? Don’t make me laugh. We’d be lucky if the AI that’d result from this just kills everyone and tiles the universe with paperwork, rather than trapping everyone in some inescapable Kafkaesque nightmare.
(To be clear, it’s not because the people will want bad things. It’s because our processes for eliciting and agglomerating their preferences – any and all processes in wide use – are an abomination.)
We will still need to reconcile our differences later on, of course. But it can be done incrementally, a steady pace of negotiation and power balancing and cultural mingling and sanity-raising. There are routes of intelligence enhancement that are more gradual, that let us preserve this sort of incrementalism and stability while still letting Humanity keep empowering itself. Gradual intelligence augmentation, via biotechnological or cyborgism-like or upload-based means.
There is a lot of statism or pessimism about the potential for improving coordination in your comments. No mentions of the projects of the kind:
Thanks for the links! I’ve been idly thinking about such projects as well, nice to see what ideas others have been considering. Hopefully there’s something workable in there.
But my median prediction is that none of that works, yes. Let alone on the relevant timeline (<10 years). Stuff like Manifold Markets and Twitter’s Community Notes are steps in the right direction, but they’re so very ridiculously tiny. In the meanwhile, the pressures destroying the coordination ability continue to mount.
My optimism would rise dramatically if one of these ideas spawns, say, an SV startup centered around it that ends up valued at billions of dollars within the next three years.
No-one who is currently racing can be trusted. A US company successfully solving alignment isn’t particularly more likely to result in a pro-humanity utopia, or an utopia for you, than the Chinese doing it first.
Why do you think this? I don’t know how you even could find a sufficiently large team of Californian AI researchers who lack cosmopolitan values, who wouldn’t blow the whistle if someone asked them to implement, like, a nationalistic or oligarchic (they still have capped profits) utility function or whatever instead of the obvious “optimize the preferences of all currently living humans”.
I feel like you have to be missing a basic understanding of the cultural background of college-educated Californians or the degree of decentralization in their decisionmaking (you could read sama’s reinstatement as a demonstration of a cult of personality, but I think that is wrong; what actually happened was that staff organized, chose the leader they preferred, and got their way).
In general, I agree that global cooperation might not be possible, but the door hasn’t closed yet, and if it closes, there is still something that can be done.
I wish I could believe that.
To start off, I agree with pretty much all of it. It’s unlikely that any of the main players actively want the world to end, and inasmuch as they’ll bring that outcome about, it’ll be by mistake. It’s marginally more likely that some of them are risking the world in the pursuit of personal power/status/ideology, and monstrously consider everyone’s deaths an acceptable risk, but I’m not certain of that either.
That said, I agree that we could, in principle, all cooperate here. The thing often touted about as a counterargument is “but China”. Except China doesn’t want to die any more than the US, and there isn’t, in principle, any reason the Chinese government can’t be convinced of the seriousness of the danger. They would believe in the reality of an asteroid on a collision course with Earth if shown the evidence, and the AGI threat is no less real.
What we all can cooperate towards is to ban AGI development, and seek other, more controllable and less unilateral ways to create superintelligences. Human cognitive enhancement, genetic engineering, uploads, whatever.
What we can all do, together, is avert the omnicide. It’s in ~no-one’s interests.
What I don’t believe is that we could cooperate on building an AGI that would bring about an utopia.
The term “AI Alignment” is a bit obfuscatory. The technical problem of AI Alignment would be better termed the control problem. An aligned AGI isn’t necessarily an AGI that does the best for humanity; an aligned AGI is one that does precisely what its designer(s) intended for it to.
And the utopias of the majority of people, or companies, or governments, across history and across the world today, would be death or a hellscape for most other people. We would not enjoy life in a world in which North Korea, or a corporate sociopath, or Trump’s US, or some bureaucracy, solved AGI alignment.
Again, I would like to emphasize that “that means we need to outrace everyone else” is not the correct takeaway from this.
Firstly, who are “we”? There is no person or a group of people Humanity can trust with doing it right.
No-one who is currently racing can be trusted. A US company successfully solving alignment isn’t particularly more likely to result in a pro-humanity utopia, or an utopia for you, than the Chinese doing it first.
No-one who may start racing can be trusted. The US military nationalizing the project isn’t going to result in a good outcome, either. And if we set things up such that the AI’s value system is being decided as a matter of national policy, via a democratic process? Where people discuss it among themselves, with politicians chiming in, and maybe then vote on it, and then the result is interpreted by some bureaucratic process, and then someone types it in? Don’t make me laugh. We’d be lucky if the AI that’d result from this just kills everyone and tiles the universe with paperwork, rather than trapping everyone in some inescapable Kafkaesque nightmare.
(To be clear, it’s not because the people will want bad things. It’s because our processes for eliciting and agglomerating their preferences – any and all processes in wide use – are an abomination.)
To do it right, whatever process builds the AGI would need to be actively, desperately trying not to leave its fingerprints on the future. But I trust pretty much no-one and no thing to actually do that, instead of hijacking the future for their values. … Except myself and a few specific people, of course. But I rather doubt that you or the people of Indonesia or humanity as a whole would feel particularly enthusiastic about handing off that decision to me, yeah?
Same for any other candidature. There is nobody to whom Humanity as a whole can entrust its future; nobody to whom it should feel comfortable deferring that decision.
Secondly, that’d be giving up too early. None of the above invalidates the core argument: even if we can’t agree on an utopia, pretty much all of us would prefer to keep things as they are, keep incrementally improving our conditions and painstakingly negotiating compromises, to all of us just dying overnight. And the policy of “so we need to outrace the competitors before they build a hell” doesn’t actually lead to your utopia, because you are not going to solve alignment in those conditions either. That policy leads to death. Which, again, ~nobody wants.
So I agree that we can all cooperate on this, that the current state of affairs is a mistake, and that we can negotiate for a better outcome. Ban AGI research internationally. Keep advancing using other technologies.
We will still need to reconcile our differences later on, of course. But it can be done incrementally, a steady pace of negotiation and power balancing and cultural mingling and sanity-raising. There are routes of intelligence enhancement that are more gradual, that let us preserve this sort of incrementalism and stability while still letting Humanity keep empowering itself. Gradual intelligence augmentation, via biotechnological or cyborgism-like or upload-based means.
Humanity-as-a-whole can’t entrust its future to any given part of itself. But it can still build a future for itself. There doesn’t have to be a singular point in time at which we are deciding the whole shape of the future and are then unable to backtrack.
Bottom line: As things stand, anyone anywhere solving either AGI or AGI alignment is not, on expectation, going to lead to a good outcome for humanity. Our processes are too dysfunctional:
We can’t trust each other to let each other solve alignment in peace, and–
– we can’t survive if we let that distrust get to us and start racing each other, because then none of us solve alignment and we all die.
The best outcome we can all cooperate towards – and that is a good outcome that we can all cooperate towards – is to ban the accursed thing.
The current belief held by many people is that future AI can be controlled. And I think it’s a statement of fact that if you accept architectural limitations that will lower net performance, you can build controllable/safe systems that exhibit AGI and ASI like behavior. [I think it’s fair to disagree on how much performance you have to give up, how strong a series of ‘boxes’ you use, and the outcome when the models escape]
So it’s a race to get these systems and whoever loses, loses it all.
You probably disagree with the above, but at present the Chinese and US governments appear to be acting like they are racing.
For example, China is reacting to get access to compute:
https://wccftech.com/chinese-factories-dismantling-thousands-of-nvidia-geforce-rtx-4090-gaming-gpus-turning-ai-solutions/
https://www.tomshardware.com/news/old-rtx-3080-gpus-repurposed-for-chinese-ai-market-with-20gb-and-blower-style-cooling
https://www.chinadaily.com.cn/a/202310/21/WS65330e19a31090682a5e9dce.html
The US government is acting to shut off China’s access to compute:
https://fortune.com/2023/12/02/ai-chip-export-controls-china-nvidia-raimondo/
The issue with your point of view is that as long as the evidence leaves 2 positions in superposition with good probability mass on both:
[ future AGI/ASI systems can be controlled and harnessed by humans using straightforward methods | future AGI/ASI systems cannot be controlled or harnessed by humans easily ]
Then the parties have to assume that AGI/ASI can be controlled, and will provide a pivotal advantage, and have to get strapped with their own. Hence a race.
To resolve the above superposition, the race would have to continue until AGI/ASI exists and many versions of it have been tested.
1. If all the versions fail to be controllable and the system causes industrial accident after accident (see nuclear fission power), that’s one reality and in that world, heavy regulations and restrictions would make sense and likely be supported.
2. If at least one architecture turns out to be pretty controllable, then that’s the other world.
3. The third world is that the utility gain from even early ASI is so enormous that it kills everyone.
I assume you believe (3) to be a fact. But how do you propose to convince the key decision-makers without direct evidence?
When you talk about the overwhelming power of an ASI (can invent nanotechnology in days, coordinate drone strikes that depose entire governments within hours, convince people to act against their own interests), think of how that sounds to government policymakers. That sounds like a weapon you had better get immediately. Conversely, ‘weaker’ and perhaps more realistic ASI systems that need years and vast resources to do the above are more controllable.
Yeah, that’s a difficult framing problem.
Suppose there were a device such that, if built, it would cause an explosion powerful enough to crack the planet; and suppose there were an industry racing to build it, believing that it’s possible to harness it as a revolutionary energy source. Say, if that whole “will it ignite the atmosphere?” thing with nuclear bombs weren’t possible to rule out in advance of testing one, that’d about fit the bill.
It seems plausible that if that were literally our problem, it’d be possible to convince governments to ban the pursuit of this entire technology. Especially if they didn’t manage to classify it start to finish; if we could leverage public pressure.
The problem is in the flavour/aesthetic. “Creating a really smart thing” is pretty difficult to equate with “accidentally setting off a planet-shattering explosion” in most people’s minds. Nevertheless, it should be theoretically possible to pick a way to convey the message of AI Risk that’d activate all the same heuristics in people’s minds as “nuclear accident risk”. The crux of political messaging is that you don’t actually necessarily need to delve into the concrete scenarios, or put them front and center – you just need to pick a message that resonates with people at some abstraction level.
I’d played around with the idea a year ago, but haven’t really developed it further.
Ok I have tried to table out the outcomes in this situation. This is from a viewpoint of a “power bloc”, for example if the UK bans AI research but their close allies the USA secretly defect, it would be the same as the UK choosing to secretly defect.
Note that also for the upper part of this table, the !(others accelerate) outcome, all countries in the world who have the ability to access the necessary chips and access to nuclear weapons must each separately choose not to accelerate. In an attempt at a worldwide ban, anyone who chooses to accelerate is protected by their own nuclear weapons, which there is no effective defense to pre-AGI. So they get to independently choose to pay|!pay the international outrage and sanctions if they wish to access the “take the planet” outcomes.
This makes it a choice of (pay|!pay) ^ n, where n is the number of actually separate factions. It would be interesting to see how large n actually is. Obviously [West, China] are factions, so n is at least 2, but how many other parties are there? Is Taiwan or Israel their own parties? How long would it take Russia to obtain the chips necessary?
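A rough sketch of what the (pay|!pay) ^ n framing implies numerically. It enumerates the joint choices and, under the added assumption that each faction decides independently with some probability of abstaining, computes how likely it is that nobody accelerates. The faction names, probabilities, and function names are placeholders, not estimates.

```python
import math
from itertools import product


def joint_choices(factions):
    """Enumerate every joint outcome: each faction either accelerates (True) or not (False)."""
    return [
        dict(zip(factions, combo))
        for combo in product((False, True), repeat=len(factions))
    ]


def p_ban_holds(p_abstain_each):
    """Probability that no faction accelerates, assuming independent choices.

    p_abstain_each: one probability per faction that it chooses not to accelerate.
    Independence is an illustrative assumption; real factions react to each other.
    """
    for p in p_abstain_each:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return math.prod(p_abstain_each)


if __name__ == "__main__":
    outcomes = joint_choices(["West", "China", "Russia", "Other"])
    holding = [o for o in outcomes if not any(o.values())]
    print(len(outcomes), "joint outcomes, of which", len(holding), "keeps the ban intact")
    print("P(ban holds) with 0.9 each:", p_ban_holds([0.9] * 4))
```

The point of the sketch is only that the single all-abstain outcome shrinks as a share of the 2^n possibilities, and its probability falls geometrically, as n grows.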
Italicized is I think what AI doomers believe.
Bold is I think what e/acc believe.
I would like to add some color to this table, not sure how. But in general, governments are going to perceive the “deposed” scenario as an unacceptable outcome, a war as a disfavorable but for powerful governments, winnable outcome, and obviously they would prefer the world where they can ‘take the planet’. This is where using AGI/ASI and exponential production rates, the government manufactures whatever tool they want in the numbers necessary to depose everyone else. Theoretically this doesn’t need to be a weapon, for example you could offer aging treatments to citizens of other countries (and their elder relatives) if they rescind their current citizenship. And financially buy all of the assets of all the other countries.
I think this very neatly shows e/acc as a belief. If you think there is no real chance all the other countries will stop developing AI, you only have the rightmost column as a valid choice. All the outcomes are not great but the rightmost column is the least bad.
This also seems to show why ‘doomer’ faction members have such a depressed attitude. All the outcomes are bad. The ‘stasis’ one means everyone dies from aging and it’s unstable—it ends on the first defection. All the rest leave humans at the mercy of a machine that has random alignment. Even the “aligned self modifying AGI/ASI” dream would mean the outcome is still “AI chooses”, just humans have weighted the outcome in their favor.
@Thane Ruthenis I am very very curious to see your reaction. If this is a bad visualization of the ‘board’ I’d love to make it more detailed in a grounded, reasonable way. For example I am assuming the ‘occasionally escapes’ scenario means the AGI/ASI do occasionally defect or break out, and some of the defections do cause significant human casualties, but humans do win each battle eventually. This would be consistent with your ‘unstable nuclear software’ post.
I think doomer members believe that the utility benefit of being an escaped self modifying ASI is so large that those outcomes become “AI chooses” as well. I have been lumping that into “uncontainable”
I don’t think that’s right.
Ground fact: If you take the premise that AGI presents an existential risk as a given, merely risking a nuclear exchange in order to prevent someone else from building it (the infamous “bomb the datacenters of foreign defectors” proposal) is correct. If you’re taking it seriously, and the enemy is taking it seriously, then you know that sanctions and being Greatly Concerned won’t stop them, and that their success would be your end, and that it’s not certain that if you bomb them they’ll retaliate. So you bomb them.
So if both parties are taking the promises and risks of AGI seriously, then a sufficiently big coalition choosing to ban AGI can effectively ban it for everyone else, including non-signatories. The non-signatories will know the threat of mere nuclear weapons won’t deter the others.
I mean, of course it’d still be precarious and there’d be constant attempts to push the boundaries, but the NatSec agencies have been playing that game against each other for a while, and it may be stable enough.
Conversely, if some of the parties aren’t taking the risks seriously, and they’re willing to accelerate, and are posturing about how others’ attempts to prevent or sabotage their AGI projects will be met with nuclear retaliation… Yeah, I’m calling that bluff.
If Russia did anything good the last two years, it’s making the “I’ll nuke you if you cross this red line!” look like something a clown says then never acts on.
I mean, that’s just the standard Prisoner’s Dilemma setup there, no? And it’s sometimes possible to make people recognize that defect/defect and cooperate/cooperate are the only stable states between two similar-enough agents, and that they should therefore all cooperate. Making people recognize this is non-trivial, yes, but it’s a problem that’s sometimes been solved.
Also, in this case, the cooperating party can, in some scenarios, force the other party to cooperate as well, which somewhat changes the calculus.
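A minimal sketch of that setup, with textbook Prisoner's Dilemma payoffs (the numbers are assumptions, not from this thread). Under independent choice only defect/defect survives as a Nash equilibrium; if two similar-enough agents expect to mirror each other, only the diagonal outcomes are reachable, and cooperate/cooperate is the better of the two.

```python
from itertools import product

# Illustrative Prisoner's Dilemma payoffs (assumed numbers).
# PAYOFF[(a, b)] = (payoff to player 1, payoff to player 2)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")

def is_nash(a, b):
    """True if neither player gains by unilaterally switching actions."""
    p1, p2 = PAYOFF[(a, b)]
    best1 = all(PAYOFF[(alt, b)][0] <= p1 for alt in ACTIONS)
    best2 = all(PAYOFF[(a, alt)][1] <= p2 for alt in ACTIONS)
    return best1 and best2

# Independent choosers: which profiles are stable?
nash = [profile for profile in product(ACTIONS, repeat=2) if is_nash(*profile)]

# Mirrored choosers: only (C, C) and (D, D) are reachable, pick the better one.
mirrored = [(a, a) for a in ACTIONS]
best_mirrored = max(mirrored, key=lambda prof: PAYOFF[prof][0])

print(nash)
print(best_mirrored)
```

The hard part is getting the parties to believe they are in the mirrored case; the code only shows what follows once they do.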
Eh, I don’t agree with that either. AGI isn’t the only technology left, nor the only technology that can prevent aging, and banning AGI doesn’t have to mean banning all technology. I’d already mentioned other forms of intelligence enhancement as possibilities. Hell, the AI tools that exist today, even if frozen at the current type of architecture, can likely be realized to greatly accelerate other technologies. Immortality escape velocity may very well be achievable in the next 20-30 years even without AGI.
There’s a very valid concern about bureaucracies and the culture of over-caution strangling innovation, which I’m very sympathetic to. But risking blowing up the planet over it seems excessive. Maybe try some mass anti-FDA protests or something like that first?
So I’d replace “stasis” with “incremental advancement”.
Yeah, that seems like one of the possible ways the world could be.
Though I’d note that I expect the level of superintelligence necessary to beat humanity to be surprisingly low. You don’t necessarily need to be at “derives nanotechnology in a few days, Basilisk-hacks people en masse”. Don’t even have to be self-modifying. Just being a bit smarter than humans + having more parallel-processing power may be enough. See the arguments here and here.
So if one of the escapees is at least that competent, and it manages to get a toehold in the human civilization (get one of the countries to shelter it, get a relatively powerful social movement on its side, distribute itself far enough that it’d take time to find and shut down all its instances), that may be enough for it to start up a power-accumulation loop that’d outpace humanity’s poorly-coordinated attempts to make it stumble. Especially if governments aren’t willing to risk nuclear exchanges over the issue.
It depends on the party and their geographic area and their manufacturing ability. Small countries like Israel or Taiwan, both of whom are nuclear capable, yes, you can probably destroy the key tools needed to make ICs and prevent imports of new ones.
With China|Russia|EU|USA, they will mass manufacture air defense and bury all the important elements for their AI effort. You will have to fire nukes, and they will return fire and everyone in your faction dies. So from a decisionmaker’s point of view now it is
[AI might be containable (like current computers) | AI might be risky but containable| AI might be uncontainable]. Only in the last box do we hit [AI chooses <the survival of humanity>]. While if we fire nukes now, all the outcomes are [we die].
So in that scenario, you can be 1 of 2 groups of humans:
A. [you built the ASI that just escaped]
B. [you banned it and someone else built it]
Note that in situation A, you bring in all the experts and prior AI tools you used to accomplish A. You can examine your container code (with the help of ‘trustworthy’ models you have for this) and find and patch the bugs that were exploited. You can lobotomize your local copies of the ASI and query it to predict what it’s going to do next.
And you have an ASI, presumably you have robots that can copy themselves. You can start building countermeasures.
A lot of attacks are asymmetric, I admit that. The countermeasures are much more expensive than the attack. For example if you are up against arbitrarily designed pathogens or rogue nanotechnology, there is probably no vaccine that will work. Anyone infected is a goner or would have to have their brain uploaded to save them from an LN2 frozen sample. But space suits will stop the protein based pathogens, and even diamond nanobots will have materials they can’t cut through due to too little energy stored in them.
You also have the situation that the ASI that escaped is likely attempting self-improvement, so it may become more capable than humans’ best models.
So you can lose in this situation, but you have tools. You can still act. It’s not over. AI-banning parties just lose automatically. In fact, human institutions that fail to adopt AI internally also all lose automatically.
This relates to the geopolitical decision table above because the defection risk means someone might be about to create this situation for you unless you also secretly defect. Yeah, it’s a Prisoner’s Dilemma, albeit one where the “cooperate” payoff seems to be very poor; it has a dominant strategy of acceleration.
This seems like a key crux. @Thane Ruthenis is accelerating AI at a geopolitical level the dominant strategy in a game theoretic sense? If it’s not dominant, why? What’s wrong with this table, what additional rows or labels do I need to add to express this more completely?
Mm. Let me generalize your spread of possibilities as follows:
“AGI might be containable” → “AGI is an incredibly powerful technology… but just that”
“AGI might be risky but containable” → “AGI, by itself, may be a major geopolitical actor”
“AGI might be uncontainable” → “AGI can defeat all of humanity combined”
Whether one believes that AGI is containable is entangled with how much they should expect to benefit by developing one. If a government thinks AGIs aren’t going to be that impressive, they won’t fight hard to be able to develop one, and won’t push against others trying to develop one. If a government is concerned about AGI enough to ban it domestically, it sounds like they expect accident risks to be major disasters/extinction-level, which means they’d expect solving AGI to grant whoever does it hegemony.
So in the hypothetical case where we have a superpower A taking AGI seriously enough to ban it domestically and to try to bully other nations into the same, and a defecting superpower B burying their datacenters in response, so that the only way to stop it is nukes? Then it sounds like superpower A would recognize that it’s about to lose the whole lightcone to B if it does nothing, so it’ll go ahead and actually fire the nukes.
And it should be able to credibly signal such resolve to B ahead of time, meaning the defector would expect this outcome, and so not do the bury-the-datacenters thing.
Like… Yeah, making a government recognize that AGI risk is so major it needs to ban all domestic development and try to enforce an international ban is a tall order. But once a major government is sold on the idea, it’ll also automatically be willing to risk a nuclear exchange to enforce this ban, which will be visible to the other parties.
Conversely, if the government isn’t taking AGI seriously enough to risk a nuclear exchange, it’s probably not taking it seriously enough to ban it domestically (to the point of not even engaging in secret research) either. Which invalidates the premise of the “what if one of the major actors chooses not to race” hypothetical.
Fair. I expect it’d devolve out of anyone’s control rapidly, but I agree that it’d look like a game they’d be willing to play, for the relevant parties.
Edit:
As above, I think some of the possibilities correspond to self-contradictory worlds. If a superpower A worries about AGI enough to ban it despite the race dynamics, it’s not going to sit idly by and let superpower B win the game; even on pain of a nuclear exchange. So foolishly accelerating against serious concern from other parties gets you nuked, which means “do nothing and keep incrementally improving without AGI” is the least-bad option.
Or so it should ideally work out. There’s a bunch of different dimensions along which nations can take AGI seriously or not. I’ll probably think about it later, maybe compile a table too.
Ok. I think this collapses even further to a 1 dimensional comparison.
I made a mistake in the table, it’s not all or nothing. Some models are containable and some are not, with the uncontainable models being more capable.
Utilitysafe = ([strongest containable model you can build]*, resources)
Utilityrogue = ( [strongest model that can be developed]**, resources)
Or in other words, there is a utility gain that is a function of how capable the model is. Utility gain is literally doing more with less.
Obviously a nearly 0 intelligence evolutionary process evolved life, using billions of years and the biosphere of a planet that required some enormous number of dice rolls at a galaxy or universe wide scale.
Utility_evolution = (random walk search, the resources of probably the Milky way galaxy)
Utility is domain specific but for example, to do twice as good as evolution, you could design life with half the resources. If you had double the utility in a tank battle, you can win with half the tanks.
And you’re asserting a belief that Utilityrogue >>> Utilitysafe.
Possibly true, possibly not.
But from the point of view of policymakers, they know a safe AI that is some amount stronger than current SOTA can be developed. And that having that model lets them fight against any rogues or models from other players.
If in numbers, actual reality is say Utilityrogue = 2*(to limit of buildable compute)Utilitysafe, we should build ASI as fast as possible.
And if the actual numbers are
Utilityrogue = 1000*(to limit of buildable compute)Utilitysafe
We shouldn’t.
Do we have any numerical way to estimate this ratio? Is there a real world experiment we could perform to estimate what this is? Right now should policymakers assume it’s a high ratio or a low ratio?
What if we don’t know and only have a probability distribution? Say it’s (90 percent 2*, 10 percent 1000*).
From the perspective of the long term survival of humanity, this is “pDoom is almost 10 percent”. But what are national policymakers going to do? What do we have to do in response?
*I think we both agree there is some level of AI capability that’s safe? Conventional software has AI like behavior and it’s safe ish.
** Remember RSI doesn’t end up with infinite intelligence, it will have diminishing returns as you approach the most capable model that current compute will support.
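A small sketch of what that (90 percent 2*, 10 percent 1000*) belief implies numerically. The distribution is the example from the comment; the threshold of 100 used to mark the "high ratio" regime is an assumption for illustration.

```python
# Hypothetical belief over the ratio Utilityrogue / Utilitysafe,
# taken from the (90 percent 2*, 10 percent 1000*) example.
belief = {2.0: 0.9, 1000.0: 0.1}

def expected_ratio(dist):
    """Probability-weighted mean of the ratio."""
    return sum(ratio * p for ratio, p in dist.items())

def prob_at_least(dist, threshold):
    """Total probability that the ratio is at or above the threshold."""
    return sum(p for ratio, p in dist.items() if ratio >= threshold)

print(expected_ratio(belief))
print(prob_at_least(belief, 100.0))  # threshold of 100 is an assumed cutoff
```

The mean comes out near 100 even though 90 percent of the mass sits at 2, so a policymaker reasoning from the most likely case and one reasoning from the expected ratio would reach opposite conclusions.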
I… don’t think I’m asserting that? Maybe I’m misunderstanding you.
What I’m claiming is that:
Suppose we have two geopolitical superpowers, A and B.
If both A and B proceed at a measured pace, it’s unclear which of them wins the AI race. If AGI isn’t particularly dangerous, there isn’t even such a thing as “winning” it, it’s just a factor in a global power game. But the more powerful it is, the more likely it is that the party who’s better at it will grow more powerful, all the way to becoming the hegemon.
So “both proceed steadily” isn’t an equilibrium: each will want to go just a bit faster.
If party A accelerates, while party B either proceeds steadily or bans AGI, but in a geopolitically toothless manner, party A either wins (if AGI isn’t extremely dangerous, or if A proceeds quickly-but-responsibly) or kills everyone (if AGI’s an existential threat and A is irresponsible).
That isn’t an equilibrium either: party B won’t actualize this hypothetical.
If both A and B accelerate, neither ends up building AGI safely. It’s either constant disasters or everyone straight-up dies (depending on how powerful AGI is).
That is an equilibrium, but of “everyone loses” kind. (AGI power only determines the extent of the loss.)
If party A bans AGI internationally, in a way it takes seriously, but party B accelerates anyway, then party A acts to stop B, all the way up to a lose/lose nuclear exchange.
That is not an equilibrium, as going here is just a loss for B.
If party A bans AGI internationally, and party B respects the ban’s seriousness, the relative balance of power is preserved. It’s unclear who takes the planet, because it’s too far in the uncertain future and doesn’t depend on just one factor.
That is the equilibrium it seems worth going for.
In table form, it’d be something like:
[Table not preserved; the one surviving cell reads: “No (A&B iterate 2-3, until reaching 4)”.]
So I thought about it overnight and I wanted to add a comment. What bothers me about this table is that nuclear brinkmanship (“stop doing that or we will kill ourselves and you”) doesn’t seem very likely to ever happen.
I know you want this to happen, and I know you believe this is one of the few routes to survival.
But think about the outcomes if you are playing this brinkmanship game.
Action: nuke. Outcome: death in the next few hours.
Action: back down.
Outcome pAIsafe: life (maybe under occupation).
Outcome pAIrogue: life or delayed death (AI chooses).
Outcome pAIweak: life.
If the above is correct, this makes n player brinkmanship games over AI get played with the expectation the other will back down. Which means… acceleration becomes the dominant strategy again. Acceleration and preparing for a nuclear war.
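The comparison above can be written as a dominance check. The ordinal scores are assumptions for illustration (higher is better), and for pAIrogue the worse of the two listed outcomes is used. This only restates the ranking; it says nothing about whether the ranking is right.

```python
# Ordinal scores are assumptions for illustration: higher is better.
DEATH_NOW, DELAYED_DEATH, OCCUPIED, LIFE = 0, 1, 2, 3

WORLDS = ("pAIsafe", "pAIrogue", "pAIweak")

# Worst listed outcome for each action in each possible world.
OUTCOME = {
    "nuke": {w: DEATH_NOW for w in WORLDS},
    "back down": {
        "pAIsafe": OCCUPIED,
        "pAIrogue": DELAYED_DEATH,
        "pAIweak": LIFE,
    },
}

def dominates(a, b):
    """True if action a is at least as good as b in every world and better in one."""
    at_least = all(OUTCOME[a][w] >= OUTCOME[b][w] for w in WORLDS)
    strictly = any(OUTCOME[a][w] > OUTCOME[b][w] for w in WORLDS)
    return at_least and strictly

print(dominates("back down", "nuke"))
```

If backing down dominates for the party making the threat, the other side has reason to treat the threat as a bluff.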
I think that proves too much. By this logic, nuclear war can never happen, because “stop invading us or we will kill ourselves and you” results in a similar decision problem, no? “Die immediately” vs. “maybe we can come back from occupation via guerilla warfare”. In which case pro-AI-ban nations can just directly invade the defectors and dig out their underground data centers via conventional methods?
Or even precision-nuke just the data centers, because they know the attacked nation won’t retaliate with a strike on the attacker’s population centers for fear of a retaliatory annihilatory strike? Again, a choice of “die immediately” vs. “maybe we can hold our own geopolitically without AGI after all”.
Edit:
Also, as I’d outlined, I expect that a government whose stance on AGI is like this isn’t going to try to ban it domestically to begin with, especially if AI has so much acknowledged geopolitical importance that some other nation is willing to nuclear-war-proof its data centers. The scenario where a nation bans domestic AI and tries to bully others into doing the same is a scenario in which that nation is pretty certain that the outcomes of “AI safe” and “AI weak” aren’t gonna happen.
I was focusing on what the [containable, uncontainable] continuum of possibilities means.
But ok, looking at this table,
Yes, but unstable. It’s less and less stable the more parties there are. If it’s A....J, then at any moment, at least one party may be “line toeing”. This is unstable if there is a marginal gain from line toeing—the party with slightly stronger, barely legal AGI has more GDP, which eventually means they begin to break away from the pack. This breaks down to 2 then 3 in your model then settles on 4.
A historical example I can think of would be the treaties on weapons post-WW1. They began to fail with line toeing.
https://en.wikipedia.org/wiki/Arms_control
The United States developed better technology to get better performance from their ships while still working within the weight limits, the United Kingdom exploited a loop-hole in the terms, the Italians misrepresented the weight of their vessels, and when up against the limits, Japan left the treaty. The nations which violated the terms of the treaty did not suffer great consequences for their actions. Within little more than a decade, the treaty was abandoned.
It seems reasonable to assume this is a likely outcome for AI treaties.
For this not to be the actual outcome, something has to have changed from the historical examples—which included plenty of nuclear blackmail threats—to the present day. What has changed? Do we have a rational reason to think it will go any differently?
Note also this line toeing behavior is happening right now from China and Nvidia.
Rogue utility is the other parameter we need to add to this table to make it complete.
Ahh, you mean, what’s expected utility of having a controlled AGI of power X vs. expected disutility of having a rogue AGI of the same power? And how does the expected payoff of different international strategies change as X gets larger?
Hm. Let’s consider just A’s viewpoint, and strategies of {steady progress, accelerate, ban only domestically, ban internationally}.
Steady progress is always viable up to the capability level where AGI becomes geopolitically relevant; let’s call that level G.
Acceleration is nonviable in the range of values below G but above some threshold at which accident risk is large enough to cause public backlash. The current global culture is one of excess caution and status-quo bias, and the “desire to ban” would grow very quickly with “danger”, up to some threshold level at which it won’t even matter how useful it is, the society would never want it. Let’s call that threshold D.
Domestic-only ban is viable in the range [D;G). If it’s not a matter of national security, and the public is upset, it gets banned. It’s also viable if A strongly expects that AGI will be uncontrollable but the disaster won’t spill over into other countries: i.e., if it expects that if B rushes AGI, it’ll only end up shooting itself in the foot. Let’s call the capability level above which that expectation no longer holds C.
At X>G, acceleration, a toothy international ban, and (in some circumstances) a domestic-only ban are viable.
At X>C, only acceleration and a toothy ban are viable. At that point, what decides between them is the geopolitical actor’s assessment of how likely they are to successfully win the race. If it’s low enough, only the ban has non-zero expected utility (since even a full-scale nuclear war likely won’t lead to extinction).
So we have 0<D<G<C, steady progress is viable at [0;G), acceleration is viable at [0;D)∪[G;+∞), domestic ban is viable at [D;G) and sometimes also at [G;C), and a toothy international ban is viable at [G;+∞).
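As a sanity check on those intervals, here is a minimal sketch that returns the viable strategies for a capability level X. The threshold values are hypothetical and chosen only to exercise each interval; the "sometimes" case on [G;C) is reported separately.

```python
def viable_strategies(x, d, g, c):
    """Strategies viable at capability level x, per the intervals above.

    Requires 0 < d < g < c. Returns (always_viable, sometimes_viable).
    """
    if not 0 < d < g < c:
        raise ValueError("expected 0 < D < G < C")
    if x < 0:
        raise ValueError("capability level must be non-negative")
    always, sometimes = [], []
    if x < g:                      # [0;G)
        always.append("steady progress")
    if x < d or x >= g:            # [0;D) U [G;+inf)
        always.append("accelerate")
    if d <= x < g:                 # [D;G)
        always.append("domestic ban")
    elif g <= x < c:               # [G;C), only in some circumstances
        sometimes.append("domestic ban")
    if x >= g:                     # [G;+inf)
        always.append("international ban")
    return always, sometimes

# Hypothetical threshold values, one sample point per interval.
D, G, C = 1.0, 2.0, 3.0
for x in (0.5, 1.5, 2.5, 3.5):
    print(x, viable_strategies(x, D, G, C))
```

Past C the function leaves only acceleration and the international ban, which matches the X>C case described above.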
Not sure if that’s useful.
Interesting how exceptions are sometimes reachable after all, isn’t it?
Yep. But as I’d said, it’s a game NatSec agencies are playing against each other all the time, and if they’re actually sold on the importance of keeping this equilibrium, I expect they’d be okay at that. In the meantime, we can ramp up other cognitive-enhancement initiatives… Or, well, at least not die for a few years longer.
No. At a given level of human understanding of AI, there are 2 levels of model.
1. The strongest model humans can contain (this means the model is mostly working for humans and is not going to deliberately betray them). This is Utilitysafe.
2. The strongest model that the current level of (compute built in total + total data that exists + total robotics) allows to exist. This is Utilityrogue. Until very recently, 1 == 2: humans could not build enough compute to make a model that was remotely dangerous.
And then I have realized that intelligence is meaningless. What matters is utility. Utility is the relative benefit of intelligence. It’s worthless to win “70% of tasks vs the next best software model” by doing 1% better, humans will crush the AI if they get to have 1% more resources.
Utility is some enormous table of domains and it’s a multiplier. I gave examples.
General intelligence, like current models are beginning to show, means that utility in one domain transfers to another domain. General learning intelligence, which we can do with scripts tacked onto current models, can have utility in all domains but requires training data to learn new domains.
Examples:
Utility[designing living creatures]:
Humans = millions. (we could design other living organisms from scratch with all new amino acids in millions of times less time than evolution needs)
Controllable AI from DeepMind : 100* humans or more
Evolution : 1.0 (baseline)
Utility[chip, algorithm design]
Humans = 1.0
Controllable AI from DeepMind : Less than 1.1 times humans (4% for video compression). See https://deepmind.google/impact/optimizing-computer-systems-with-more-generalized-ai-tools/
Utility[tank battles]
Humans = 1.0
Tactical AI = ?. But think about it. Can you win with half the tanks? Probably. 1/10? 1/100? Probably not.
Utility[manufacturing]
Humans = 1.0
Machine policy solver = ?
Utility [aircraft design]
Humans = 1.0
RL solver from humans = ?
And so on. Exponential growth complicates this, even a small utility benefit would allow one party to win.
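A rough sketch of why a small edge compounds. All numbers are hypothetical, and treating a utility edge as a proportionally higher annual growth rate is an assumption made only for illustration.

```python
import math

def years_to_lead(base_growth, edge, target_ratio):
    """Years until a party growing at base_growth * (1 + edge) holds
    target_ratio times the resources of a rival growing at base_growth,
    starting from equal resources, with annual compounding."""
    fast = 1.0 + base_growth * (1.0 + edge)
    slow = 1.0 + base_growth
    return math.log(target_ratio) / math.log(fast / slow)

# Hypothetical numbers: 30% annual growth, a 10% utility edge, a 2x lead.
years = years_to_lead(0.30, 0.10, 2.0)
print(round(years, 1))
```

With these inputs the lead takes a few decades to double; a higher base growth rate or a bigger edge shortens that quickly.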
When I try to reason in a grounded way about “ok, what would the solution from humans look like? What would the solution from a narrow AI* that humans control look like? What is the best solution physics allows?”, well, it depends. On easy tasks (pathfinding, aiming a ballistic weapon) or even modestly complex tasks (manipulation, assembly, debugging), I don’t see the best solution as being very much better. For extremely complex tasks I don’t know.
Current empirical data shows only small utility gains for now.
If the [Utilitysafe, Utilityrogue] multiplier is very large (100+), in all long term futures the AIs choose the future. Nature can’t abide a vacuum like that. Doomers/decelerationists can only buy a small amount of time.
If the [Utilitysafe, Utilityrogue] multiplier is small (<2.0), you must accelerate, because the [Utilitysafe] vs [regular humans] multiplier is still an unwinnable battle due to exponential growth. Humans can survive in this future as long as at least 2⁄3 of the “bits” (the physical resources) are in the hands of Utilitysafe systems.
For medium values (2-100) it depends: you need to be very careful, but maybe you can keep AI under control for a little while. You need 99% of the resources to be in the hands of safe systems, and escaped AIs are very much a crisis.
*myopia and various forms of container are strategies that lower a general AI to a narrow AI without the cost of developing a custom narrow AI.
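One reading that reproduces both resource figures above (2⁄3 at a multiplier of 2, 99% at 100): assume a side's effective strength is its utility multiplier times the resources it holds. That formula is an assumption, not something stated in the comment. Safe systems holding share f then break even against rogue systems when f = m·(1 − f), i.e. f = m/(1 + m).

```python
def min_safe_share(multiplier):
    """Resource share safe systems need to match rogue systems.

    Assumes strength = utility multiplier * resources, with rogue systems
    `multiplier` times as effective per unit of resources as safe ones.
    Break-even: f * 1 = (1 - f) * multiplier  =>  f = m / (1 + m).
    """
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return multiplier / (1.0 + multiplier)

for m in (2, 100):
    print(m, round(min_safe_share(m), 4))
```

Under this reading the required share creeps toward 100% as the multiplier grows, which is one way to see why the 100+ regime leaves no stable margin.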
I appreciate this discussion, and want to throw in my 2 cents:
I believe that any sort of delay (such as via an attempted ban which does slow AI development down) buys some chance of improved outcomes for humanity. In fact, a lot of my hope for a good outcome lies in finding a way to accomplish this delay whilst accelerating safety research.
The timing/speed at which the defection happens matters a lot! Not just the probability of defection occurring. How much does developing AI in secret instead of openly slow down a government? None? I expect it slows it at least somewhat.
So let’s engage on that.
Imagine it’s the cold war. You are an anti nuclear advocate. “STOP preparing to put city killer fusion bombs onto missiles and cramming them into submarines! You endanger human civilization itself, it is possible that we could all die!” you say.
Faction wise you get dismissed as a communist sympathizer. Your security clearances would be revoked and you would be fired as a federal employee in that era. “Communist” then means “decel” now.
I would agree that every part of the statement above is true. Preparing to commit genocide with thermonuclear devices is a morally dubious act, and while there is debate on the probability of a nuclear winter, it is correct to say that the chance is nonzero that a total nuclear war at the cold war arsenal peak could have extincted humanity or weakened humanity to the point the next crisis finished it off.
But...you don’t have a choice. You are locked into a race. Trying to reframe it not as a race doesn’t change the fact it’s a race. Any international agreements not to build loaded ICBMs you can confidently predict the other parties will secretly defect on.
Historically several countries did defect : Israel, Taiwan, and South Africa being the immediate 3 I can think of. Right now Ukraine is being brutalized for its choice to cooperate. Related to your point above, they took longer to get nukes than the superpowers did due to the need for secrecy.
I think we are in the same situation now. I think the payoff matrix is far more favorable to AI than it was to nukes. The incentives are far, far stronger.
So it seems like a grounded view, based on the facts, to predict the following outcome: AI acceleration until the singularity.
In more detail: you talk about regulations and possibly efforts going underground. This only becomes relevant if, out of all major power blocs, there is not at least one bloc building ASI at the maximum speed possible. I think at least one bloc will do so, as this seems to be what is happening today, so it just has to continue in one power bloc.
We have not seen a total war effort yet, but it seems maybe inevitable. (Right now AI investment is still a tiny part of the economy, at maybe 200 billion/year. A total war effort means the party puts all resources into preparing for the coming nuclear war and developing ASI. A less-than-total war effort, where most of the economy still goes into AI, is also possible and would be what economics would cause to happen naturally.)
The other blocs’ choices don’t matter at that point, since the winner will take over the market for ASI services and own (or be subsumed by) the outcome.
There are other Cold War analogs as well. Many people during the era expected the war would happen during their career. They participated in developing and deploying nukes and expected a fight to the end. Yet again, what choice did they have?
Do we actually have a choice now?
As far as I know, most datacenters are not currently secretly underground in bomb-proof bunkers. If this were known to be the case, I’d have different views of the probable events in the next few years.
Currently, as far as I know, most data centers are out in the open. They weren’t built with the knowledge that soon they would become the equivalent of nuclear ICBM silos. The current state of the world is heavily offense-favoring, so far as I know.
Do you disagree? Do you think ASI will be fully developed and utilized sufficiently to achieve complete worldwide hegemony by the user without there being any major international conflict?
Well, as I mentioned in the other post, but I will open a larger point here: anything physics permits can theoretically happen in the future, right? For complex situations like this, scientifically validated models like particle physics do not exist yet. All we can do is look at what historically happened in a similar scenario and our prior should be that the outcome draw is from the same probability distribution this round. Agree/disagree?
So for nuclear programs, all the outcomes have happened. Superpowers have built vast campuses and secret cities, and made the plutonium and assembled the devices in aboveground facilities. Israel apparently did it underground. Iran has been successfully decelerated from their nuclear ambitions for decades.
But the general trend, I feel, is that a superpower can’t be stopped by bombing, and another common element has happened a bunch of times historically. Bombing and military actions often harden a belligerent’s resolve. They hardened the UK’s resolve, US resolve, Russian resolve, it goes on.
So in the hypothetical world where party A is too worried about AI dangers to build their own, party B is building it, unless A can kill B, B would respond to the attack by a total war effort and would develop AI and win or die. (Die from nukes, die from the AI, or win the planet)
That is what I predict would be the outcome and we can enumerate all the wars where this historically has happened if you would like. Trend wise the civilian and military leaders on B, consuming their internal propaganda, tend to commit to total war.
If B is a superpower (the USA and China, maybe EU) they can kill all the other parties with their nukes, so choosing to kill B is choosing suicide for yourself.
Yes, I am imagining that there is some sort of toothy international agreement with official inspectors posted at every data center worldwide. That’s what I mean by delay. Or the delay which could come from the current leading labs slowing down and letting China catch up.
If the first lab to cross the finish line gets forcibly nationalized or assassinated by state-sponsored terrorist groups, why hurry? Why stay ahead? If we can buy six more months by not rushing quite as hard, why not buy them? What do the big labs lose by letting China catch up? Maybe this seems unlikely now, but I expect the end-game months are going to look a lot different.
See my other comment here: https://www.lesswrong.com/posts/A4nfKtD9MPFBaa5ME/we-re-all-in-this-together?commentId=JqbvwWPtXbFuDqaib
Responding to your other comment: Probably AI labs will be nationalized, yes, as models reach capabilities levels to be weapons in themselves.
The one here: a “toothy international agreement” to me sounds indistinguishable from the “world government and nukes are banned” world that makes rational sense from the 1950s but has not happened for 73 years.
Why would it happen this time? In your world model, are you imagining that the “world government” outcome was always a non negligible probability and the dice roll could go this way?
Or do you think that a world that let countries threaten humanity and every living person in a city with doomsday nuclear arsenals would consider total human extinction a more serious threat than nukes, and people would come together to an agreement?
Or do you think the underlying technology or sociological structure of the world has changed in a way that allows world governments now, but didn’t then?
I genuinely don’t know how you are reaching these conclusions. Do you see my perspective? Countless forces between human groups create trends, and those trends are the history and economics we know. To expect a different result requires the underlying rules to have shifted.
A world government seems much more plausible to me in a world where the only surviving fraction of humanity is huddled in terror in the few remaining underground bunkers belonging to a single nation.
Note: I don’t advocate for this world outcome, but I do see it as a likely outcome in the worlds where strong international cooperation fails.
There is a lot of statism or pessimism about the potential for improving coordination in your comments. No mention of projects like:
Collective Intelligence Project (+ ecosystem)
AI for Institutions projects (ideas)
Metagov projects, including DAO Science
Gaia Consortium
CLR, FOCAL, AI Objectives Institute
Thanks for the links! I’ve been idly thinking about such projects as well, nice to see what ideas others have been considering. Hopefully there’s something workable in there.
But my median prediction is that none of that works, yes. Let alone on the relevant timeline (<10 years). Stuff like Manifold Markets and Twitter’s Community Notes are steps in the right direction, but they’re so very ridiculously tiny. In the meanwhile, the pressures destroying the coordination ability continue to mount.
My optimism would rise dramatically if one of these ideas spawns, say, an SV startup centered around it that ends up valued at billions of dollars within the next three years.
Maybe this is going to be such a startup? -- A proposal for improving the global online discourse
Why do you think this? I don’t know how you even could find a sufficiently large team of Californian AI researchers who lack cosmopolitan values, who wouldn’t blow the whistle if someone asked them to implement, like, a nationalistic or oligarchic (they still have capped profits) utility function or whatever instead of the obvious “optimize the preferences of all currently living humans”.
I feel like you have to be missing a basic understanding of the cultural background of college-educated Californians or the degree of decentralization in their decisionmaking (you could read sama’s reinstatement as a demonstration of a cult of personality, but I think that is wrong; what actually happened was that staff organized, chose the leader they preferred, and got their way).
In general, I agree that global cooperation might not be possible, but the door hasn’t closed yet, and if it closes, there is still something that can be done.