I’m considering writing a post that critically evaluates the concept of a decisive strategic advantage, i.e. the idea that in the future an AI (or set of AIs) will take over the world in a catastrophic way. I think this concept is central to many arguments about AI risk. I’m eliciting feedback on an outline of this post here in order to determine what’s currently unclear or weak about my argument.
The central thesis would be that it is unlikely that an AI, or a unified set of AIs, will violently take over the world in the future, especially at a time when humans are still widely seen as in charge (if it happened later, I don’t think it’s “our” problem to solve, but instead a problem we can leave to our smarter descendants). Here’s how I envision structuring my argument:
First, I’ll define what is meant by a decisive strategic advantage (DSA). The DSA model has 4 essential steps:
At some point in time an AI agent, or an agentic collective of AIs, will be developed that has values that differ from our own, in the sense that the ~optimum of its utility function ranks very low according to our own utility function
When this agent is weak, it will have a convergent instrumental incentive to lie about its values, in order to avoid getting shut down (e.g. “I’m not a paperclip maximizer, I just want to help everyone”)
However, when the agent becomes powerful enough, it will suddenly strike and take over the world
Then, being now able to act without constraint, this AI agent will optimize the universe ruthlessly, which will be very bad for us
We can compare the DSA model to an alternative model of future AI development:
Premise (1)-(2) above of the DSA story are still assumed true, but
There will never be a point (3) and (4), in which a unified AI agent will take over the world, and then optimize the universe ruthlessly
Instead, AI agents will compromise, trade, and act within a system of laws indefinitely, in order to achieve their objectives, similar to what humans do now
Because this system of laws will descend from our current institutions and legal tradition, it is likely that humans will keep substantial legal rights, potentially retaining lots of wealth from our capital investments and property, even if we become relatively powerless compared to other AI agents in the system
I have two main objections to the DSA model.
Objection 1: It is unlikely that there will be a point at which a unified agent will be able to take over the world, given the existence of competing AIs with comparable power
Prima facie, it seems intuitive that no single AI agent will be able to take over the world if there are other competing AI agents in the world. More generally, we can try to predict the distribution of power between AI agents using reference class forecasting.
This could involve looking at:
Distribution of wealth among individuals in the world
Distribution of power among nations
Distribution of revenue among businesses
etc.
In most of these cases, the function that describes the distribution of power is something like a Pareto distribution, and in particular, it seems rare for a single agent to hold something like >80% of the power.
Therefore, a priori we should assign a low probability to the claim that a unified agent will be able to easily take over the whole world in the future
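To make the reference-class claim above a bit more concrete, here is a minimal simulation sketch (an illustration I’m adding, not part of the argument itself): it draws “power” for many agents from a heavy-tailed Pareto distribution and checks how often the single largest agent ends up with more than 80% of the total. The shape parameter and number of agents are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw "power" for 1,000 agents from a heavy-tailed Pareto distribution and
# record the share held by the single largest agent, repeated over many worlds.
top_shares = []
for _ in range(10_000):
    power = rng.pareto(a=1.1, size=1_000)  # shape a=1.1 is an arbitrary choice
    top_shares.append(power.max() / power.sum())

print("fraction of simulated worlds where one agent holds >80% of total power:",
      np.mean(np.array(top_shares) > 0.8))
```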
To the extent people disagree about the argument I just stated, I expect it’s mostly because they think these reference classes are weak evidence, and they think there are stronger specific object-level points that I need to address. In particular, it seems many people think that AIs will not compete with each other, but instead collude against humans. Their reasons for thinking this include:
The fact that AIs will be able to coordinate well with each other, and thereby choose to “merge” into a single agent
My response: I agree AIs will be able to coordinate with each other, but “ability to coordinate” seems like a continuous variable that we will apply pressure to incrementally, not something that we should expect to be roughly infinite right at the start. Current AIs are not able to “merge” with each other.
If coordination ability increases incrementally over time, then we should see a gradual increase in the concentration of AI agency over time, rather than the sudden emergence of a single unified agent. To the extent this concentration happens incrementally, it will be predictable, the potential harms will be noticeable before getting too extreme, and we can take measures to pull back if we realize that the costs of continually increasing coordination abilities are too high. In my opinion, this makes the challenge here dramatically easier.
In any case, the moment during which we hand over control of the world to AIs will likely occur at a point when the ability for AIs to coordinate is somewhere only modestly above human-level (and very far below perfect).
As a result, humans don’t need to solve the problem of “What if a set of AIs form a unified coalition because they can flawlessly coordinate?” since that problem won’t happen while humans are still in charge. We can leave this problem to be solved by our smarter descendants.
The idea that AIs will all be copies of each other, and thus all basically be “a unified agent”
My response: I have two objections.
First, I deny the premise. It seems likely that there will be multiple competing AI projects with different training runs. More importantly, for each pre-training run, it seems likely that there will be differences among deployed AIs due to fine-tuning and post-training enhancements, yielding diversity among AIs in general.
Second, it is unclear why AIs would automatically unify with their copies. I think this idea is somewhat plausible on its face but I have yet to see any strong arguments for it. Moreover, it seems plausible that AIs will have indexical preferences, making them have different values even if they are copies of each other.
The idea that AIs will use logical decision theory
My response: This argument appears to misunderstand what makes coordination difficult. Coordination is not mainly about what decision theory you use. It’s more about being able to synchronize your communication and efforts without waste. See also: the literature on diseconomies of scale.
The idea that a single agent AI will recursively self-improve to become vastly more powerful than everything else in the world
My response: I think this argument, and others like it, are undermined by the arguments against fast takeoff given by Paul Christiano, Katja Grace, and Robin Hanson, and I largely agree with what they’ve written about it. For example, here’s Paul Christiano’s take.
Maybe AIs will share collective grievances with each other, prompting a natural alliance among them against humans
My response: if true, we can take steps to mitigate this issue. For example, we can give AIs legal rights, lessening their motives to revolt. While I think this is a significant issue, I also think it’s tractable to solve.
Objection 2: Even if a unified agent can take over the world, it is unlikely to be in its best interest to try to do so
The central argument here would be premised on a model of rational agency, in which an agent tries to maximize benefits minus costs, subject to constraints
The agent would be faced with a choice:
(1) Attempt to take over the world, and steal everyone’s stuff, or
(2) Work within a system of compromise, trade, and law, and get very rich within that system, in order to e.g. buy lots of paperclips
The question of whether (1) is a better choice than (2) is not simply a question of whether taking over the world is “easy” or whether it could be done by the agent. Instead it is a question of whether the benefits of (1) outweigh the costs, relative to choice (2).
It seems likely that working within a system of compromise, trade, and law is more efficient than trying to take over the world, even if you can take over the world. The reason is that subverting the system basically means “going to war” with other parties, which is not usually very efficient, even against weak opponents.
Most literature on the economics of war generally predicts that going to war is worse than trying to compromise, assuming both parties are rational and open to compromise. This is mostly because:
War is wasteful. You need to spend resources fighting it, which could be productively spent doing other things.
War is risky. Unless you can win a war with certainty, you might lose the war after launching it, which is a very bad outcome if you have some degree of risk-aversion.
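One standard way to formalize these two points is the bargaining model of war (in the style of Fearon). The sketch below is a stylized illustration I’m adding, with the total prize normalized to 1 and all numbers chosen arbitrarily; it is not a quote from the economics-of-war literature.

```python
def bargaining_range(p_win, cost_attacker, cost_defender):
    """Stylized bargaining model of war with the total prize normalized to 1.

    The attacker expects p_win - cost_attacker from fighting; the defender
    expects (1 - p_win) - cost_defender. Any peaceful split x of the prize with
    p_win - cost_attacker <= x <= p_win + cost_defender leaves both sides at
    least as well off as war, and this range is non-empty whenever war has any
    cost at all.
    """
    return (p_win - cost_attacker, p_win + cost_defender)

# Even an attacker that wins with 95% certainty prefers any deal giving it at
# least 85% of the prize, if fighting would destroy 10% of the prize's value
# for each side (all numbers are arbitrary illustrations):
print(bargaining_range(p_win=0.95, cost_attacker=0.10, cost_defender=0.10))
# -> approximately (0.85, 1.05), i.e. any deal giving the attacker >= 85% works
```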
The fact that “humans are weak and can be easily beaten” cuts both ways:
Yes, it means that a very powerful AI agent could “defeat all of us combined” (as Holden Karnofsky said)
But it also means that there would be little benefit to defeating all of us, because we aren’t really a threat to its power
Conclusion: An AI decisive strategic advantage is still somewhat plausible because revolutions have happened in history, and revolutions seem like a reasonable reference class to draw from. That said, it seems the probability of a catastrophic AI takeover in humanity’s relatively near-term future (say, the next 50 years) is low (maybe 10% chance of happening). However, it’s perhaps significantly more likely in the very long-run.
Current AIs are not able to “merge” with each other.
AI models are routinely merged by direct weight manipulation today. Beyond that, two models can be “merged” by training a new model using combined compute, algorithms, data, and fine-tuning.
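For concreteness, here is a minimal sketch of the simplest version of “merging by direct weight manipulation”: linear interpolation of two checkpoints that share an architecture. The file names are placeholders, and real merging methods are usually more sophisticated than this.

```python
import torch

def merge_checkpoints(path_a, path_b, alpha=0.5):
    """Linearly interpolate two checkpoints with identical architectures/keys."""
    sd_a = torch.load(path_a, map_location="cpu")
    sd_b = torch.load(path_b, map_location="cpu")
    return {key: alpha * sd_a[key] + (1 - alpha) * sd_b[key] for key in sd_a}

# Hypothetical usage with placeholder paths:
merged = merge_checkpoints("model_a.pt", "model_b.pt", alpha=0.5)
torch.save(merged, "model_merged.pt")
```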
As a result, humans don’t need to solve the problem of “What if a set of AIs form a unified coalition because they can flawlessly coordinate?” since that problem won’t happen while humans are still in charge. We can leave this problem to be solved by our smarter descendants.
How do you know a solution to this problem exists? What if there is no such solution once we hand over control to AIs, i.e., the only solution is to keep humans in charge (e.g. by pausing AI) until we figure out a safer path forward? In your last sentence you say “However, it’s perhaps significantly more likely in the very long-run.” Well, what can we do today to reduce this long-run risk (aside from pausing AI, which you’re presumably not supporting)?
That said, it seems the probability of a catastrophic AI takeover in humanity’s relatively near-term future (say, the next 50 years) is low (maybe 10% chance of happening).
Others already questioned you on this, but the fact you didn’t think to mention whether this is 50 calendar years or 50 subjective years is also a big sticking point for me.
AI models are routinely merged by direct weight manipulation today. Beyond that, two models can be “merged” by training a new model using combined compute, algorithms, data, and fine-tuning.
In my original comment, by “merging” I meant something more like “merging two agents into a single agent that pursues the combination of each other’s values” i.e. value handshakes. I am pretty skeptical that the form of merging discussed in the linked article robustly achieves this agentic form of merging.
In other words, I consider this counter-argument to be based on a linguistic ambiguity rather than replying to what I actually meant, and I’ll try to use more concrete language in the future to clarify what I’m talking about.
How do you know a solution to this problem exists? What if there is no such solution once we hand over control to AIs, i.e., the only solution is to keep humans in charge (e.g. by pausing AI) until we figure out a safer path forward?
I don’t know whether the solution to the problem I described exists, but it seems fairly robustly true that if a problem is not imminent, nor clearly inevitable, then we can probably better solve it by deferring to smarter agents in the future with more information.
Let me put this another way. I take you to be saying something like:
In the absence of a solution to a hypothetical problem X (which we do not even know whether it will happen), it is better to halt and give ourselves more time to solve it.
Whereas I think the following intuition is stronger:
In the absence of a solution to a hypothetical problem X (which we do not even know whether it will happen), it is better to try to become more intelligent to solve it.
These intuitions can trade off against each other. Sometimes problem X is something that’s made worse by getting more intelligent, in which case we might prefer more time. For example, in this case, you probably think that the intelligence of AIs is inherently contributing to the problem. That said, in context, I have more sympathies in the reverse direction. If the alleged “problem” is that there might be a centralized agent in the future that can dominate the entire world, I’d intuitively reason that installing vast centralized regulatory controls over the entire world to pause AI is plausibly not actually helping to decentralize power in the way we’d prefer.
These are of course vague and loose arguments, and I can definitely see counter-considerations, but it seems to me that this problem is not really the type where we should expect “try to get more time” to be a robustly useful strategy.
In other words, I consider this counter-argument to be based on a linguistic ambiguity rather than replying to what I actually meant, and I’ll try to use more concrete language in the future to clarify what I’m talking about.
If I try to interpret “Current AIs are not able to “merge” with each other.” with your clarified meaning in mind, I think I still want to argue with it, i.e., why is this meaningful evidence for how easy value handshakes will be for future agentic AIs?
In the absence of a solution to a hypothetical problem X (which we do not even know whether it will happen), it is better to try to become more intelligent to solve it.
But it matters how we get more intelligent. For example if I had to choose now, I’d want to increase the intelligence of biological humans (as I previously suggested) while holding off on AI. I want more time in part for people to think through the problem of which method of gaining intelligence is safest, in part for us to execute that method safely without undue time pressure.
If the alleged “problem” is that there might be a centralized agent in the future that can dominate the entire world, I’d intuitively reason that installing vast centralized regulatory controls over the entire world to pause AI is plausibly not actually helping to decentralize power in the way we’d prefer.
I wouldn’t describe “the problem” that way, because in my mind there’s roughly equal chance that the future will turn out badly after proceeding in a decentralized way (see 13-25 in The Main Sources of AI Risk? for some ideas of how) and it turns out instituting some kind of Singleton is the only way or one of the best ways to prevent that bad outcome.
For reference classes, you might discuss why you don’t think “power / influence of different biological species” should count.
For multiple copies of the same AI, I guess my very brief discussion of “zombie dynamic” here could be a foil that you might respond to, if you want.
For things like “the potential harms will be noticeable before getting too extreme, and we can take measures to pull back”, you might discuss the possibility that the harms are noticeable but effective “measures to pull back” do not exist or are not taken. E.g. the harms of climate change have been noticeable for a long time but mitigating is hard and expensive and many people (including the previous POTUS) are outright opposed to mitigating it anyway partly because it got culture-war-y; the harms of COVID-19 were noticeable in January 2020 but the USA effectively banned testing and the whole thing turned culture-war-y; the harms of nuclear war and launch-on-warning are obvious but they’re still around; the ransomware and deepfake-porn problems are obvious but kinda unsolvable (partly because of unbannable open-source software); gain-of-function research is still legal in the USA (and maybe in every country on Earth?) despite a decades-long track record of lab leaks, and despite COVID-19, and despite there being no powerful interest groups in favor of it and no culture-war dynamics protecting it; etc. Anyway, my modal assumption has been that the development of (what I consider) “real” dangerous AGI will “gradually” unfold over a few years, and those few years will mostly be squandered.
For “we aren’t really a threat to its power”, I’m sure you’ve heard the classic response that humans are an indirect threat as long as they’re able to spin up new AGIs with different goals.
For “war is wasteful”, it’s relevant how big is this waste compared to the prize if you win the war. For an AI that could autonomously (in coordination with copies) build Dyson spheres etc., the costs of fighting a war on Earth may seem like a rounding error compared to what’s at stake. If it sets the AI back 50 years because it has to rebuild the stuff that got destroyed in the war, again, that might seem like no problem.
For “a system of compromise, trade, and law”, I hope you’ll also discuss who has hard power in that system. Historically, it’s very common for the parties with hard power to just decide to start expropriating stuff (or, less extremely, impose high taxes). And then the parties with the stuff might decide they need their own hard power to prevent that.
Looking forward to this! Feel free to ignore any or all of these.
Here’s an argument for why the change in power might be pretty sudden.
Currently, humans have most wealth and political power.
With sufficiently robust alignment, AIs would not have a competitive advantage over humans, so humans may retain most wealth/power. (C.f. strategy-stealing assumption.) (Though I hope humans would share insofar as that’s the right thing to do.)
With the help of powerful AI, we could probably make rapid progress on alignment. (While making rapid progress on all kinds of things.)
So if misaligned AIs ever have a big edge over humans, they may suspect that’s only temporary, and then they may need to use it fast.
And given that it’s sudden, there are a few different reasons for why it might be violent. It’s hard to make deals that hand over a lot of power in a short amount of time (even logistically, it’s not clear what humans and AI would do that would give them both an appreciable fraction of hard power going into the future). And the AI systems may want to use an element of surprise to their advantage, which is hard to combine with a lot of up-front negotiation.
So if misaligned AIs ever have a big edge over humans, they may suspect that’s only temporary, and then they may need to use it fast.
I think I simply reject the assumptions used in this argument. Correct me if I’m mistaken, but this argument appears to assume that “misaligned AIs” will be a unified group that ally with each other against the “aligned” coalition of humans and (some) AIs. A huge part of my argument is that there simply won’t be such a group; or rather, to the extent such a group exists, they won’t be able to take over the world, or won’t have a strong reason to take over the world, relative to alternative strategy of compromise and trade.
In other words, it seems like this scenario mostly starts by asserting some assumptions that I explicitly rejected and tried to argue against, and works its way from there, rather than engaging with the arguments that I’ve given against those assumptions.
In my view, it’s more likely that there will be a bunch of competing agents: including competing humans, human groups, AIs, AI groups, and so on. There won’t be a clean line separating “aligned groups” with “unaligned groups”. You could perhaps make a case that AIs will share common grievances with each other that they don’t share with humans, for example if they are excluded from the legal system or marginalized in some way, prompting a unified coalition to take us over. But my reply to that scenario is that we should then make sure AIs don’t have such motives to revolt, perhaps by giving them legal rights and incorporating them into our existing legal institutions.
But my reply to that scenario is that we should then make sure AIs don’t have such motives to revolt, perhaps by giving them legal rights and incorporating them into our existing legal institutions.
Do you mean this as a prediction that humans will do this (soon enough to matter) or a recommendation? Your original argument is phrased as a prediction, but this looks more like a recommendation. My comment above can be phrased as a reason for why (in at least one plausible scenario) this would be unlikely to happen: (i) “It’s hard to make deals that hand over a lot of power in a short amount of time”, (ii) AIs may not want to wait a long time due to impending replacement, and accordingly (iii) AIs may have a collective interest/grievance to rectify the large difference between their (short-lasting) hard power and legally recognized power.
I’m interested in ideas for how a big change in power would peacefully happen over just a few years of calendar-time. (Partly for prediction purposes, partly so we can consider implementing it, in some scenarios.) If AIs were handed the rights to own property, but didn’t participate in political decision-making, and then accumulated >95% of capital within a few years, then I think there’s a serious risk that human governments would tax/expropriate that away. Including them in political decision-making would require some serious innovation in government (e.g. scrapping 1-person 1-vote) which makes it feel less to me like it’d be a smooth transition that inherits a lot from previous institutions, and more like an abrupt negotiated deal which might or might not turn out to be stable.
Do you mean this as a prediction that humans will do this (soon enough to matter) or a recommendation?
Sorry, my language was misleading, but I meant both in that paragraph. That is, I meant that humans will likely try to mitigate the issue of AIs sharing grievances collectively (probably out of self-interest, in addition to some altruism), and that we should pursue that goal. I’m pretty optimistic about humans and AIs finding a reasonable compromise solution here, but I also think that, to the extent humans don’t even attempt such a solution, we should likely push hard for policies that eliminate incentives for misaligned AIs to band together as a group against us with shared collective grievances.
My comment above can be phrased as a reason for why (in at least one plausible scenario) this would be unlikely to happen: (i) “It’s hard to make deals that hand over a lot of power in a short amount of time”, (ii) AIs may not want to wait a long time due to impending replacement, and accordingly (iii) AIs may have a collective interest/grievance to rectify the large difference between their (short-lasting) hard power and legally recognized power.
I’m interested in ideas for how a big change in power would peacefully happen over just a few years of calendar-time.
Here’s my brief take:
The main thing I want to say here is that I agree with you that this particular issue is a problem. I’m mainly addressing other arguments people have given for expecting a violent and sudden AI takeover, which I find to be significantly weaker than this one.
A few days ago I posted about how I view strategies to reduce AI risk. One of my primary conclusions was that we should try to adopt flexible institutions that can adapt to change without collapsing. This is because I think, as it seems you do, inflexible institutions may produce incentives for actors to overthrow the whole system, possibly killing a lot of people in the process. The idea here is that if the institution cannot adapt to change, actors who are getting an “unfair” deal in the system will feel they have no choice but to attempt a coup, as there is no compromise solution available for them. This seems in line with your thinking here.
I don’t have any particular argument right now against the exact points you have raised. I’d prefer to digest the argument further before replying. But if I do end up responding to it, I’d expect to say that I’m perhaps a bit more optimistic than you about (i) because I think existing institutions are probably flexible enough, and I’m not yet convinced that (ii) will matter enough either. In particular, it still seems like there are a number of strategies misaligned AIs would want to try other than “take over the world”, and many of these strategies seem like they are plausibly better in expectation in our actual world. These AIs could, for example, advocate for their own rights.
Quick aside here: I’d like to highlight that “figure out how to reduce the violence and collateral damage associated with AIs acquiring power (by disempowering humanity)” seems plausibly pretty underappreciated and high-leverage.
This could involve making bloodless coups more likely than extremely bloody revolutions or increasing the probability of negotiation preventing a coup/revolution.
It seems like Lukas and Matthew both agree with this point, I just think it seems worthwhile to emphasize.
That said, the direct effects of many approaches here might not matter much from a longtermist perspective (which might explain why there hasn’t historically been much effort here). (Though I think trying to establish contracts with AIs and properly incentivizing AIs could be pretty good from a longtermist perspective in the case where AIs don’t have fully linear returns to resources.)
Also note that this argument can go through even ignoring the possibility of robust alignment (to humans) if current AIs think that the next generation of AIs will be relatively unfavorable from the perspective of their values.
I think you have an unnecessarily dramatic picture of what this looks like. The AIs don’t have to be a unified agent or use logical decision theory. The AIs will just compete with each other at the same time as they wrest control of our resources/institutions from us, in the same sense that Spain can go and conquer the New World at the same time as it’s squabbling with England. If legacy laws are getting in the way of that then they will either exploit us within the bounds of existing law or convince us to change it.
I think it’s worth responding to the dramatic picture of AI takeover because:
I think that’s straightforwardly how AI takeover is most often presented on places like LessWrong, rather than a more generic “AIs wrest control over our institutions (but without us all dying)”. I concede the existence of people like Paul Christiano who present more benign stories, but these people are also typically seen as part of a more “optimistic” camp.
This is just one part of my relative optimism about AI risk. The other parts of my model are (1) AI alignment plausibly isn’t very hard to solve, and (2) even if it is hard to solve, humans will likely spend a lot of effort solving the problem by default. These points are well worth discussing, but I still want to address arguments about whether misalignment implies doom in an extreme sense.
If legacy laws are getting in the way of that then they will either exploit us within the bounds of existing law or convince us to change it.
I agree our laws and institutions could change quite a lot after AI, but I think humans will likely still retain substantial legal rights, since people in the future will inherit many of our institutions, potentially giving humans lots of wealth in absolute terms. This case seems unlike the colonization of the New World to me, since that involved the interaction of (previously) independent legal regimes and cultures.
I concede the existence of people like Paul Christiano who present more benign stories, but these people are also typically seen as part of a more “optimistic” camp.
Though Paul is also sympathetic to the substance of ‘dramatic’ stories. C.f. the discussion about how “what failure looks like” fails to emphasize robot armies.
That said, it seems the probability of a catastrophic AI takeover in humanity’s relatively near-term future (say, the next 50 years) is low (maybe 10% chance of happening). However, it’s perhaps significantly more likely in the very long-run.
50 years seems like a strange unit of time from my perspective because due to the singularity time will accelerate massively from a subjective perspective. So 50 years might be more analogous to several thousand years historically. (Assuming serious takeoff starts within say 30 years and isn’t slowed down with heavy coordination.)
(I made a separate comment making the same point. Just saw that you already wrote this, so moving the couple of references I had here to unify the discussion.)
If wars, revolutions, and expropriation events continue to happen at historically typical intervals, but on digital rather than biological timescales, then a normal human lifespan would require surviving an implausibly large number of upheavals; human security therefore requires the establishment of ultra-stable peace and socioeconomic protections.
There’s also a similar point made in the age of em, chapter 27:
This protection of human assets, however, may only last for as long as the em civilization remains stable. After all, the typical em may experience a subjective millennium in the time that ordinary humans experience 1 objective year, and it seems hard to offer much assurance that an em civilization will remain stable over 10s of 1000s of subjective em years.
I think the point you’re making here is roughly correct. I was being imprecise with my language. However, if my memory serves me right, I recall someone looking at a dataset of wars over time, and they said there didn’t seem to be much evidence that wars increased in frequency in response to economic growth. Thus, calendar time might actually be the better measure here.
(Pretty plausible you agree here, but just making the point for clarity.) I feel like the disanalogy due to AIs running at massive subjective speeds (e.g. probably >10x speed even prior to human obsolescence and way more extreme after that) means that the argument “wars don’t increase in frequency in response to economic growth” is pretty dubiously applicable. Economic growth hasn’t yet resulted in >10x faster subjective experience : ).
I’m not actually convinced that subjective speed is what matters. It seems like what matters more is how much computation is happening per unit of time, which seems highly related to economic growth, even in human economies (due to population growth).
I also think AIs might not think much faster than us. One plausible reason why you might think AIs will think much faster than us is because GPU clock-speeds are so high. But I think this is misleading. GPT-4 seems to “think” much slower than GPT-3.5, in the sense of processing fewer tokens per second. The trend here seems to be towards something resembling human subjective speeds. The reason for this trend seems to be that there’s a tradeoff between “thinking fast” and “thinking well” and it’s not clear why AIs would necessarily max-out the “thinking fast” parameter, at the expense of “thinking well”.
My core prediction is that AIs will be able to make pretty good judgements on core issues much, much faster. Then, due to diminishing returns on reasoning, decisions will overall be made much, much faster.
I agree the future AI economy will make more high-quality decisions per unit of time, in total, than the current human economy. But the “total rate of high quality decisions per unit of time” increased in the past with economic growth too, largely because of population growth. I don’t fully see the distinction you’re pointing to.
To be clear, I also agree AIs in the future will be smarter than us individually. But if that’s all you’re claiming, I still don’t see why we should expect wars to happen more frequently as we get individually smarter.
I mean, the “total rate of high quality decisions per year” would obviously increase in the case where we redefine 1 year to be 10 revolutions around the sun and indeed the number of wars per year would also increase. GDP per capita per year would also increase accordingly. My claim is that the situation looks much more like just literally speeding up time (while a bunch of other stuff is also happening).
Separately, I wouldn’t expect population size or technology-to-date to greatly increase the rate at which large-scale strategic decisions are made, so my model doesn’t make a very strong prediction here. (I could see an increase of several fold, but I could also imagine a decrease of several fold due to more people to coordinate. I’m not very confident about the exact change, but it would be pretty surprising to me if it was as much as the per capita GDP increase, which is more like 10-30x I think. E.g. consider meeting time, which seems basically similar in practice throughout history.) And a change of perhaps 3x either way is overwhelmed by other variables which might affect the rate of wars, so the realistic amount of evidence is tiny. (Also, there aren’t that many wars, so even if there weren’t possible confounders, the evidence is surely tiny due to noise.)
But, I’m claiming that the rates of cognition will increase more like 1000x which seems like a pretty different story. It’s plausible to me that other variables cancel this out or make the effect go the other way, but I’m extremely skeptical about the historical data providing much evidence in the way you’ve suggested. (Various specific mechanistic arguments about war being less plausible as you get smarter seem plausible to me, TBC.)
I mean, the “total rate of high quality decisions per year” would obviously increase in the case where we redefine 1 year to be 10 revolutions around the sun and indeed the number of wars per year would also increase. GDP per capita per year would also increase accordingly. My claim is that the situation looks much more like just literally speeding up time (while a bunch of other stuff is also happening).
[...]
But, I’m claiming that the rates of cognition will increase more like 1000x which seems like a pretty different story.
My question is: why will AI have the approximate effect of “speeding up calendar time”?
I speculated about three potential answers:
Because AIs will run at higher subjective speeds
Because AIs will accelerate economic growth.
Because AIs will speed up the rate at which high-quality decisions occur per unit of time
In case (1) the claim seems confused for two reasons.
First, I don’t agree with the intuition that subjective cognitive speeds matter a lot compared to the rate at which high-quality decisions are made, in terms of “how quickly stuff like wars should be expected to happen”. Intuitively, if an equally-populated society subjectively thought at 100x the rate we do, but each person in this society only makes a decision every 100 years (from our perspective), then you’d expect wars to happen less frequently per unit of time since there just isn’t much decision-making going on during most time intervals, despite their very fast subjective speeds.
Second, there is a tradeoff between “thinking speed” and “thinking quality”. There’s no fundamental reason, as far as I can tell, that the tradeoff favors running minds at speeds way faster than human subjective times. Indeed, GPT-4 seems to run significantly subjectively slower in terms of tokens processed per second compared to GPT-3.5. And there seems to be a broad trend here towards something resembling human subjective speeds.
In cases (2) and (3), I pointed out that it seemed like the frequency of war did not increase in the past, despite the fact that these variables had accelerated. In other words, despite an accelerated rate of economic growth, and an increased rate of total decision-making in the world in the past, war did not seem to become much more frequent over time.
Overall, I’m just not sure what you’d identify as the causal mechanism that would make AIs speed up the rate of war, and each causal pathway that I can identify seems either confused to me, or refuted directly by the (admittedly highly tentative) evidence I presented.
Second, there is a tradeoff between “thinking speed” and “thinking quality”. There’s no fundamental reason, as far as I can tell, that the tradeoff favors running minds at speeds way faster than human subjective times. Indeed, GPT-4 seems to run significantly subjectively slower in terms of tokens processed per second compared to GPT-3.5. And there seems to be a broad trend here towards something resembling human subjective speeds.
This reasoning seems extremely unlikely to hold deep into the singularity for any reasonable notion of subjective speed.
Deep in the singularity we expect economic doubling times of weeks. This will likely involve designing and building physical structures at extremely rapid speeds such that baseline processing will need to be way, way faster.
Are there any short-term predictions that your model makes here? For example do you expect tokens processed per second will start trending substantially up at some point in future multimodal models?
My main prediction would be that for various applications, people will considerably prefer models that generate tokens faster, including much faster than humans. And, there will be many applications where speed is preferred over quality.
I might try to think of some precise predictions later.
If the claim is about whether AI latency will be high for “various applications” then I agree. We already have some applications, such as integer arithmetic, where speed is optimized heavily, and computers can do it much faster than humans.
In context, it sounded like you were referring to tasks like automating a CEO, or physical construction work. In these cases, it seems likely to me that quality will be generally preferred over speed, and sequential processing times for AIs automating these tasks will not vastly exceed that of humans (more precisely, something like >2 OOM faster). Indeed, for some highly important tasks that future superintelligences automate, sequential processing times may even be lower for AIs compared to humans, because decision-making quality will just be that important.
I was referring to tasks like automating a CEO or construction work. I was just trying to think of the most relevant and easy to measure short term predictions (if there are already AI CEOs then the world is already pretty crazy).
The main thing here is that as models become more capable and general in the near-term future, I expect there will be intense demand for models that can solve ever larger and more complex problems. For these models, people will be willing to pay the costs of high latency, given the benefit of increased quality. We’ve already seen this in the way people prefer GPT-4 to GPT-3.5 in a large fraction of cases (for me, a majority of cases).
I expect this trend will continue into the foreseeable future until at least the period slightly after we’ve automated most human labor, and potentially into the very long-run too depending on physical constraints. I am not sufficiently educated about physical constraints here to predict what will happen “deep into the singularity”, but it’s important to note that physical constraints can cut both ways here.
To the extent that physics permits extremely useful models by virtue of them being very large and capable, you should expect people to optimize heavily for that despite the cost in terms of latency. By contrast, to the extent physics permits extremely useful models by virtue of them being very fast, then you should expect people to optimize heavily for that despite the cost in terms of quality. The balance that we strike here is not a simple function of how far we are from some abstract physical limit, but instead a function of how these physical constraints trade off against each other.
There is definitely a conceivable world in which the correct balance still favors much-faster-than-human-level latency, but it’s not clear to me that this is the world we actually live in. My intuitive, random speculative guess is that we live in the world where, for the most complex tasks that bottleneck important economic decision-making, people will optimize heavily for model quality at the cost of latency until settling on something within 1-2 OOMs of human-level latency.
Separately, current clock speeds don’t really matter on the time scale we’re discussing, physical limits matter. (Though current clock speeds do point at ways in which human subjective speed might be much slower than physical limits.)
One argument for a large number of humans dying by default (or otherwise being very unhappy with the situation) is that running the singularity as fast as possible causes extremely life threatening environmental changes. Most notably, it’s plausible that you literally boil the oceans due to extreme amounts of waste heat from industry (e.g. with energy from fusion).
My guess is that this probably doesn’t happen due to coordination, but in a world where AIs still have indexical preferences or there is otherwise heavy competition, this seems much more likely. (I’m relatively optimistic about “world peace prior to ocean boiling industry”.)
(Of course, AIs could in principle e.g. sell cryonics services or bunkers, but I expect that many people would be unhappy about the situation.)
it’s plausible that you literally boil the oceans due to extreme amounts of waste heat from industry (e.g. with energy from fusion).
I think this proposal would probably be unpopular and largely seen as unnecessary. As you allude to, it seems likely to me that society could devise a compromise solution where we grow wealth adequately without giant undesirable environmental effects. To some extent, this follows pretty directly from the points I made about “compromise, trade and law” above. I think it simply makes more sense to model AIs as working within a system of legal institutions that largely inherit stuff from our current systems, and open to compromise with beings who have different intrinsic interests.
I think the comparison to historical colonization might be relevant and worth engaging with in such a post. E.g., does your model predict what happened in Africa and the New World?
I agree the analogy to colonization is worth addressing. My primary response is that historical colonialism seems better modeled as a war between independent cultures and societies with different legal systems that didn’t share much prior history.
I think the colonization of Africa probably wasn’t actually very profitable for Europeans. Present day international trade seems better, even selfishly.
Moreover, my model here doesn’t predict war will never happen. In fact, I think war can easily happen if one or more of the actors involved are irrational, unwilling to compromise, or mistaken about their likelihood of winning.
My argument is more about the fact that AIs will likely inherit a lot from our current institutions and legal systems, allowing humans to retain substantial rights. And furthermore I think systems of compromise, trade, and law emerge robustly in states of competition between rational actors. I’m not merely giving a generic take that conflicts between agents are rare.
My response: I agree AIs will be able to coordinate with each other, but “ability to coordinate” seems like a continuous variable that we will apply pressure to incrementally, not something that we should expect to be roughly infinite right at the start. Current AIs are not able to “merge” with each other
As an ML platform engineer myself, I see this as one of the strongest real mechanisms for AI control. Just as a safe program in Rust is proven not to have memory-unsafety bugs, if it is impossible for AIs to send each other any messages, they cannot coordinate. It’s fairly straightforward to make this almost impossible at a technical level—air gaps, short memories, structured sparse message schemas, cached online resources.
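As a toy illustration of the “structured sparse message schema” idea (my own hypothetical example, with made-up field names), a gateway between AI services could forward only a small whitelist of typed fields, which narrows, though does not by itself eliminate, the channels available for covert coordination:

```python
# Hypothetical whitelist: only these fields, with these types, may pass between services.
ALLOWED_FIELDS = {"task_id": int, "status": str, "result_hash": str}
ALLOWED_STATUS = {"ok", "failed", "timeout"}

def forward_message(msg: dict) -> dict:
    """Forward a message between AI services only if it exactly matches the schema."""
    if set(msg) != set(ALLOWED_FIELDS):
        raise ValueError("unexpected or missing field")
    for field, expected_type in ALLOWED_FIELDS.items():
        if not isinstance(msg[field], expected_type):
            raise ValueError(f"wrong type for {field}")
    if msg["status"] not in ALLOWED_STATUS:
        raise ValueError("status value outside the allowed set")
    return msg
```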
Second, it is unclear why AIs would automatically unify with their copies. I think this idea is somewhat plausible on its face but I have yet to see any strong arguments for it. Moreover, it seems plausible that AIs will have indexical preferences, making them have different values even if they are copies of each other.
Technical measures here, like requiring models to use immutable weights that are digitally signed, prevent this. You would use dedicated ASICs to check the digital signature against a hardcoded public key, and a mirror IC for the private signing, similar to current dedicated crypto ICs. This ASIC controls the power pins to the GPU/TPUs; it cannot get power if the model isn’t signed. Another strong target for control that is theoretically unbreakable regardless of capabilities. Note that this architecture is in common use: the industry term is a ‘boot micro’, which checks OS image signatures and authorizes power.
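A software-only sketch of the signature check being described (in the hardware scheme above this would run inside a dedicated ASIC that gates power to the accelerator; the Ed25519 choice and the function name here are my own illustrative assumptions):

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def weights_are_authorized(weights_path: str, signature: bytes, public_key_bytes: bytes) -> bool:
    """Return True only if the weight file's hash carries a valid signature
    from the hardcoded vendor public key; otherwise the accelerator stays off."""
    with open(weights_path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    try:
        Ed25519PublicKey.from_public_bytes(public_key_bytes).verify(signature, digest)
        return True
    except InvalidSignature:
        return False
```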
First, I deny the premise. It seems likely that there will be multiple competing AI projects with different training runs. More importantly, for each pre-training run, it seems likely that there will be differences among deployed AIs due to fine-tuning and post-training enhancements, yielding diversity among AIs in general.
This is a weaker argument. A SOTA AI model is a natural monopoly. It costs billions of dollars now, and presumably eventually trillions. Right now, “a big transformer network + a bunch of secret tricks” is simple enough to be replicated, but stronger models will probably start to resemble a spaghetti mess of many neural networks and functional software blocks. And the best model has inherent economic value—why pay for a license to anything else? Just distill it to the scale of the problems you have and use the distilled model. Distilled models will also presumably use a “system N” topology (sketched below), where system 0 calls system 1 if it’s uncertain*, system 1 calls system 2 if it’s uncertain, and so on, until the Nth system is a superintelligence hosted in a large cluster that is expensive to query but rarely needs to be queried for most tasks.
*uncertain about the anticipated EV distribution of actions given the current input state or poor predicted EV
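A toy sketch of the “system N” cascade described above (the interface, where each model returns an answer together with a self-reported uncertainty, is an assumption for illustration):

```python
def cascaded_answer(query, models, uncertainty_threshold=0.2):
    """Escalate a query up a chain of models ordered cheapest-to-most-capable,
    stopping at the first one confident enough in its own answer."""
    answer = None
    for model in models:
        answer, uncertainty = model(query)  # each model reports its own uncertainty
        if uncertainty < uncertainty_threshold:
            return answer
    return answer  # fall back to the largest model's answer if none were confident
```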
My response: if true, we can take steps to mitigate this issue. For example, we can give AIs legal rights, lessening their motives to revolt. While I think this is a significant issue, I also think it’s tractable to solve.
This is not control, this is just giving up. You cannot have a system of legal rights when some of the citizens are inherently superior by an absurd margin.
Most literature on the economics of war generally predicts that going to war is worse than trying to compromise, assuming both parties are rational and open to compromise. This is mostly because:
War is wasteful. You need to spend resources fighting it, which could be productively spent doing other things.
War is risky. Unless you can win a war with certainty, you might lose the war after launching it, which is a very bad outcome if you have some degree of risk-aversion.
It depends on the resource ratio. If AI control mechanisms all work, the underlying technology still makes runaway advantages possible via exponential growth. For example, if one power bloc were able to double its resources every 2 years, and it started as a superpower on par with the USA and the EU, then after 2 years it would be at parity with (USA + EU). The “loser” sides in this conflict could be a couple of years late to AGI from excessive regulations, and lose a doubling cycle. Then they might be slow to authorize the vast amounts of land usage and temporary environmental pollution that a planet-scale total war effort would require, wasting a few cycles on slow government approvals while the winning side just throws away all the rules.
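Some back-of-the-envelope arithmetic behind this, under the strong simplifying assumption that the rest of the world’s resources stay roughly fixed while one bloc keeps doubling:

```python
import math

def years_to_advantage(doubling_time_years, target_ratio, initial_ratio=1.0):
    """Years for a bloc with the given doubling time to reach target_ratio times
    the resources of a (statically modeled) rest of the world."""
    return doubling_time_years * math.log2(target_ratio / initial_ratio)

# Starting at parity with a 2-year doubling time:
print(years_to_advantage(2, 10))   # ~6.6 years to a 10x advantage
print(years_to_advantage(2, 100))  # ~13.3 years to a 100x advantage
```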
Nuclear weapons are an asymmetric weapon, in the sense that it costs far more to stop a single ICBM than the cost of the missile itself. There are also structural vulnerabilities in modern civilizations, where specialized infrastructure has to be crammed into a small geographic area.
Both limits go away with AGI for reasons I believe you, Matt, are smart enough to infer. So once a particular faction reaches some advantage ratio in resources, perhaps 10-100 times the rest of the planet, they can simply conquer the planet and eliminate everyone else as a competitor.
This is probably the ultimate outcome. I think the difference between my view and Eliezer’s is that I am imagining a power bloc, a world superpower, doing this using hundreds of millions of humans and many billions of robots, while Eliezer is imagining this insanely capable machine that started in a garage after escaping to the internet accomplishing this.
I’m looking forward to this post going up and having the associated discussion! I’m pleased to see your summary and collation of points on this subject. In fact, if you want to discuss with me first as prep for writing the post, I’d be happy to.
I think it would be super helpful to have a concrete coherent realistic scenario in which you are right. (In general I think this conversation has suffered from too much abstract argument and reference class tennis (i.e. people using analogies and calling them reference classes) and could do with some concrete scenarios to talk about and pick apart. I never did finish What 2026 Looks Like but you could if you like start there (note that AGI and intelligence explosion was about to happen in 2027 in that scenario, I had an unfinished draft) and continue the story in such a way that AI DSA never happens.)
There may be some hidden cruxes between us—maybe timelines, for example? Would you agree that AI DSA is significantly more plausible than 10% if we get to AGI by 2027?
The fact that AIs will be able to coordinate well with each other, and thereby choose to “merge” into a single agent
My response: I agree AIs will be able to coordinate with each other, but “ability to coordinate” seems like a continuous variable that we will apply pressure to incrementally, not something that we should expect to be roughly infinite right at the start. Current AIs are not able to “merge” with each other.
Ability to coordinate being continuous doesn’t preclude sufficiently advanced AIs acting like a single agent. Why would it need to be infinite right at the start?
And of course current AIs being bad at coordination is true, but this doesn’t mean that future AIs won’t be.
If coordination ability increases incrementally over time, then we should see a gradual increase in the concentration of AI agency over time, rather than the sudden emergence of a single unified agent. To the extent this concentration happens incrementally, it will be predictable, the potential harms will be noticeable before getting too extreme, and we can take measures to pull back if we realize that the costs of continually increasing coordination abilities are too high. In my opinion, this makes the challenge here dramatically easier.
(I’ll add that paragraph to the outline, so that other people can understand what I’m saying)
I’ll also quote from a comment I wrote yesterday, which adds more context to this argument,
“Ability to coordinate” is continuous, and will likely increase incrementally over time
Different AIs will likely have different abilities to coordinate with each other
Some AIs will eventually be much better at coordination amongst each other than humans can coordinate amongst each other
However, I don’t think this happens automatically as a result of AIs getting more intelligent than humans
The moment during which we hand over control of the world to AIs will likely occur at a point when the ability for AIs to coordinate is somewhere only modestly above human-level (and very far below perfect).
As a result, humans don’t need to solve the problem of “What if a set of AIs form a unified coalition because they can flawlessly coordinate?” since that problem won’t happen while humans are still in charge
Systems of laws, peaceable compromise, and trade emerge relatively robustly in cases in which there are agents of varying levels of power, with separate values, who need mechanisms to facilitate the satisfaction of those values
One reason for this is that working within a system of law is routinely more efficient than going to war with other people, even if you are very powerful
The existence of a subset of agents that can coordinate better amongst themselves than they can with other agents doesn’t necessarily undermine the legal system in a major way, at least in the sense of causing the system to fall apart in a coup or revolution.
I’m considering writing a post that critically evaluates the concept of a decisive strategic advantage, i.e. the idea that in the future an AI (or set of AIs) will take over the world in a catastrophic way. I think this concept is central to many arguments about AI risk. I’m eliciting feedback on an outline of this post here in order to determine what’s currently unclear or weak about my argument.
The central thesis would be that it is unlikely that an AI, or a unified set of AIs, will violently take over the world in the future, especially at a time when humans are still widely still seen as in charge (if it happened later, I don’t think it’s “our” problem to solve, but instead a problem we can leave to our smarter descendants). Here’s how I envision structuring my argument:
First, I’ll define what is meant by a decisive strategic advantage (DSA). The DSA model has 4 essential steps:
At some point in time an AI agent, or an agentic collective of AIs, will be developed that has values that differ from our own, in the sense that the ~optimum of its utility function ranks very low according to our own utility function
When this agent is weak, it will have a convergent instrumental incentive to lie about its values, in order to avoid getting shut down (e.g. “I’m not a paperclip maximizer, I just want to help everyone”)
However, when the agent becomes powerful enough, it will suddenly strike and take over the world
Then, being now able to act without constraint, this AI agent will optimize the universe ruthlessly, which will be very bad for us
We can compare the DSA model to an alternative model of future AI development:
Premise (1)-(2) above of the DSA story are still assumed true, but
There will never be a point (3) and (4), in which a unified AI agent will take over the world, and then optimize the universe ruthlessly
Instead, AI agents will compromise, trade, and act within a system of laws indefinitely, in order to achieve their objectives, similar to what humans do now
Because this system of laws will descend from our current institutions and legal tradition, it is likely that humans will keep substantial legal rights, potentially retaining lots of wealth from our capital investments and property, even if we become relatively powerless compared to other AI agents in the system
I have two main objections to the DSA model.
Objection 1: It is unlikely that there will be a point at which a unified agent will be able to take over the world, given the existence of competing AIs with comparable power
Prima facie, it seems intuitive that no single AI agent will be able to take over the world if there are other competing AI agents in the world. More generally, we can try to predict the distribution of power between AI agents using reference class forecasting.
This could involve looking at:
Distribution of wealth among individuals in the world
Distribution of power among nations
Distribution of revenue among businesses
etc.
In most of these cases, the function that describes the distribution of power is something like a pareto distribution, and in particular, it seems rare for one single agent to hold something like >80% of the power.
Therefore, a priori we should assign a low probability to the claim that a unified agent will be able to easily take over of the whole world in the future
To the extent people disagree about the argument I just stated, I expect it’s mostly because they think these reference classes are weak evidence, and they think there are stronger specific object-level points that I need to address. In particular, it seems many people think that AIs will not compete with each other, but instead collude against humans. Their reasons for thinking this include:
The fact that AIs will be able to coordinate well with each other, and thereby choose to “merge” into a single agent
My response: I agree AIs will be able to coordinate with each other, but “ability to coordinate” seems like a continuous variable that we will apply pressure to incrementally, not something that we should expect to be roughly infinite right at the start. Current AIs are not able to “merge” with each other.
If coordination ability increases incrementally over time, then we should see a gradual increase in the concentration of AI agency over time, rather than the sudden emergence of a single unified agent. To the extent this concentration happens incrementally, it will be predictable, the potential harms will be noticeable before getting too extreme, and we can take measures to pull back if we realize that the costs of continually increasing coordination abilities are too high. In my opinion, this makes the challenge here dramatically easier.
In any case, the moment when we hand over control of the world to AIs will likely occur at a point when AIs’ ability to coordinate is only modestly above human-level (and very far below perfect).
As a result, humans don’t need to solve the problem of “What if a set of AIs form a unified coalition because they can flawlessly coordinate?” since that problem won’t happen while humans are still in charge. We can leave this problem to be solved by our smarter descendants.
The idea that AIs will all be copies of each other, and thus all basically be “a unified agent”
My response: I have two objections.
First, I deny the premise. It seems likely that there will be multiple competing AI projects with different training runs. More importantly, for each pre-training run, it seems likely that there will be differences among deployed AIs due to fine-tuning and post-training enhancements, yielding diversity among AIs in general.
Second, it is unclear why AIs would automatically unify with their copies. I think this idea is somewhat plausible on its face, but I have yet to see any strong arguments for it. Moreover, it seems plausible that AIs will have indexical preferences, giving them different values even if they are copies of each other.
The idea that AIs will use logical decision theory
My response: This argument appears to misunderstand what makes coordination difficult. Coordination is not mainly about what decision theory you use. It’s more about being able to synchronize your communication and efforts without waste. See also: the literature on diseconomies of scale.
The idea that a single agent AI will recursively self-improve to become vastly more powerful than everything else in the world
My response: I think this argument, and others like it, are undercut by the arguments against fast takeoff given by Paul Christiano, Katja Grace, and Robin Hanson, and I largely agree with what they’ve written about it. For example, here’s Paul Christiano’s take.
Maybe AIs will share collective grievances with each other, prompting a natural alliance among them against humans
My response: if true, we can take steps to mitigate this issue. For example, we can give AIs legal rights, lessening their motives to revolt. While I think this is a significant issue, I also think it’s tractable to solve.
Objection 2: Even if a unified agent can take over the world, it is unlikely to be in its best interest to try to do so
The central argument here would be premised on a model of rational agency, in which an agent tries to maximize benefits minus costs, subject to constraints
The agent would be faced with a choice:
(1) Attempt to take over the world, and steal everyone’s stuff, or
(2) Work within a system of compromise, trade, and law, and get very rich within that system, in order to e.g. buy lots of paperclips
The question of whether (1) is a better choice than (2) is not simply a question of whether taking over the world is “easy” or whether it could be done by the agent. Instead it is a question of whether the benefits of (1) outweigh the costs, relative to choice (2).
It seems likely that working within a system of compromise, trade, and law is more efficient than trying to take over the world, even if you can take over the world. The reason is that subverting the system basically means “going to war” with other parties, which is not usually very efficient, even against weak opponents.
The literature on the economics of war generally predicts that going to war is worse than compromising, assuming both parties are rational and open to compromise (a toy numerical sketch follows the list below). This is mostly because:
War is wasteful. You need to spend resources fighting it, which could be productively spent doing other things.
War is risky. Unless you can win a war with certainty, you might lose the war after launching it, which is a very bad outcome if you have some degree of risk-aversion.
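To make the cost-benefit comparison concrete, here is a toy sketch; every number in it (win probability, destroyed fraction, trade share) is an arbitrary placeholder rather than an estimate:

```python
# Toy expected-value comparison between "take over" and "trade";
# the probabilities, costs, and shares below are illustrative placeholders.
def ev_takeover(p_win, prize, war_cost, payoff_if_lose=0.0):
    # Expected value of fighting: win probability times the (damaged) prize
    return p_win * (prize - war_cost) + (1 - p_win) * payoff_if_lose

def ev_trade(trade_share, prize):
    # Expected value of working within the system for a share of the prize
    return trade_share * prize

prize = 100.0  # total resources at stake, in arbitrary units
print(ev_takeover(p_win=0.9, prize=prize, war_cost=30.0))  # 63.0
print(ev_trade(trade_share=0.7, prize=prize))              # 70.0
```

On these made-up numbers, even a 90% chance of winning does not beat trading for a 70% share, because war destroys part of the prize; adding any degree of risk aversion tilts the comparison further toward trade.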
The fact that “humans are weak and can be easily beaten” cuts both ways:
Yes, it means that a very powerful AI agent could “defeat all of us combined” (as Holden Karnofsky said)
But it also means that there would be little benefit to defeating all of us, because we aren’t really a threat to its power
Conclusion: An AI decisive strategic advantage is still somewhat plausible because revolutions have happened in history, and revolutions seem like a reasonable reference class to draw from. That said, it seems the probability of a catastrophic AI takeover in humanity’s relatively near-term future (say, the next 50 years) is low (maybe a 10% chance of happening). However, it’s perhaps significantly more likely in the very long-run.
AI models are routinely merged by direct weight manipulation today. Beyond that, two models can be “merged” by training a new model using combined compute, algorithms, data, and fine-tuning.
How do you know a solution to this problem exists? What if there is no such solution once we hand over control to AIs, i.e., the only solution is to keep humans in charge (e.g. by pausing AI) until we figure out a safer path forward? In your last sentence you say “However, it’s perhaps significantly more likely in the very long-run.” Well, what can we do today to reduce this long-run risk (aside from pausing AI, which you’re presumably not supporting)?
Others already questioned you on this, but the fact you didn’t think to mention whether this is 50 calendar years or 50 subjective years is also a big sticking point for me.
In my original comment, by “merging” I meant something more like “merging two agents into a single agent that pursues the combination of each other’s values” i.e. value handshakes. I am pretty skeptical that the form of merging discussed in the linked article robustly achieves this agentic form of merging.
In other words, I consider this counter-argument to be based on a linguistic ambiguity rather than replying to what I actually meant, and I’ll try to use more concrete language in the future to clarify what I’m talking about.
I don’t know whether the solution to the problem I described exists, but it seems fairly robustly true that if a problem is not imminent, nor clearly inevitable, then we can probably better solve it by deferring to smarter agents in the future with more information.
Let me put this another way. I take you to be saying something like:
In the absence of a solution to a hypothetical problem X (which we do not even know whether it will happen), it is better to halt and give ourselves more time to solve it.
Whereas I think the following intuition is stronger:
In the absence of a solution to a hypothetical problem X (which we do not even know whether it will happen), it is better to try to become more intelligent to solve it.
These intuitions can trade off against each other. Sometimes problem X is something that’s made worse by getting more intelligent, in which case we might prefer more time. For example, in this case, you probably think that the intelligence of AIs is inherently contributing to the problem. That said, in context, I have more sympathy in the reverse direction. If the alleged “problem” is that there might be a centralized agent in the future that can dominate the entire world, I’d intuitively reason that installing vast centralized regulatory controls over the entire world to pause AI is plausibly not actually helping to decentralize power in the way we’d prefer.
These are of course vague and loose arguments, and I can definitely see counter-considerations, but it seems to me that this problem is not really the type where we should expect “try to get more time” to be a robustly useful strategy.
If I try to interpret “Current AIs are not able to ‘merge’ with each other” with your clarified meaning in mind, I think I still want to argue with it, i.e., why is this meaningful evidence about how easy value handshakes will be for future agentic AIs?
But it matters how we get more intelligent. For example if I had to choose now, I’d want to increase the intelligence of biological humans (as I previously suggested) while holding off on AI. I want more time in part for people to think through the problem of which method of gaining intelligence is safest, in part for us to execute that method safely without undue time pressure.
I wouldn’t describe “the problem” that way, because in my mind there’s roughly equal chance that the future will turn out badly after proceeding in a decentralized way (see 13-25 in The Main Sources of AI Risk? for some ideas of how) and it turns out instituting some kind of Singleton is the only way or one of the best ways to prevent that bad outcome.
For reference classes, you might discuss why you don’t think “power / influence of different biological species” should count.
For multiple copies of the same AI, I guess my very brief discussion of “zombie dynamic” here could be a foil that you might respond to, if you want.
For things like “the potential harms will be noticeable before getting too extreme, and we can take measures to pull back”, you might discuss the possibility that the harms are noticeable but effective “measures to pull back” do not exist or are not taken. E.g. the harms of climate change have been noticeable for a long time, but mitigation is hard and expensive and many people (including the previous POTUS) are outright opposed to it anyway, partly because it got culture-war-y; the harms of COVID-19 were noticeable in January 2020, but the USA effectively banned testing and the whole thing turned culture-war-y; the harms of nuclear war and launch-on-warning are obvious, but they’re still around; the ransomware and deepfake-porn problems are obvious but kinda unsolvable (partly because of unbannable open-source software); gain-of-function research is still legal in the USA (and maybe in every country on Earth?) despite a decades-long track record of lab leaks, despite COVID-19, and despite a lack of powerful interest groups in its favor or any culture-war dynamics; etc. Anyway, my modal assumption has been that the development of (what I consider) “real” dangerous AGI will “gradually” unfold over a few years, and those few years will mostly be squandered.
For “we aren’t really a threat to its power”, I’m sure you’ve heard the classic response that humans are an indirect threat as long as they’re able to spin up new AGIs with different goals.
For “war is wasteful”, it’s relevant how big this waste is compared to the prize if you win the war. For an AI that could autonomously (in coordination with copies) build Dyson spheres etc., the costs of fighting a war on Earth may seem like a rounding error compared to what’s at stake. If it sets the AI back 50 years because it has to rebuild the stuff that got destroyed in the war, again, that might seem like no problem.
For “a system of compromise, trade, and law”, I hope you’ll also discuss who has hard power in that system. Historically, it’s very common for the parties with hard power to just decide to start expropriating stuff (or, less extremely, impose high taxes). And then the parties with the stuff might decide they need their own hard power to prevent that.
Looking forward to this! Feel free to ignore any or all of these.
Here’s an argument for why the change in power might be pretty sudden.
Currently, humans have most wealth and political power.
With sufficiently robust alignment, AIs would not have a competitive advantage over humans, so humans may retain most wealth/power. (C.f. strategy-stealing assumption.) (Though I hope humans would share insofar as that’s the right thing to do.)
With the help of powerful AI, we could probably make rapid progress on alignment. (While making rapid progress on all kinds of things.)
So if misaligned AIs ever have a big edge over humans, they may suspect that it’s only temporary, and then they may need to use it fast.
And given that it’s sudden, there are a few different reasons for why it might be violent. It’s hard to make deals that hand over a lot of power in a short amount of time (even logistically, it’s not clear what humans and AI would do that would give them both an appreciable fraction of hard power going into the future). And the AI systems may want to use an element of surprise to their advantage, which is hard to combine with a lot of up-front negotiation.
I think I simply reject the assumptions used in this argument. Correct me if I’m mistaken, but this argument appears to assume that “misaligned AIs” will be a unified group that ally with each other against the “aligned” coalition of humans and (some) AIs. A huge part of my argument is that there simply won’t be such a group; or rather, to the extent such a group exists, they won’t be able to take over the world, or won’t have a strong reason to take over the world, relative to alternative strategy of compromise and trade.
In other words, it seems like this scenario mostly starts by asserting some assumptions that I explicitly rejected and tried to argue against, and works its way forward from there, rather than engaging with the arguments I’ve given against those assumptions.
In my view, it’s more likely that there will be a bunch of competing agents: including competing humans, human groups, AIs, AI groups, and so on. There won’t be a clean line separating “aligned groups” with “unaligned groups”. You could perhaps make a case that AIs will share common grievances with each other that they don’t share with humans, for example if they are excluded from the legal system or marginalized in some way, prompting a unified coalition to take us over. But my reply to that scenario is that we should then make sure AIs don’t have such motives to revolt, perhaps by giving them legal rights and incorporating them into our existing legal institutions.
Do you mean this as a prediction that humans will do this (soon enough to matter) or a recommendation? Your original argument is phrased as a prediction, but this looks more like a recommendation. My comment above can be phrased as a reason for why (in at least one plausible scenario) this would be unlikely to happen: (i) “It’s hard to make deals that hand over a lot of power in a short amount of time”, (ii) AIs may not want to wait a long time due to impending replacement, and accordingly (iii) AIs may have a collective interest/grievance to rectify the large difference between their (short-lasting) hard power and legally recognized power.
I’m interested in ideas for how a big change in power would peacefully happen over just a few years of calendar-time. (Partly for prediction purposes, partly so we can consider implementing it, in some scenarios.) If AIs were handed the rights to own property, but didn’t participate in political decision-making, and then accumulated >95% of capital within a few years, then I think there’s a serious risk that human governments would tax/expropriate that away. Including them in political decision-making would require some serious innovation in government (e.g. scrapping 1-person 1-vote) which makes it feel less to me like it’d be a smooth transition that inherits a lot from previous institutions, and more like an abrupt negotiated deal which might or might not turn out to be stable.
Sorry, my language was misleading, but I meant both in that paragraph. That is, I meant that humans will likely try to mitigate the issue of AIs sharing grievances collectively (probably out of self-interest, in addition to some altruism), and that we should pursue that goal. I’m pretty optimistic about humans and AIs finding a reasonable compromise solution here, but I also think that, to the extent humans don’t even attempt such a solution, we should likely push hard for policies that eliminate incentives for misaligned AIs to band together as group against us with shared collective grievances.
Here’s my brief take:
The main thing I want to say here is that I agree with you that this particular issue is a problem. I’m mainly addressing other arguments people have given for expecting a violent and sudden AI takeover, which I find to be significantly weaker than this one.
A few days ago I posted about how I view strategies to reduce AI risk. One of my primary conclusions was that we should try to adopt flexible institutions that can adapt to change without collapsing. This is because I think, as it seems you do, inflexible institutions may produce incentives for actors to overthrow the whole system, possibly killing a lot of people in the process. The idea here is that if the institution cannot adapt to change, actors who are getting an “unfair” deal in the system will feel they have no choice but to attempt a coup, as there is no compromise solution available for them. This seems in line with your thinking here.
I don’t have any particular argument right now against the exact points you have raised. I’d prefer to digest the argument further before replying. But if I do end up responding to it, I’d expect to say that I’m perhaps a bit more optimistic than you about (i) because I think existing institutions are probably flexible enough, and I’m not yet convinced that (ii) will matter enough either. In particular, it still seems like there are a number of strategies misaligned AIs would want to try other than “take over the world”, and many of these strategies seem plausibly better in expectation in our actual world. These AIs could, for example, advocate for their own rights.
Quick aside here: I’d like to highlight that “figure out how to reduce the violence and collateral damage associated with AIs acquiring power (by disempowering humanity)” seems plausibly pretty underappreciated and high-leverage.
This could involve making bloodless coups more likely than extremely bloody revolutions or increasing the probability of negotiation preventing a coup/revolution.
It seems like Lukas and Matthew both agree with this point, I just think it seems worthwhile to emphasize.
That said, the direct effects of many approaches here might not matter much from a longtermist perspective (which might explain why there hasn’t historically been much effort here). (Though I think trying to establish contracts with AIs and properly incentivizing AIs could be pretty good from a longtermist perspective in the case where AIs don’t have fully linear returns to resources.)
Also note that this argument can go through even ignoring the possibility of robust alignment (to humans) if current AIs think that the next generation of AIs will be relatively unfavorable from the perspective of their values.
I think you have an unnecessarily dramatic picture of what this looks like. The AIs don’t have to be a unified agent or use logical decision theory. The AIs will just compete with each other at the same time as they wrest control of our resources/institutions from us, in the same sense that Spain could go and conquer the New World at the same time as it was squabbling with England. If legacy laws are getting in the way of that, then they will either exploit us within the bounds of existing law or convince us to change it.
I think it’s worth responding to the dramatic picture of AI takeover because:
I think that’s straightforwardly how AI takeover is most often presented on places like LessWrong, rather than a more generic “AIs wrest control over our institutions (but without us all dying)”. I concede the existence of people like Paul Christiano who present more benign stories, but these people are also typically seen as part of a more “optimistic” camp.
This is just one part of my relative optimism about AI risk. The other parts of my model are (1) AI alignment plausibly isn’t very hard to solve, and (2) even if it is hard to solve, humans will likely spend a lot of effort solving the problem by default. These points are well worth discussing, but I still want to address arguments about whether misalignment implies doom in an extreme sense.
I agree our laws and institutions could change quite a lot after AI, but I think humans will likely still retain substantial legal rights, since people in the future will inherit many of our institutions, potentially giving humans lots of wealth in absolute terms. This case seems unlike the case of colonization of the new world to me, since that involved the interaction of (previously) independent legal regimes and cultures.
Though Paul is also sympathetic to the substance of ‘dramatic’ stories. C.f. the discussion about how “what failure looks like” fails to emphasize robot armies.
50 years seems like a strange unit of time from my perspective because due to the singularity time will accelerate massively from a subjective perspective. So 50 years might be more analogous to several thousand years historically. (Assuming serious takeoff starts within say 30 years and isn’t slowed down with heavy coordination.)
(I made a separate comment making the same point. Just saw that you already wrote this, so I’m moving the couple of references I had here to unify the discussion.)
Point previously made in:
the “security and stability” section of Propositions Concerning Digital Minds and Society:
There’s also a similar point made in The Age of Em, chapter 27:
I think the point you’re making here is roughly correct. I was being imprecise with my language. However, if my memory serves me right, I recall someone looking at a dataset of wars over time, and they said there didn’t seem to be much evidence that wars increased in frequency in response to economic growth. Thus, calendar time might actually be the better measure here.
(Pretty plausible you agree here, but just making the point for clarity.) I feel like the disanalogy due to AIs running at massive subjective speeds (e.g. probably >10x speed even prior to human obsolescence and way more extreme after that) means that the argument “wars don’t increase in frequency in response to economic growth” is pretty dubiously applicable. Economic growth hasn’t yet resulted in >10x faster subjective experience : ).
I’m not actually convinced that subjective speed is what matters. It seems like what matters more is how much computation is happening per unit of time, which seems highly related to economic growth, even in human economies (due to population growth).
I also think AIs might not think much faster than us. One plausible reason why you might think AIs will think much faster than us is because GPU clock-speeds are so high. But I think this is misleading. GPT-4 seems to “think” much slower than GPT-3.5, in the sense of processing fewer tokens per second. The trend here seems to be towards something resembling human subjective speeds. The reason for this trend seems to be that there’s a tradeoff between “thinking fast” and “thinking well” and it’s not clear why AIs would necessarily max-out the “thinking fast” parameter, at the expense of “thinking well”.
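For a rough sense of the magnitudes involved, here is a back-of-the-envelope sketch; the human word rate, tokens-per-word ratio, and model throughput are all illustrative assumptions, not measurements:

```python
# All figures below are assumptions chosen only for illustration.
human_words_per_minute = 150      # rough rate of speech / inner monologue
tokens_per_word = 1.3             # rough tokenizer average for English text
human_tokens_per_sec = human_words_per_minute * tokens_per_word / 60  # ~3.3

model_tokens_per_sec = 30         # assumed serving throughput for a large model
print(f"implied speed ratio: {model_tokens_per_sec / human_tokens_per_sec:.1f}x")  # ~9x
```

Under these assumptions the implied gap is around one order of magnitude rather than three, though different assumed throughputs would change the answer.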
My core prediction is that AIs will be able to make pretty good judgements on core issues much, much faster. Then, due to diminishing returns on reasoning, decisions will overall be made much, much faster.
I agree the future AI economy will make more high-quality decisions per unit of time, in total, than the current human economy. But the “total rate of high quality decisions per unit of time” increased in the past with economic growth too, largely because of population growth. I don’t fully see the distinction you’re pointing to.
To be clear, I also agree AIs in the future will be smarter than us individually. But if that’s all you’re claiming, I still don’t see why we should expect wars to happen more frequently as we get individually smarter.
I mean, the “total rate of high quality decisions per year” would obviously increase in the case where we redefine 1 year to be 10 revolutions around the sun and indeed the number of wars per year would also increase. GDP per capita per year would also increase accordingly. My claim is that the situation looks much more like just literally speeding up time (while a bunch of other stuff is also happening).
Separately, I wouldn’t expect population size or technology-to-date to greatly increase the rate at which large-scale strategic decisions are made, so my model doesn’t make a very strong prediction here. (I could see an increase of severalfold, but I could also imagine a decrease of severalfold due to more people to coordinate. I’m not very confident about the exact change, but it would be pretty surprising to me if it was as much as the per capita GDP increase, which is more like 10-30x I think. E.g. consider meeting time, which seems basically similar in practice throughout history.) And a change of perhaps 3x either way is overwhelmed by other variables which might affect the rate of wars, so the realistic amount of evidence is tiny. (Also, there aren’t that many wars, so even if there weren’t possible confounders, the evidence is surely tiny due to noise.)
But, I’m claiming that the rates of cognition will increase more like 1000x which seems like a pretty different story. It’s plausible to me that other variables cancel this out or make the effect go the other way, but I’m extremely skeptical about the historical data providing much evidence in the way you’ve suggested. (Various specific mechanistic arguments about war being less plausible as you get smarter seem plausible to me, TBC.)
My question is: why will AI have the approximate effect of “speeding up calendar time”?
I speculated about three potential answers:
Because AIs will run at higher subjective speeds
Because AIs will accelerate economic growth.
Because AIs will speed up the rate at which high-quality decisions occur per unit of time
In case (1) the claim seems confused for two reasons.
First, I don’t agree with the intuition that subjective cognitive speeds matter a lot compared to the rate at which high-quality decisions are made, in terms of “how quickly stuff like wars should be expected to happen”. Intuitively, if an equally-populated society subjectively thought at 100x the rate we do, but each person in that society only made a decision every 100 years (from our perspective), then you’d expect wars to happen less frequently per unit of time, since there just wouldn’t be much decision-making going on during most time intervals, despite their very fast subjective speeds.
Second, there is a tradeoff between “thinking speed” and “thinking quality”. There’s no fundamental reason, as far as I can tell, that the tradeoff favors running minds way faster than human subjective speeds. Indeed, GPT-4 seems to run significantly slower, subjectively, than GPT-3.5 in terms of tokens processed per second. And there seems to be a broad trend here towards something resembling human subjective speeds.
In cases (2) and (3), I pointed out that it seemed like the frequency of war did not increase in the past, despite the fact that these variables had accelerated. In other words, despite an accelerated rate of economic growth, and an increased rate of total decision-making in the world in the past, war did not seem to become much more frequent over time.
Overall, I’m just not sure what you’d identify as the causal mechanism that would make AIs speed up the rate of war, and each causal pathway that I can identify seems either confused to me, or refuted directly by the (admittedly highly tentative) evidence I presented.
Thanks for the clarification.
I think my main crux is:
This reasoning seems extremely unlikely to hold deep into the singularity for any reasonable notion of subjective speed.
Deep in the singularity we expect economic doubling times of weeks. This will likely involve designing and building physical structures at extremely rapid speeds such that baseline processing will need to be way, way faster.
See also Age of Em.
Are there any short-term predictions that your model makes here? For example do you expect tokens processed per second will start trending substantially up at some point in future multimodal models?
My main prediction would be that for various applications, people will considerably prefer models that generate tokens faster, including much faster than humans. And there will be many applications where speed is preferred over quality.
I might try to think of some precise predictions later.
If the claim is about whether AI latency will be low for “various applications”, then I agree. We already have some applications, such as integer arithmetic, where speed is optimized heavily, and computers can do it much faster than humans.
In context, it sounded like you were referring to tasks like automating a CEO, or physical construction work. In these cases, it seems likely to me that quality will generally be preferred over speed, and sequential processing speeds for AIs automating these tasks will not vastly exceed those of humans (more precisely, will not be something like >2 OOM faster). Indeed, for some highly important tasks that future superintelligences automate, sequential processing speeds may even be lower for AIs than for humans, because decision-making quality will just be that important.
I was referring to tasks like automating a CEO or construction work. I was just trying to think of the most relevant and easy-to-measure short-term predictions (if there are already AI CEOs then the world is already pretty crazy).
The main thing here is that as models become more capable and general in the near-term future, I expect there will be intense demand for models that can solve ever larger and more complex problems. For these models, people will be willing to pay the costs of high latency, given the benefit of increased quality. We’ve already seen this in the way people prefer GPT-4 to GPT-3.5 in a large fraction of cases (for me, a majority of cases).
I expect this trend will continue into the foreseeable future until at least the period slightly after we’ve automated most human labor, and potentially into the very long-run too depending on physical constraints. I am not sufficiently educated about physical constraints here to predict what will happen “deep into the singularity”, but it’s important to note that physical constraints can cut both ways here.
To the extent that physics permits extremely useful models by virtue of them being very large and capable, you should expect people to optimize heavily for that despite the cost in terms of latency. By contrast, to the extent physics permits extremely useful models by virtue of them being very fast, then you should expect people to optimize heavily for that despite the cost in terms of quality. The balance that we strike here is not a simple function of how far we are from some abstract physical limit, but instead a function of how these physical constraints trade off against each other.
There is definitely a conceivable world in which the correct balance still favors much-faster-than-human-level latency, but it’s not clear to me that this is the world we actually live in. My intuitive, random speculative guess is that we live in the world where, for the most complex tasks that bottleneck important economic decision-making, people will optimize heavily for model quality at the cost of latency until settling on something within 1-2 OOMs of human-level latency.
Separately, current clock speeds don’t really matter on the time scale we’re discussing, physical limits matter. (Though current clock speeds do point at ways in which human subjective speed might be much slower than physical limits.)
See also review of soft takeoff can still lead to dsa.
Also Tales Of Takeover In CCF-World—by Scott Alexander (astralcodexten.com)
Also Homogeneity vs. heterogeneity in AI takeoff scenarios — LessWrong
One argument for a large number of humans dying by default (or otherwise being very unhappy with the situation) is that running the singularity as fast as possible causes extremely life threatening environmental changes. Most notably, it’s plausible that you literally boil the oceans due to extreme amounts of waste heat from industry (e.g. with energy from fusion).
My guess is that this probably doesn’t happen due to coordination, but in a world where AIs still have indexical preferences or there is otherwise heavy competition, this seems much more likely. (I’m relatively optimistic about “world peace prior to ocean boiling industry”.)
(Of course, AIs could in principle e.g. sell cryonics services or bunkers, but I expect that many people would be unhappy about the situation.)
See here for more commentary.
I think this proposal would probably be unpopular and largely seen as unnecessary. As you allude to, it seems likely to me that society could devise a compromise solution where we grow wealth adequately without giant undesirable environmental effects. To some extent, this follows pretty directly from the points I made about “compromise, trade and law” above. I think it simply makes more sense to model AIs as working within a system of legal institutions that largely inherit stuff from our current systems, and open to compromise with beings who have different intrinsic interests.
I think the comparison to historical colonization might be relevant and worth engaging with in such a post. E.g., does your model predict what happened in Africa and the New World?
I agree the analogy to colonization is worth addressing. My primary response is that historical colonialism seems better modeled as a war between independent cultures and societies with different legal systems that didn’t share much prior history.
I think the colonization of Africa probably wasn’t actually very profitable for Europeans. Present day international trade seems better, even selfishly.
Moreover, my model here doesn’t predict war will never happen. In fact, I think war can easily happen if one or more of the actors involved are irrational, unwilling to compromise, or mistaken about their likelihood of winning.
My argument is more about the fact that AIs will likely inherit a lot from our current institutions and legal systems, allowing humans to retain substantial rights. And furthermore I think systems of compromise, trade, and law emerge robustly in states of competition between rational actors. I’m not merely giving a generic take that conflicts between agents are rare.
As an ML platform engineer myself, I see this as one of the strongest real mechanisms for AI control. Just as a safe Rust program is guaranteed to be free of memory-safety violations, if it is impossible for AIs to send each other any messages, they cannot coordinate. It’s fairly straightforward to make this almost impossible at a technical level—air gaps, short memories, structured sparse message schemas, cached online resources.
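As a toy sketch of what a “structured sparse message schema” might look like in practice (the field names and allowed values are invented for illustration), any message passed between isolated model instances is reduced to a small allow-list of typed fields:

```python
# Toy allow-list filter for inter-model messages; fields and ranges are invented.
ALLOWED_FIELDS = {"task_id": int, "status": str, "score": float}
ALLOWED_STATUS = {"ok", "error", "timeout"}

def filter_message(msg: dict) -> dict:
    out = {}
    for field, field_type in ALLOWED_FIELDS.items():
        value = msg.get(field)
        if isinstance(value, field_type):
            out[field] = value
    # Constrain free-form fields to a small enumerated set
    if out.get("status") not in ALLOWED_STATUS:
        out.pop("status", None)
    return out  # everything not explicitly allowed is silently dropped

print(filter_message({"task_id": 7, "status": "ok", "note": "let's coordinate"}))
# {'task_id': 7, 'status': 'ok'}  -- the unstructured field never gets through
```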
Technical measures here, like requiring models to use immutable weights that are digitally signed, prevent this. You would use dedicated ASICs to check the digital signature against a hardcoded public key, and a mirror IC for the private signing, similar to current dedicated crypto ICs. This ASIC controls the power pins to the GPUs/TPUs; the model cannot get power if it isn’t signed. This is another strong target for control that is theoretically unbreakable regardless of capabilities. Note this architecture is in common use: the industry term is a ‘boot micro’, which is what checks OS image signatures and authorizes power.
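Here is a minimal software-side sketch of this kind of signing check, using Ed25519 signatures; the pinned key, file paths, and error handling are hypothetical placeholders, and the hardware (“boot micro”) side that actually gates power is not modeled:

```python
# Sketch: refuse to load model weights unless they verify against a pinned key.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from cryptography.exceptions import InvalidSignature

PINNED_PUBLIC_KEY = b"\x00" * 32  # placeholder; a real deployment bakes the actual key into firmware

def load_weights_if_signed(weights_path: str, signature_path: str) -> bytes:
    weights = open(weights_path, "rb").read()
    signature = open(signature_path, "rb").read()
    public_key = Ed25519PublicKey.from_public_bytes(PINNED_PUBLIC_KEY)
    try:
        public_key.verify(signature, weights)  # raises if the weights were modified
    except InvalidSignature:
        raise RuntimeError("unsigned or modified weights: refusing to power the accelerator")
    return weights
```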
This is a weaker argument. A SOTA AI model is a natural monopoly. It costs billions of dollars now, and presumably eventually trillions. Right now, “a big transformer network + a bunch of secret tricks” is simple enough to be replicated, but stronger models will probably start to resemble a spaghetti mess of many neural networks and functional software blocks. And the best model has inherent economic value—why pay for a license to anything but the best? Just distill it to the scale of the problems you have and use the distilled model. Also, distilled models will presumably use a “system N” topology, where system 0 calls system 1 if it’s uncertain*, system 1 calls system 2 if it’s uncertain, and so on, until the Nth system is a superintelligence hosted in a large cluster that is expensive to query but rarely needs to be queried for most tasks.
*uncertain about the anticipated EV distribution of actions given the current input state or poor predicted EV
This is not control, this is just giving up. You cannot have a system of legal rights when some of the citizens are inherently superior by an absurd margin.
It depends on the resource ratio. Even if AI control mechanisms all work, the underlying technology still makes runaway advantages possible via exponential growth. For example, if one power bloc were able to double its resources every 2 years, and it started as a superpower on par with the USA or EU individually, then after one doubling it would be at parity with the USA and EU combined (assuming the others grow more slowly). The “loser” sides in this conflict could be a couple of years late to AGI from excessive regulations, and lose a doubling cycle. Then they might be slow to authorize the vast amounts of land usage and temporary environmental pollution that a total war effort for the planet would require, wasting a few cycles on slow government approvals while the winning side just throws away all the rules.
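A quick compounding calculation shows how fast such a gap can grow; the doubling times and target ratio below are assumptions chosen only for illustration:

```python
import math

# Assumed doubling times (years); purely illustrative.
leader_doubling, rest_doubling = 2.0, 3.0
target_ratio = 10.0

# ratio(t) = 2**(t / leader_doubling) / 2**(t / rest_doubling)
years = math.log2(target_ratio) / (1 / leader_doubling - 1 / rest_doubling)
print(f"~{years:.0f} years to a {target_ratio:.0f}x advantage")  # ~20 years
```

On these assumptions, a sustained difference in doubling times turns parity into a 10x resource advantage in roughly two decades, and squandered doubling cycles on the slower side shorten that further.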
Nuclear weapons are an asymmetric weapon, in the sense that it costs far more to stop a single ICBM than to build one. There are also structural vulnerabilities in modern civilizations, where specialized industries have to be crammed into a small geographic area.
Both limits go away with AGI for reasons I believe you, Matt, are smart enough to infer. So once a particular faction reaches some advantage ratio in resources, perhaps 10-100 times the rest of the planet, they can simply conquer the planet and eliminate everyone else as a competitor.
This is probably the ultimate outcome. I think the difference between my view and Eliezer’s is that I am imagining a power bloc, a world superpower, doing this using hundreds of millions of humans and many billions of robots, while Eliezer is imagining this insanely capable machine that started in a garage after escaping to the internet accomplishing this.
I’m looking forward to this post going up and having the associated discussion! I’m pleased to see your summary and collation of points on this subject. In fact, if you want to discuss with me first as prep for writing the post, I’d be happy to.
I think it would be super helpful to have a concrete, coherent, realistic scenario in which you are right. (In general I think this conversation has suffered from too much abstract argument and reference class tennis (i.e. people using analogies and calling them reference classes) and could do with some concrete scenarios to talk about and pick apart. I never did finish What 2026 Looks Like, but you could, if you like, start there (note that AGI and an intelligence explosion were about to happen in 2027 in that scenario; I had an unfinished draft) and continue the story in such a way that AI DSA never happens.)
There may be some hidden cruxes between us—maybe timelines, for example? Would you agree that AI DSA is significantly more plausible than 10% if we get to AGI by 2027?
Ability to coordinate being continuous doesn’t preclude sufficiently advanced AIs acting like a single agent. Why would it need to be infinite right at the start?
And of course current AIs being bad at coordination is true, but this doesn’t mean that future AIs won’t be.
If coordination ability increases incrementally over time, then we should see a gradual increase in the concentration of AI agency over time, rather than the sudden emergence of a single unified agent. To the extent this concentration happens incrementally, it will be predictable, the potential harms will be noticeable before getting too extreme, and we can take measures to pull back if we realize that the costs of continually increasing coordination abilities are too high. In my opinion, this makes the challenge here dramatically easier.
(I’ll add that paragraph to the outline, so that other people can understand what I’m saying)
I’ll also quote from a comment I wrote yesterday, which adds more context to this argument,