I am unsurprised but disappointed to read the same Catastrophe arguments rehashed here, based on an outdated Bostromian paradigm of AGI. This is the main section I disagree with.
The underlying principle beneath these hypothetical scenarios is grounded in what we can observe around us: powerful entities control weaker ones, and weaker ones can fight back only to the degree that the more powerful entity isn’t all that powerful after all.
I do not think this is obvious or true at all. Nation-states are often controlled by a small group of people, or even a single person, physiologically no different from any other human being. If it really wanted to, there would be nothing at all stopping the US military from launching a coup on its civilian government; in fact, military coups are a commonplace global event. Yet, generally, most countries do not suffer constant coup attempts. We hold far fewer tools to “align” military leaders than we do AI models—we cannot control how generals were raised as children, cannot read their minds, and cannot edit their minds.
I think you could also make a similar argument that big things control little things: with much more momentum and potential energy, large objects dominate small objects, and small objects can push large objects only to the extent that the large object is made of a material that is not very dense. Surely, then, building vehicles substantially larger than people would result in uncontrollable runaways threatening human life and property! But in reality, runaway dump truck incidents are fairly uncommon. A tiny man can control a giant machine. Not all men can—only the one in the cockpit.
My point is that it is not at all obvious that a powerful AI would lack such a cockpit. If its goals are oriented around protecting or giving control to a set of individuals, I see no reason whatsoever why it would do a 180 and kill its commander, especially since the AI systems that we can build in practice are more than capable of understanding the nuances of their commands.
The odds of an average chess player with an Elo rating of 1200 against a grandmaster rated 2500 are about one in a million. Against the best chess AI today, rated around 3600, the odds are essentially zero.
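For calibration on the quoted odds: the standard Elo logistic model maps rating gaps to expected scores (this is an illustrative calculation of mine, not from the original text; note that the expected score counts draws as half, so the odds of an outright win against a much stronger opponent are lower still):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    # Standard Elo formula: expected score (wins plus half of draws) for player A.
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# 1200-rated club player vs. a 2500-rated grandmaster:
print(elo_expected_score(1200, 2500))  # ≈ 0.00056, roughly 1 in 1800 per game
# 1200-rated player vs. a 3600-rated engine:
print(elo_expected_score(1200, 3600))  # ≈ 1e-6
```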
Chess is a perfectly predictable system. Reality is a chaotic one. Chaotic systems—like a three-body orbital arrangement—are impossible to predict perfectly in all cases even when they are totally deterministic, because even minute inaccuracies in measurement can completely change the result. The boundary of the Mandelbrot set is another example: it is fractal, so no finite measurement precision suffices to resolve it. Therefore, even an extremely powerful AI would be beholden to certain probabilistic barriers, quantum-random factors notwithstanding.
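The sensitivity claim is easy to demonstrate with a textbook toy system (my example, not from the original): the logistic map at its chaotic parameter. A perturbation of one part in a trillion in the initial condition is amplified until the two trajectories share nothing:

```python
def logistic_trajectory(x0: float, steps: int) -> list[float]:
    # Iterate the chaotic logistic map x -> 4x(1-x), a minimal deterministic
    # system with sensitive dependence on initial conditions.
    xs = [x0]
    for _ in range(steps):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3, 50)
b = logistic_trajectory(0.3 + 1e-12, 50)  # perturbed by one part in a trillion
# Early steps agree to many decimal places; by step ~40 the error has been
# doubled enough times to saturate, and the trajectories are decorrelated.
print(abs(a[1] - b[1]), abs(a[50] - b[50]))
```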
Many assume that an AI is only dangerous if it has hostile intentions, but the danger of godlike AI is not a matter of its intent, but its power and autonomy. As these systems become increasingly agentic and powerful, they will pursue goals that will diverge from our own.
It would not be incorrect to describe someone who pursues their goals irrespective of the externalities as malevolent. Bank robbers don’t want to hurt people; they want money. Yet I don’t think anyone would suggest that the North Hollywood shooters were “non-hostile but misaligned”. I do not like this common snippet of rhetoric, and I think it is dishonest. It attempts to distance these fears of misaligned AI from movie characters such as Skynet, but ultimately, that is the picture being painted.
Goal divergence is a hallmark of the Bostromian paradigm—the idea that a misspecified utility function, optimized hypercompetently, would lead to disaster. Modern AI systems do not behave like this. They behave in a much more humanlike way; they do not have objective functions that they pursue doggedly. The Orthogonality Thesis holds that intelligence and final goals vary independently. The unstated connection here, I think, is that the AI’s initial goals must have been misaligned in the first place—but stated like this, it sounds a little like you expect a superintelligent AI to suddenly diverge from its instructions for no reason at all.
Overall, this is a very vague section. I think you would benefit from explaining some of the assumptions being made here.
I’m not going to go into detail on the Alignment section, but I think that many of its issues are similar to those listed above. The arguments are not compelling enough for lay people, mostly because I don’t think they’re correct. The definition of Alignment you have given—the ability to “steer AI systems toward a person’s or group’s intended goals, preferences, and ethical principles”—does not match the treatment it is given. I think it is obvious that the scope of Alignment is too vague, broad, and unverifiable for it to be a useful concept. I think that Richard Ngo’s post:
https://www.lesswrong.com/posts/67fNBeHrjdrZZNDDK/defining-alignment-research
is a good summary of the issues I see with the current idea of Alignment as it is often used in Rationalist circles and how it could be adapted to suit the world in which we find ourselves.
Finally, I think that the Governance section could very well be read uncharitably as a manifesto for world domination. Fewer than a dozen people attend PauseAI protests; you do not have the political ability to make this happen. The ideas contained in this document, which resembles many others, such as a similar one created by the PauseAI group, are not compelling enough to sway people who are not already believers, and the Rationalist language used in them is anathema to the largest ideological groups that would otherwise support your cause.
You may receive praise from Rationalist circles, but I do not think you will reach a large audience with this type of work. Leopold Aschenbrenner’s essay managed to reach a fairly substantial audience, and it has similar themes to your document, so in principle people are willing to read this sort of writing. The main flaw is that your document doesn’t add anything to the conversation, and because of that it won’t change anyone’s mind. The reason public discourse doesn’t involve Alignment talk isn’t a lack of awareness; it’s that the idea isn’t at all compelling to most people. Writing it better, with a nicer format, will not change this.
If it really wanted to, there would be nothing at all stopping the US military from launching a coup on its civilian government.
There are enormous hurdles preventing the U.S. military from overthrowing the civilian government.
The confusion in your statement comes from lumping all the members of the armed forces together under the term “U.S. military”. Principally, a coup is an act of coordination. Any given faction or person in the U.S. military would have a difficult time organizing the necessary forces without being stopped by civilian or military law enforcement first, and then maintaining control of the civilian government afterward without the legitimacy of democratic governance.
In general, “more powerful entities control weaker entities” is a constant. If you see something else, your eyes are probably betraying you.
All military organizations are structured around the principle of their leaders being able to give orders to subordinates. War is a massive coordination problem, and getting soldiers to do what you want is the primary part of it. I mean to say that high ranking generals could issue such a coup, not that every service member would spontaneously decide to perform one. This can and does happen, so I think your blanket statement on the impossibility of Juntas is void.
I mean to say that high ranking generals could issue such a coup
Yes, and by “any given faction or person in the U.S. military” I mean to say that high ranking generals inside the United States cannot form a coup. They literally cannot successfully give the order to storm the Capitol. Their inferiors, understanding that:
1. The order is illegal,
2. The order would have to be followed by the rest of their division in order to have a chance of success,
3. The order would be almost guaranteed to fail in its broader objective even if they managed to seize the FBI headquarters or whatever, and
4. Others around them are making the same calculation and will probably also be unwilling to follow the order,
would report their superiors to military law enforcement instead. This is obvious if you take even a moment to put yourself in the shoes of any of the parties involved. Generals inside the U.S. military also realize this themselves, and so do not attempt coups, even though I’m certain there are many people inside the White House with large ‘nominal’ control over U.S. forces who would love to be dictator.
I think your blanket statement on the impossibility of Juntas is void.
I made no such blanket statement. In different countries, the odds and incentives facing each of these parties are different. For example, if you live in a South American country with a history of successful military overthrows, you might have a much greater fear that your superior will succeed, and so you might be more scared of him than of the civilian government. This is part (though not all) of the reason why some countries are continually stable and others continually unstable.
Yes, I agree that the US military is one example of a particularly well-aligned institution. I think my point, that the alignment problem is analogous to military coup risk, is still valid, and that similar principles could be used to explore the AI alignment problem; in most countries, military members control weaponry that no civil agency can match or defeat.
There are enormous hurdles preventing the U.S. military from overthrowing the civilian government.
The confusion in your statement is caused by blocking up all the members of the armed forces in the term “U.S. military”. Principally, a coup is an act of coordination.
Is it your contention that similar constraints will not apply to AIs?
When people talk about how “the AI” will launch a coup in the future, I think they’re making essentially the same mistake you talk about here. They’re treating a potentially vast group of AI entities — like a billion copies of GPT-7 — as if they form a single, unified force, all working seamlessly toward one objective, as a monolithic agent. But just like with your description of human affairs, this view overlooks the coordination challenges that would naturally arise among such a massive number of entities. They’re imagining these AIs could bypass the complex logistics of organizing a coup, evading detection, and maintaining control after launching a war without facing any relevant obstacles or costs, even though humans routinely face these challenges amongst ourselves.
In these discussions, I think there’s an implicit assumption that AIs would automatically operate outside the usual norms, laws, and social constraints that govern social behavior. The idea is that all the ordinary rules of society will simply stop applying, because we’re talking about AIs.
Yet I think this simple idea is basically wrong, for essentially the same reasons you identified for human institutions.
Of course, AIs will be different in numerous ways from humans, and AIs will eventually be far smarter and more competent than humans. This matters. Because AIs will be very capable, it makes sense to think that artificial minds will one day hold the majority of wealth, power, and social status in our world. But these facts alone don’t show that the usual constraints that prevent coups and revolutions will simply go away. Just because AIs are smart doesn’t mean they’ll necessarily use force and violently revolt to achieve their goals. Just like humans, they’ll probably have other avenues available for pursuing their objectives.
To respond to this comment, I’ll give a view on why coordination might be easier for AIs than for people, and also explain why the invention of AI likely breaks a lot of the social rules we are used to.
For example, one big difference that I think impacts coordination for AIs is that an AI model will likely be able to copy itself millions of times, given current inference scaling, and in particular fine-tunes can be distributed to those millions of copies as though they were a single unit.
This is a huge change for coordination, because humans can’t copy themselves into millions of people who share very similar values just by acquiring more compute.
Merging might also be much easier: it is far simpler to merge and split the weights of two AIs than it is to staple together two human brains.
These capabilities alone would let AIs coordinate to an extent we haven’t really seen in history, such that it makes more sense to treat millions or billions of AI instances as one unified agent than it does to treat a nation as one.
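To make the merging point concrete, here is a toy sketch of mine (not any real merging algorithm): averaging the parameters of two fine-tuned descendants of one base model. Real-world weight merging (e.g. "model soups") is more careful than this, but the operation is roughly this cheap, and nothing analogous exists for two human brains:

```python
def merge_checkpoints(w1, w2, alpha=0.5):
    # Average two checkpoints parameter-by-parameter. Merging assumes a shared
    # architecture: identical parameter names and shapes.
    assert w1.keys() == w2.keys()
    return {name: [alpha * a + (1 - alpha) * b for a, b in zip(w1[name], w2[name])]
            for name in w1}

# Two hypothetical fine-tunes of the same tiny two-parameter "model":
ft_a = {"layer0": [1.0, 2.0]}
ft_b = {"layer0": [3.0, 4.0]}
print(merge_checkpoints(ft_a, ft_b))  # {'layer0': [2.0, 3.0]}
```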
To answer this question:
In these discussions, I think there’s an implicit assumption that AIs would automatically operate outside the usual norms, laws, and social constraints that govern social behavior. The idea is that all the ordinary rules of society will simply stop applying, because we’re talking about AIs.
While this argument would indeed be invalid if that were all there was to it, there is an actual reason why the current rules of society mostly stop applying to AIs, and it comes down to one big issue:
Human labor is no longer very valuable, because labor becomes cheap compared to capital, and it can even have negative economic value if humans become bottlenecks who cannot keep up with AIs.
When this happens, you can’t rely on the property that the best way to make yourself well off is to make others well off; indeed, the opposite holds if we assume that human labor has net-negative economic value.
The basic reason is that if your labor has zero or negative economic value, then your value comes from your land and capital, and there is no economic disincentive, and at least a weak incentive, for others to seize that capital and land to fuel their growth.
In essence, you can’t assume that violently stealing property is disincentivized; a lot of the foundations of comparative advantage, and of our society, stop working once you allow workers that are duplicable and very cheap.
This means that if you survive and still hold property, it will be because of alignment to your values, not for economic reasons, because you can no longer rely on economics to rule out bad outcomes like the violent theft of property.
(This is also why Ricardian comparative advantage won’t apply. If the AI side can choose between trading with humans for something and spending the same resources on building AIs to produce that thing more cheaply, the latter option is more profitable. So after a certain point in capability development, the only thing AIs and AI companies will want from us is our resources, like land, not our labor. The closest analogy is the enclosure movement in England.)
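The parenthetical can be made concrete with toy numbers (mine, purely illustrative). Ricardian gains from trade assume your partner's labor is the scarce alternative; with duplicable AI labor, the relevant comparison is the opportunity cost of not spinning up another instance:

```python
def best_use_of_resources(human_output_per_unit: float,
                          ai_output_per_unit: float) -> str:
    # Trade with humans only while a unit of resources sent to humans yields
    # more output than spending that same unit on additional AI instances.
    if human_output_per_unit > ai_output_per_unit:
        return "trade with humans"
    return "build more AIs"

print(best_use_of_resources(1.0, 0.5))   # early on, AI compute is scarce: trade
print(best_use_of_resources(1.0, 20.0))  # later, copies are cheap: humans priced out
```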
Consider a scenario in which AGI and human-equivalent robotics are developed and end up owned (via e.g. controlling exclusively the infrastructure that runs it, and being closed source) by a group of, say, 10,000 people overall who have some share in this automation capital. If these people have exclusive access to it, a perfectly functional equilibrium is “they trade among peers goods produced by their automated workers and leave everyone else to fend for themselves”.
To address the human enhancement point: I agree that humans will likely be cognitively and physically enhanced, at a level and pace of change genuinely enormous compared to the pre-AI-automation era.
However, two problems arise here:
1. Most people who work today do so because it’s necessary to make a living, not because they intrinsically like work. So by default, in an AI-automation future where a company can choose an AI over a human, and where humans aren’t necessary for AI to go well, I’d predict that 80–90%+ of humans would voluntarily remove themselves from the job market over the course of at most 10–20 years.
2. Unless humans upload and copy themselves en masse (absolutely possible, but plausibly harder than simply using AIs for work), coordination costs would remain a big barrier for humans, because it is far easier for AIs to coordinate productively: they share essentially the same weights, and copying one AI to fill millions of jobs makes very similar values across instances quite likely.
To be clear, I’m not claiming that humans will remain unchanged; they will change rapidly, just not as fast as AI does.
Finally, one large reason why human laws become mostly irrelevant is that once AIs can serve in robotic armies and do all the automated work, it becomes far too easy either to slowly change the laws until people are effectively closer to pets in status, or to stage revolts. Critically, once AI controls the robotic armies and does all of the economic work, any social system that the human controlling the AI, or the AI itself, opposes is very easy to destroy or remove.
I don’t think coordinating a billion copies of GPT-7 is at all what the worried tend to worry about. We worry about a single agent based on GPT-7 self-improving until it can take over singlehandedly—perhaps with copies it made itself, specifically optimized for coordination, perhaps sticking to less intelligent servant agents. The alternative is also a possible route to disaster, but I think things would go off the rails far before then. You’re in good, if minority, company in worrying about slower and more law-abiding takeovers; Christiano’s stance on doom seems to place most of the odds of disaster in such scenarios, for instance. But I don’t understand why either of you sees it as so likely that we partway solve the alignment problem yet don’t use that solution to prevent AIs from progressively outcompeting humans. It seems like an unlikely combination of technical success and societal idiocy. Although, to be fair, when I phrase it that way, it does sound rather like our species’ MO :)
On your other contention, that AI will probably follow norms and laws and that takeover attempts will be constrained the way coups are: I agree that some of the same constraints may apply, but that is little comfort. It’s technically correct that AIs would probably use whatever avenues are available, including nonviolent and legal ones, to accomplish their goals (and potentially disempower humans).
Assuming AIs will follow norms, laws, and social constraints even when ignoring them would work better is assuming we’ve almost completely solved alignment. If that happens, great; but that is a technical objective we’re working toward, not an outcome we can assume when thinking about AI safety. LLMs do have powerful norm-following habits; this will be a huge help in achieving alignment if they form the core of AGI, but it does not entirely solve the problem.
I have wondered in response to similar statements you’ve made in the past: are you including the observation that human history is chock full of people ignoring norms, laws, and social constraints when they think they can get away with it? I see our current state of civilization as a remarkable achievement that is fragile and must be carefully protected against seismic shifts in power balances, including AGI but also with other potential destabilizing factors of the sort that have brought down governments and social orders in the past.
In sum, if you’re arguing that AGI won’t necessarily violently take over right away, I agree. If you’re arguing that it wouldn’t do that if it had the chance, I think that is an entirely technical question of whether we’ve succeeded adequately at alignment.
Is it your contention that similar constraints will not apply to AIs?
Similar constraints may apply to AIs unless one gets much smarter much more quickly, as you say. But even if those AIs create a nice civilian government to govern interactions with each other, they will have no reason to respect our rights unless some of them care about us more than we care about stray dogs or cats.
Similar constraints may apply to AIs unless one gets much smarter much more quickly, as you say.
I do think that AIs will eventually get much smarter than humans, and this implies that artificial minds will likely capture the majority of wealth and power in the world in the future. However, I don’t think the way that we get to that state will necessarily be because the AIs staged a coup. I find more lawful and smooth transitions more likely.
There are alternative means of accumulating power than taking everything by force. AIs could get rights and then work within our existing systems to achieve their objectives. Our institutions could continuously evolve with increasing AI presence, becoming more directed by AIs with time.
What I’m objecting to is the inevitability of a sudden collapse when “the AI” decides to take over in an untimely coup. I’m proposing that there could just be a smoother, albeit rapid transition to a post-AGI world. Our institutions and laws could simply adjust to incorporate AIs into the system, rather than being obliterated by surprise once the AIs coordinate an all-out assault.
In this scenario, human influence will decline, eventually quite far. Perhaps this soon takes us all the way to the situation you described in which humans will become like stray dogs or cats in our current world: utterly at the whim of more powerful beings who do not share their desires.
However, I think that scenario is only one possibility. Another possibility is that humans could enhance their own cognition to better keep up with the world. After all, we’re talking about a scenario in which AIs are rapidly advancing technology and science. Could humans not share in some of that prosperity?
One more possibility is that, unlike cats and dogs, humans could continue to communicate legibly with the AIs and stay relevant for reasons of legal and cultural tradition, as well as some forms of trade. Our current institutions didn’t descend from institutions constructed by stray cats and dogs. There was no stray animal civilization that we inherited our laws and traditions from. But perhaps if our institutions did originate in this way, then cats and dogs would hold a higher position in our society.
I do think that AIs will eventually get much smarter than humans, and this implies that artificial minds will likely capture the majority of wealth and power in the world in the future. However, I don’t think the way that we get to that state will necessarily be because the AIs staged a coup. I find more lawful and smooth transitions more likely.
I think my writing was ambiguous. My comment was supposed to read “similar constraints may apply to AIs unless one (AI) gets much smarter (than other AIs) much more quickly, as you say.” I was trying to say the same thing.
My original point was also not actually that we will face an abrupt transition or AI coup, I was just objecting to the specific example Meme Machine gave.
I am unsurprised but disappointed to read the same Catastrophe arguments rehashed here, based on an outdated Bostromian paradigm of AGI. This is the main section I disagree with.
I do not think this is obvious or true at all. Nation-States are often controlled by a small group of people or even a single person, no different physiologically to any other human being. If it really wanted to, there would be nothing at all stopping the US military from launching a coup on its civilian government; in fact, military coups are a commonplace global event. Yet, generally, most countries do not suffer constant coup attempts. We hold far fewer tools to “align” military leaders than we do AI models—we cannot control how generals were raised as children, cannot read their minds, cannot edit their minds.
I think you could also make a similar argument that big things control little things—with much more momentum and potential energy, we observe that large objects are dominant over small objects. Small objects can only push large objects to the extent that the large object is made of a material that is not very dense. Surely, then building vehicles substantially larger than people would result in uncontrollable runaways that would threaten human life and property! But in reality, runaway dump truck incidents are fairly uncommon. A tiny man can control a giant machine. Not all men can—only the one in the cockpit.
My point is that it is not at all obvious that a powerful AI would lack such a cockpit. If its goals are oriented around protecting or giving control to a set of individuals, I see no reason whatsoever why it would do a 180 and kill its commander, especially since the AI systems that we can build in practice are more than capable of understanding the nuances of their commands.
Chess is a system that’s perfectly predictable. Reality is a chaotic system. Chaotic systems—like a three-body orbital arrangement—are impossible to perfectly predict in all cases even if they’re totally deterministic, because even minute inaccuracies in measurement can completely change the result. One example would be the edges of the Mandelbrot set. It’s fractal. Therefore, even an extremely powerful AI would be beholden to certain probabilistic barriers, notwithstanding quantum-random factors.
It would not be incorrect to describe someone who pursued their goals irrespective of its externalities to be malevolent. Bank robbers don’t want to hurt people, they want money. Yet I don’t think anyone would suggest that the North Hollywood shooters were “non-hostile but misaligned”. I do not like this common snippet of rhetoric and I think it is dishonest. It attempts to distance these fears of misaligned AI from movie characters such as Skynet, but ultimately, this is the picture that is painted.
Goal divergence is a hallmark of the Bostromian paradigm—the idea that a misspecified utility function, optimized hypercompetently, would lead to disaster. Modern AI systems do not behave like this. They behave in a much more humanlike way. They do not have objective functions that they pursue doggedly. The Orthogonality Thesis states that intelligence is uncorrelated with objectives. The unstated connection here, I think, is that their initial goals must have been misaligned in the first place, but stated like this, it sounds a little like you expect a superintelligent AI to suddenly diverge from its instructions for no reason at all.
Overall, this is a very vague section. I think you would benefit from explaining some of the assumptions being made here.
I’m not going to go into detail on the Alignment section, but I think that many of its issues are similar to the ones listed above. I think that the arguments are not compelling enough for lay people, mostly because I don’t think they’re correct. I think that the definition of Alignment you have given—“the ability to “steer AI systems toward a person’s or group’s intended goals, preferences, and ethical principles.””—does not match the treatment it is given. I think that it is obvious that the scope of Alignment is too vague, broad, and unverifiable for it to be a useful concept. I think that Richard Ngo’s post:
https://www.lesswrong.com/posts/67fNBeHrjdrZZNDDK/defining-alignment-research
is a good summary of the issues I see with the current idea of Alignment as it is often used in Rationalist circles and how it could be adapted to suit the world in which we find ourselves.
Finally, I think that the Governance section could very well be read uncharitably as a manifesto for world domination. Less than a dozen people attend PauseAI protests; you do not have the political ability to make this happen. The ideas contained in this document, which resemble many other documents, such as a similar one created by the PauseAI group, are not compelling enough to sway people who are not already believers in its ideas, and the Rationalist language used in them is anathemic to the largest ideological groups that would otherwise support your cause.
You may receive praise from Rationalist circles, but I do not think you will reach a large audience with this type of work. Leopold Aschenbrenner’s essay managed to reach a fairly substantial audience, and it has similar themes to your document, so in principle, people are willing to read this sort of writing. The main flaw is that it doesn’t add anything to the conversation, and because of that, it won’t change anyone’s minds. The reason that the public discourse doesn’t involve Alignment talk isn’t due to lack of awareness, it’s because it isn’t at all compelling to most people. Writing it better, with a nicer format, will not change this.
There are enormous hurdles preventing the U.S. military from overthrowing the civilian government.
The confusion in your statement is caused by blocking up all the members of the armed forces in the term “U.S. military”. Principally, a coup is an act of coordination. Any given faction or person in the U.S. military would have a difficult time organizing the forces necessary without being stopped by civilian or military law enforcement first, and then maintaining control of their civilian government afterwards without the legitimacy of democratic governance.
In general, “more powerful entities control weaker entities” is a constant. If you see something else, your eyes are probably betraying you.
All military organizations are structured around the principal of its leaders being able to give orders to people subservient to them. War is a massive coordination problem and being able to get soldiers to do what you want is the primary one among them. I mean to say that high ranking generals could issue such a coup, not that every service member would spontaneously decide to perform one. This can and does happen, so I think your blanket statement on the impossibility of Juntas is void.
Yes, and by “any given faction or person in the U.S. military” I mean to say that high ranking generals inside the United States cannot form a coup. They literally cannot successfully give the order to storm the capitol. Their inferiors, understanding that:
The order is illegal
The order would have to be followed by the rest of their division in order to have a chance of success
The order would be almost guaranteed to fail in its broader objective even if they manage to seize the FBI headquarters or whatever
That others around them are also making the same calculation and will also probably be unwilling to follow the order
Would report their superiors to military law enforcement instead. This is obvious if you take even a moment to put your shoes in any of the parties involved. Our generals inside the U.S. military also realize this themselves and so do not attempt to perform coups, even though I’m certain there are many people inside the white house with large ‘nominal’ control over U.S. forces who would love to be dictator.
I made no such blanket statement. In different countries the odds and incentives facing each of these parties are different. For example, if you live in a South American country with a history of successful military overthrows, you might have a much greater fear your superior will succeed, and so you might be more scared of him than the civilian government. This is part (though not all) of the reason why some countries are continually stable and others are continually unstable.
Yes, I agree that the US military is one example of a particularly well-aligned institution. I think my point about the alignment problem being analogous to military coup risk is still valid and that similar principles could be used to explore the AI alignment problem; military members control weaponry that no civil agency can match or defeat, in most countries.
Is it your contention that similar constraints will not apply to AIs?
When people talk about how “the AI” will launch a coup in the future, I think they’re making essentially the same mistake you talk about here. They’re treating a potentially vast group of AI entities — like a billion copies of GPT-7 — as if they form a single, unified force, all working seamlessly toward one objective, as a monolithic agent. But just like with your description of human affairs, this view overlooks the coordination challenges that would naturally arise among such a massive number of entities. They’re imagining these AIs could bypass the complex logistics of organizing a coup, evading detection, and maintaining control after launching a war without facing any relevant obstacles or costs, even though humans routinely face these challenges amongst ourselves.
In these discussions, I think there’s an implicit assumption that AIs would automatically operate outside the usual norms, laws, and social constraints that govern behavior. The idea is that all the ordinary rules of society will simply stop applying, because we’re talking about AIs.
Yet I think this simple idea is basically wrong, for essentially the same reasons you identified for human institutions.
Of course, AIs will be different in numerous ways from humans, and AIs will eventually be far smarter and more competent than humans. This matters. Because AIs will be very capable, it makes sense to think that artificial minds will one day hold the majority of wealth, power, and social status in our world. But these facts alone don’t show that the usual constraints that prevent coups and revolutions will simply go away. Just because AIs are smart doesn’t mean they’ll necessarily use force and violently revolt to achieve their goals. Just like humans, they’ll probably have other avenues available for pursuing their objectives.
To respond to this comment, I’ll give a view on why I think coordination might be easier for AIs than for people, and also explain why the invention of AI likely breaks a lot of the social rules we are used to.
For example, one big difference that impacts coordination is that an AI model can likely copy itself millions of times, given current inference scaling, and in particular a single fine-tune can be distributed to those millions of copies as though they were a single unit.
This is a huge change for coordination, because humans can’t copy themselves into millions of people who share very similar values just by acquiring more compute.
Merging might also be much easier: it is far simpler to merge and split the weights or data of an AI than it is to staple two human brains together.
These capabilities alone allow coordination to an extent we haven’t really seen in history, such that it makes more sense to treat millions or billions of AI instances as one unified agent than it does to treat a nation as one unified agent.
To answer this question:
While this argument would indeed be invalid if that were all there was to it, there is an actual reason why the current rules of society mostly stop working with AIs, and it comes down to one big issue:
Human economic labor is no longer very valuable, because labor becomes cheap compared to capital, and human labor can even have negative economic value when humans become bottlenecks in otherwise AI-run workflows.
When this happens, you can’t rely on the property that the best way to make yourself well off is to make others well off; indeed, the opposite holds if we assume their labor has net-negative economic value.
The basic reason for this is that if your labor has zero or negative economic value, then your value likely comes from your land and capital, and there is no disincentive, and at least a weak incentive, for others to steal your capital and land to fuel their growth.
In essence, you can’t assume that violent seizure of property is disincentivized, and a lot of the foundations of comparative advantage and our society stop working once you allow workers that are duplicable and very cheap.
This means that if you survive and still have property, it will be because of alignment to your values, not for economic reasons, because you can no longer rule out bad outcomes like violent theft of property through economics alone.
I like these comments on the subject:
https://www.lesswrong.com/posts/2ujT9renJwdrcBqcE/the-benevolence-of-the-butcher
https://www.lesswrong.com/posts/2ujT9renJwdrcBqcE/the-benevolence-of-the-butcher#BJk8XgpsHEF6mjXNE
To address the human enhancement point: I agree that humans will likely be cognitively and physically enhanced, at a level and pace of change that is genuinely enormous compared to the pre-AI-automation era.
However, two problems arise here:

1. Most people who work today do so because it’s necessary to make a living, not because they intrinsically like work. So by default, in an AI-automation future where a company can choose an AI over a human, and the human isn’t necessary for AI to go well, I’d predict 80–90%+ of humans would voluntarily remove themselves from the job market over the course of, at most, 10–20 years.

2. Unless humans mass-upload and copy themselves — which is absolutely possible, but also plausibly harder than just using AIs for work — coordination costs would be a big barrier for humans. It’s far easier for AIs to coordinate productively, since copy-pasting one AI to fill millions of jobs is quite likely as a strategy, leaving those copies with essentially the same weights and very similar values.
To be clear, I’m not claiming that humans will remain unchanged; they will change rapidly, just not as fast as AIs.
Finally, one large reason why human laws become mostly irrelevant is that once AIs can serve in robotic armies and do automated work, it becomes far too easy either to slowly change the laws until people are ultimately closer to pets in status, or to revolt outright. Critically, once AI controls the robotic armies and does all of the economic work, any social system that the human controlling the AI, or the AI itself, opposes is very easy to destroy or remove.
I don’t think coordinating a billion copies of GPT-7 is at all what the worried tend to worry about. We worry about a single agent based on GPT-7 self-improving until it can take over singlehandedly — perhaps with copies it made itself, specifically optimized for coordination, perhaps sticking to only less intelligent servant agents. The alternative is also a possible route to disaster, but I think things would go off the rails far before then. You’re in good if minority company in worrying about slower and more law-abiding takeovers; Christiano’s stance on doom seems to place most of the odds of disaster in these scenarios, for instance. But I don’t understand why some of you see it as so likely that we partway solve the alignment problem yet don’t use that success to prevent AIs from progressively outcompeting humans. It seems like an unlikely combination of technical success and societal idiocy. Although, to be fair, when I phrase it that way, it does sound kind of like our species’ MO :)
On your other contention, that AI will probably follow norms and laws, constrained in the way coup attempts are constrained: I agree that some of the same constraints may apply, but that is little comfort. It’s technically correct that AIs would probably use whatever avenues are available, including nonviolent and legal ones, to accomplish their goals (and potentially disempower humans).
Assuming AIs will follow norms, laws, and social constraints even when ignoring them would work better is assuming we’ve almost completely solved alignment. If that happens, great, but that is a technical objective we’re working toward, not an outcome we can assume when thinking about AI safety. LLMs do have powerful norm-following habits; this will be a huge help in achieving alignment if they form the core of AGI, but it does not entirely solve the problem.
I have wondered, in response to similar statements you’ve made in the past: are you including the observation that human history is chock-full of people ignoring norms, laws, and social constraints when they think they can get away with it? I see our current state of civilization as a remarkable achievement, one that is fragile and must be carefully protected against seismic shifts in the balance of power, from AGI but also from other destabilizing factors of the sort that have brought down governments and social orders in the past.
In sum, if you’re arguing that AGI won’t necessarily violently take over right away, I agree. If you’re arguing that it wouldn’t do that if it had the chance, I think that is an entirely technical question of whether we’ve succeeded adequately at alignment.
Similar constraints may apply to AIs unless one gets much smarter much more quickly, as you say. But even if those AIs create a nice civilian government to govern interactions with each other, those AIs will have no reason to respect our rights unless some of them care about us more than we care about stray dogs or cats.
I do think that AIs will eventually get much smarter than humans, and this implies that artificial minds will likely capture the majority of wealth and power in the world in the future. However, I don’t think the way that we get to that state will necessarily be because the AIs staged a coup. I find more lawful and smooth transitions more likely.
There are alternative means of accumulating power than taking everything by force. AIs could get rights and then work within our existing systems to achieve their objectives. Our institutions could continuously evolve with increasing AI presence, becoming more directed by AIs with time.
What I’m objecting to is the supposed inevitability of a sudden collapse when “the AI” decides to take over in an abrupt coup. I’m proposing that there could instead be a smoother, albeit rapid, transition to a post-AGI world. Our institutions and laws could simply adjust to incorporate AIs into the system, rather than being obliterated by surprise once the AIs coordinate an all-out assault.
In this scenario, human influence will decline, eventually quite far. Perhaps this soon takes us all the way to the situation you described in which humans will become like stray dogs or cats in our current world: utterly at the whim of more powerful beings who do not share their desires.
However, I think that scenario is only one possibility. Another possibility is that humans could enhance their own cognition to better keep up with the world. After all, we’re talking about a scenario in which AIs are rapidly advancing technology and science. Could humans not share in some of that prosperity?
One more possibility is that, unlike cats and dogs, humans could continue to communicate legibly with the AIs and stay relevant for reasons of legal and cultural tradition, as well as some forms of trade. Our current institutions didn’t descend from institutions constructed by stray cats and dogs. There was no stray animal civilization that we inherited our laws and traditions from. But perhaps if our institutions did originate in this way, then cats and dogs would hold a higher position in our society.
I think my writing was ambiguous. My comment was supposed to read “similar constraints may apply to AIs unless one (AI) gets much smarter (than other AIs) much more quickly, as you say.” I was trying to say the same thing.
My original point was also not actually that we will face an abrupt transition or AI coup, I was just objecting to the specific example Meme Machine gave.
Strong-upvoted, this is precisely the kind of feedback that seems helpful for making the document better.