This story reminds me of a Twitter debate between Yud and D’Angelo (NOTE: this exchange is from 6 MONTHS AGO and is a snapshot of their thinking at that point in time):
Adam D’Angelo:
What are the strongest arguments against the possibility of an outcome where strong AI is widely accessible but there is a “balance of power” between different AI systems (or humans empowered by AI) that enables the law to be enforced and otherwise maintains stability?
Eliezer Yudkowsky:
That superintelligences that can eg do logical handshakes with each other or coordinate on building mutually trusted cognitive systems, form a natural coalition that excludes mammals. So there’s a balance of power among them, but not with us in it.
Adam:
You could argue a similar thing about lawyers, that prosecutors and defense lawyers speak the same jargon and have more of a repeated game going than citizens they represent. And yet we have a system that mostly works.
Yud:
Lawyers combined cannot casually exterminate nonlawyers.
Adam:
Even if they could (and assuming AGI could) they wouldn’t want to; it would be worse for them than keeping the rest of humanity alive, and also against their values. So I wouldn’t expect them to.
Yud:
I agree that many lawyers wouldn’t want to exterminate humanity, but building at least one AGI like that is indeed the alignment problem; failing that, an AGI coalition has no instrumental interest in protecting us.
Adam:
Can you remind us again of the apparently obvious logic that the default behavior for an AGI is to want to exterminate us?
Yud:
1: You don’t want humans building other SIs that could compete with you for resources.
2: You may want to do large-scale stuff that eg builds lots of fusion plants and boils the oceans as a heatsink for an early burst of computation.
3: You might directly use those atoms.
Adam:
For 1, seems much easier to just stop humans from doing that than to exterminate them all.
For 2, if you have that kind of power it’s probably easy to preserve humanity.
For 3, I have trouble seeing a shortage of atoms as being a bottleneck to anything.
David Xu:
1: With a large enough power disparity, the easiest way to “stop” an opponent from doing something is to make them stop existing entirely.
2: Easier still to get rid of them, as per 1.
3: It’s not a bottleneck; but you’d still rather have those atoms than not have them.
Adam:
1: Humans aren’t a single opponent. If an ant gets into my house I might kill it but I don’t waste my time trying to kill all the ants outside.
2: This assumes no value placed on humans, which I think is unlikely
3: But it takes energy, which likely is a bottleneck
Yud:
If you have literally any noticeable value placed on humans living happily ever after, it’s a trivial cost to upload their mind-states as you kill their bodies, and run them on a basketball-sized computer somewhere in some galaxy, experiencing a million years of glorious transhumanist future over the course of a few days—modulo that it’s not possible for them to have real children—before you shut down and repurpose the basketball.
We do not know how to make any AGI with any goal that is maximally satisfied by doing that rather than something else. Mostly because we don’t know how to build an AGI that ends up with any precise preference period, but also because it’s not trivial to specify a utility function whose maximum falls there rather than somewhere else. If we could pull off that kind of hat trick, even to the extent of getting a million years of subjective afterlife over the course of a few days, we could just as easily get all the galaxies in reach for sentient life.
You could argue a similar thing about lawyers, that prosecutors and defense lawyers speak the same jargon and have more of a repeated game going than citizens they represent. And yet we have a system that mostly works.
Uh… what? If we stretch the definition of “lawyer” a bit to mean “anyone who carries out, enforces, or whose livelihood primarily depends on the law”—that is, we include government agents, cops, soldiers, and so on… yes they absolutely totally can? (In the same sense that anyone can drench their own house with gasoline and burn it down.) But maybe that’s only a weird tangent—although I’d argue that some of the power dynamics that fall out of that are likely disquietingly similar.
Strong upvoted because I liked the ending.