The orthogonality thesis is about final goals, not instrumental goals. I think what you’re getting at is the following hypothesis:
“For almost any final goal that an AGI might have, it is instrumentally useful for the AGI to not wipe out humanity, because without humanity keeping the power grid on (etc. etc.), the AGI would not be able to survive and get things done. Therefore we should not expect AGIs to wipe out humanity.”
See for example 1, 2, 3 making that point.
Some counterpoints to that argument would be:
Even if that claim is true now, for early AGIs, it won’t be true forever. Once we have AGIs that are faster and cheaper and more insightful than humans, they will displace humans across more and more of the economy, and relatedly, robots will rapidly spread throughout the economy, etc. And when the AGIs are finally in a position to wipe out humans, they will, if they’re misaligned. See for example this Paul Christiano post.
Even if that claim is true, it doesn’t rule out AGI takeover. It just means that the AGI would take over in a way that doesn’t involve killing all the humans. By the same token, when Stalin took over Russia, he couldn’t run the Russian economy all by himself, and therefore he didn’t literally kill everyone, but he still had a great deal of control. Now imagine that Stalin could live forever, while gradually distributing more and more clones of himself in more and more positions of power, and then replace “Russia” with “everywhere”.
Maybe the claim is not true even for early AGIs. For example, I was disagreeing with it here. A lot depends on things like how far recursive self-improvement can go, whether human-level, human-speed AGI can run on one Xbox GPU versus a datacenter of 10,000 high-end GPUs, and various questions like that.

I would specifically push back on the relevance of “If you start pulling strings on ‘how much of the global economy needs to operate in order to keep a data center functioning’, you end up with a huge portion of the global economy”. There are decisions that trade off between self-sufficiency and convenience / price, and everybody chooses convenience / price. So you can’t figure out the minimal economy that could theoretically support chip production just by looking at the actual economy; you need to think more creatively, just as a person stuck on a desert island will think of resourceful solutions. By analogy, an insanely massive global infrastructure supported my eating a chocolate bar as a snack just now, but you can’t conclude from that observation that there’s no way for me to have a snack if that infrastructure didn’t exist; I could instead grow grapes in my garden or whatever.

Thus, I claim that there are much more expensive and time-consuming ways to get energy and chips and robot parts that require much less infrastructure and manpower, e.g. e-beam lithography instead of EUV photolithography, and that after an AGI has wiped out humanity, it might well be able to survive indefinitely and gradually work its way up to a civilization of trillions of its clones colonizing the galaxy, starting with these more artisanal solutions, scavenged supplies, and whatever else. At the very least, I think we should have some uncertainty here.
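For concreteness, here’s a back-of-envelope sketch (mine, not from the original discussion) of how wide the compute range in that “one Xbox GPU versus a 10,000-GPU datacenter” question actually is. The hardware figures are rough public ballpark numbers, not claims from the source:

```python
# Assumed ballpark figures (FP32 throughput), not from the source text:
XBOX_SERIES_X_TFLOPS = 12     # roughly the Xbox Series X GPU
HIGH_END_GPU_TFLOPS = 60      # a modern datacenter GPU, order-of-magnitude
DATACENTER_GPUS = 10_000

low_end = XBOX_SERIES_X_TFLOPS                  # "runs on one consumer GPU"
high_end = HIGH_END_GPU_TFLOPS * DATACENTER_GPUS  # "needs a big datacenter"

ratio = high_end / low_end
print(f"spread: {ratio:.0e}x")  # prints "spread: 5e+04x"
```

The point of the sketch is just that the two scenarios differ by something like four to five orders of magnitude of compute, which is why so much of the downstream argument hinges on where in that range early AGI actually lands.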