Biological humans and the rising tide of AI
The Hanson-Yudkowsky AI-Foom Debate focused on whether AI progress is winner-take-all. But even if it isn’t, humans might still fare badly.
Suppose Robin is right. Instead of one basement project going foom, AI progresses slowly as many organizations share ideas with each other, leading to peaceful economic growth worldwide—a rising tide of AI. (I’m including uploads in that.)
With time, keeping biological humans alive will become a less and less profitable use of resources compared to other uses. Robin says humans can still thrive by owning a lot of resources, as long as property rights prevent AIs from taking resources by force.
But how long is that? Recall the displacement of nomadic civilizations by farming ones (which happened by force, not by farmers buying land from nomads), or the enclosures in England (also carried out by force). When potential gains in efficiency become large enough, property rights get trampled.
Robin argues that it won’t happen, because it would lead to a slippery slope of AIs fighting each other for resources. But the potential gains from AIs expropriating each other are smaller, like one landowner trying to enclose the land of another landowner. And most such gains can be achieved by AIs sharing improvements with each other, which is impossible with humans. So AIs won’t be worried about that slippery slope, and will happily take our resources by force.
Maybe humans owning resources could upload themselves and live off rent, instead of staying biological? But even uploaded humans might be very inefficient users of resources (e.g. due to having too many neurons) compared to optimized AIs, so the result is the same.
Instead of hoping that institutions like property rights will protect us, we should assume that everything about the future, including institutions, will be determined by the values of AIs. To achieve our values, working on AI alignment is necessary, whether we face a “basement foom” or “rising tide” scenario.
I briefly discuss non-winner-take-all AI-takeover scenarios in sections 4.1.2–4.2 of Disjunctive Scenarios of Catastrophic AI Risk.
Personally I would emphasize that even if AIs did respect the law, well, if they got enough power, what would prevent them from changing the law? Besides, humans get tricked into making totally legal deals that go against their interests, all the time.
I discuss the scenario in section 4.1.2 in a bit more detail in this post.
Also of note is section 5.2.1 of my paper, on economic incentives to turn power over to AI systems:
Thank you Kaj! I agree with pretty much all of that. You don’t quite say what happens to humans when AIs outcompete them, but it’s easy enough to read between the lines and end up with my post :-)
Has Robin ever claimed property rights will never get trampled? My impression is that he’s only saying it can be avoided in the time period he’s trying to analyze.
The strongest statement of Robin’s position I can find is this post:
I think my post works as a counterpoint, even for the time period Robin is analyzing.
I’ve got a bit more time now.
I agree that “things need to be done” in a rising tide scenario. However, different things need to be done than in the foom scenario. The distribution of AI safety knowledge differs in an important way.
Discovering AI alignment is not enough in the rising tide scenario. You also need the proportion of aligned to misaligned AIs to be high enough that the misaligned AIs can’t outcompete the aligned ones. There will be some misaligned AIs, due to worn-out parts, experiments gone wrong, or AIs aligned with people so detached from the rest of humanity that negotiation and discussion are impossible.
The biggest risk is around the beginning. Everyone will be enthusiastic to play around with AGI. If they don’t have good knowledge of alignment (because it has been a secret project), they may not know how it should work and how it should be used safely. They may also buy AGI products from vendors who haven’t done their due diligence in making sure their product is aligned.
It might also be that alignment requires special hardware (e.g. there is an equivalent of Spectre that needs to be fixed in current architectures to enable safe AI). Then there is the risk of the software getting out and being run on emulators that don’t fix the alignment problem, and you might end up with lots of misaligned AGIs.
In this scenario you need lots of things that are antithetical to the foom strategy of keeping things secret and hoping that a single group brings it home. You need a well-educated populace and international community, and regulation of computer hardware and AGI vendors (preferably before AGI arrives). All that kind of stuff.
Knowing whether we are fooming or not is pretty important. The same strategy does not work for both. IMO.
I have a neat idea. If there were two comparable AGIs, they would effectively merge into one, even if they have unaligned goals. To be more precise, they should model how a conflict between them would turn out, and then figure out a kind of contract that reaches a similar outcome without wasting resources on a real conflict. Of course, if they are not comparable, then the stronger one could just devour the weaker one.
Yeah, that’s a good idea. It was proposed a decade ago by Wei Dai and Tim Freeman on the SL4 mailing list and got some discussion in various places. Some starting points are this SL4 post or this LW post, though the discussion kinda diverges. Here’s my current view:
1) Any conflict can be decomposed into bargaining (which Pareto optimal outcome do we want to achieve?) and enforcement (how do we achieve that outcome without anyone cheating?)
2) Bargaining is hard. We tried and failed many times to find a “fair” way to choose among Pareto optimal outcomes. The hardest part is nailing down the difference between bargaining and extortion.
3) Assuming we have some solution to bargaining, enforcement is easy enough for AIs. Most imaginable mechanisms for enforcement, like source code inspection, lead to the same set of outcomes.
4) The simplest way to think about enforcement is two AIs jointly building a new AI and passing all resources to it. If the two original AIs were Bayesian-rational and had utility functions U1 and U2, the new one should also be Bayesian-rational and have a utility function that’s a weighted sum of U1 and U2. This generalizes to any number of AIs.
5) The only subtlety is that weights shouldn’t be set by bargaining, as you might think. Instead, bargaining should determine some probability distribution over weights, then one sample from that distribution should be used as the actual weights. Think of it as flipping a coin to break ties between U1 and U2. That’s necessary to deal with flat Pareto frontiers, like the divide-the-dollar game (sketched below).
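To make points 4 and 5 concrete, here’s a minimal sketch in Python using the divide-the-dollar game. The discretized outcome set, the 0.6 weight, and the 50/50 weight distribution are all illustrative assumptions of mine, not part of the original proposal:

```python
import random

# Divide-the-dollar: outcomes are splits (x, 1 - x); each agent's utility is
# just its own share, so the Pareto frontier is flat (every split is Pareto optimal).
OUTCOMES = [(x / 100, 1 - x / 100) for x in range(101)]

def merged_choice(w):
    """The successor AI maximizes the weighted sum w*U1 + (1-w)*U2 (point 4)."""
    return max(OUTCOMES, key=lambda o: w * o[0] + (1 - w) * o[1])

# With any fixed weight other than 0.5, the weighted sum is maximized at an
# extreme split, so one of the original AIs gets everything.
print(merged_choice(0.6))   # -> (1.0, 0.0)

# Point 5: bargaining outputs a *distribution* over weights, and one sample is
# used. Here the assumed distribution is a 50/50 coin flip, which gives each AI
# the whole dollar with probability 1/2, i.e. expected utility 0.5 each.
w = random.choice([0.4, 0.6])
print(merged_choice(w))     # -> (1.0, 0.0) or (0.0, 1.0)
```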
At first I proposed this math as a solution to another problem by Stuart (handling meta-uncertainty about which utility function you have), but it works for AI merging too.
Ah, but my idea is different! It’s not just that these two AIs will physically merge. I claim that two AIs that are able to communicate are already indistinguishable from one AI with a different utility function. I reject the entire concept of meaningfully counting AIs.
There is a trivial idea that two humans together form a kind of single agent. This agent is not a human (there are too many conditions for being a human), and it might not be very smart (if the humans’ goals don’t align).
Now consider the same idea for two superintelligent AIs. I claim that the “combined” mind is also superintelligent, and it acts as though its utility function was a combination of the two initial utility functions. There are only complications from the possibly distributed physical architecture of the AI.
To take it even further, I claim that given any two AIs, A and B, if they together would choose strategy S, then there also exists a single AI, call it M(A,B), that would also choose strategy S. If we take the paperclip and staple maximizers, they might physically merge (or they might just randomly destroy one of them?). Now I claim that there is another single AI, with a slightly funky but reasonable architecture, which would be rewarded both for 60% staples and for 60% paperclips, and that this AI would choose to construct a new AI with a more coherent utility function (or it would choose to self-modify to make its own utility coherent).
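As a toy illustration of the M(A,B) claim (my own sketch, with made-up agents and a deliberately simplistic model): if each AI is modeled as a policy from observations to actions, then the pair, viewed from outside, is already one policy over joint actions.

```python
from typing import Callable, Tuple

# Toy modeling assumption: an agent is a policy mapping an observation to an
# action, both represented as strings.
Policy = Callable[[str], str]

def M(a: Policy, b: Policy) -> Callable[[str], Tuple[str, str]]:
    """A single agent whose 'action' is whatever the pair (a, b) would jointly do."""
    return lambda obs: (a(obs), b(obs))

# Hypothetical paperclip and staple maximizers that have already struck a 60/60 deal.
paperclipper: Policy = lambda obs: "make paperclips, honor the 60/60 deal"
stapler: Policy = lambda obs: "make staples, honor the 60/60 deal"

combined = M(paperclipper, stapler)
print(combined("current world state"))  # the same joint strategy S, chosen by one agent
```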
Also, thank you for digging for the old threads. It’s frustrating that there is so much out there that I would never know to even look for.
Edit: damn, I think the second link basically has the same idea as well.
I think if you carefully read everything in these links and let it stew for a bit, you’ll get something like my approach.
More generally, having ideas is great but don’t stop there! Always take the next step, make things slightly more precise, push a little bit past the point where you have everything figured out. That way you’re almost guaranteed to find new territory soon enough. I have an old post about that.
Yes, secrecy is a bad idea in a rising tide scenario. But I don’t think it’s a good idea in a winner-take-all scenario either! I argued against it for years and like to think I swayed a few people.
This article from AlexMennen has some relevant discussion and links.
Thank you! I was trying to give an econ-centric counterargument to Robin’s claim, but AI-centric strategic thinking (of which I’ve read a lot) is valuable too.
(moved to main post)
If development of capabilities proceeds gradually, then development of values will too. The market will select for AIs that are obedient, not self-actualising, not overly literal, and so on. Why would they jump to being utility monsters without a jump in capability?
The market doesn’t solve alignment. Firms have always acted callously toward people who don’t matter to the bottom line. AI will simply lead to most people ending up in that group.
Market behaviour is a known quantity, with, up to a point, known fixes. Introducing gradually improving AI is not going to change the game.
I am not convinced AIs will avoid fighting each other for resources. If they are not based on human minds as WBE (whole brain emulation), then we have less reason to expect they will value the preservation of themselves or other agents. If they are based on human minds, we have lots of good reasons to expect that they will value things above self-preservation. I am not aware of any mechanisms that would preclude a Thucydides’ Trap style scenario from taking place.
It also seems highly likely that AIs will be employed for enforcing property rights, so even in the case where bandit-AIs prefer to target humans, conflict with some type of AI seems likely in a rising tide scenario.
Yeah. I was trying to show that humans don’t fare well by default even in a peaceful “rising tide” scenario, but in truth there will probably be more conflict, where AIs protecting humans don’t necessarily win.
I didn’t know that!
I do still think there is a difference in strategy, though. In the foom scenario you want to keep the number of key players, or people who might become key players, small.
In the non-foom scenario you have an unhappy compromise: you want to avoid too many accidents and build up defenses early, but in time practically everyone becomes a key player and needs to know how to handle AGI.
What do you mean by “winner-take-all”? Aren’t we generally assuming that most AI scenarios are “everyone-loses” and “universe-tiled-in-paperclips”? Is this post assuming that alignment is solved, but only in a weak way that still allows the AI to hurt people if it really wants to? I wish the starting assumptions were stated clearly somewhere.
The starting assumptions for my post are roughly the same as Robin’s assumptions in the debate.
And what are they? Are they listed in the pdf you linked to? Can you point to relevant pages, at least vaguely? I’d rather not read all of it.
Also, are these assumptions reasonable, whatever they are? Do you understand why I question them in my first comment?
The pdf has a summary written by Kaj on pages 505-561, but my advice is to just read the whole thing. That way you learn not just the positions (which I think are reasonable), but also the responses to many objections. It’s a good overview that gets you close to the frontier of thinking about this topic.