I think the objection that “if the AI was really aligned with one agent, it’d figure out a way to help them avoid multipolar traps” is a good one.
My reply is that I’m worried that avoiding races-to-the-bottom will continue to be hard, especially since competition operates on so many levels.
Part of the objection is about avoiding multipolar traps, but there is also a more basic story like:
Humans own capital/influence.
They use this influence to serve their own interests and have an (aligned) AI system which faithfully represents their interests.
Given that AIs can make high-quality representation very cheap, the AI representation is very good and granular. Thus, something like the strategy-stealing assumption can hold and we might expect that humans end up with the same expected fraction of capital/influence they started with (at least to the extent they are interested in saving rather than consumption).
Even without any coordination, this can potentially work OK. There are objections to the strategy-stealing assumption, but none of these seem existential if we get to a point where everyone has wildly superintelligent and aligned AI representatives and we’ve ensured humans are physically robust to offense-dominant technologies like bioweapons.
(I’m optimistic about being robust to bioweapons within a year or two of having wildly superhuman AIs, though we might run into huge issues during this transitional period… Regardless, bioweapons deployed by terrorists or as part of a power grab in a brief transitional period don’t seem like the threat model you’re describing.)
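To make the share-preservation claim in the basic story above concrete, here is a toy sketch, not a real economic model. It assumes (my assumptions, purely for illustration) that every owner's AI representative can access the same per-period return, and that owners differ only in what fraction of gains they reinvest rather than consume. The function name `final_shares` and all of the numbers are hypothetical.

```python
def final_shares(initial_capital, saving_rates, growth_rate=0.5, periods=20):
    """Return each owner's share of total capital after `periods` rounds.

    Toy assumptions (hypothetical, for illustration only):
    - every owner gets the same per-period return `growth_rate`, because
      every AI representative can copy ("steal") the best available strategy;
    - owner i reinvests fraction saving_rates[i] of the gains and consumes
      the rest.
    """
    capital = list(initial_capital)
    for _ in range(periods):
        capital = [
            c * (1 + growth_rate * s)
            for c, s in zip(capital, saving_rates)
        ]
    total = sum(capital)
    return [c / total for c in capital]


# Everyone reinvests everything: shares stay at the starting 60/30/10 split.
equal_savers = final_shares([60.0, 30.0, 10.0], [1.0, 1.0, 1.0])

# The third owner mostly consumes: their share shrinks over time.
mixed_savers = final_shares([60.0, 30.0, 10.0], [1.0, 1.0, 0.2])
```

Under these assumptions, relative shares only move when saving behavior differs, which is the caveat about saving versus consumption above. The sketch deliberately ignores everything that might break strategy-stealing (offense-dominant technologies, state power shifts, unequal access to strategies).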
I expect some issues with races-to-the-bottom / negative sum dynamics / negative externalities like:
By default, increased industry on Earth shortly after the creation of very powerful AI will result in boiling the oceans (via fusion power). If you don’t participate in this industry, you might be substantially outcompeted by others[1]. However, I don’t think it will be that expensive to protect humans through this period, especially if you’re willing to use strategies like converting people into emulated minds. Thus, this doesn’t seem at all likely to be literally existential. (I’m also optimistic about coordination here.)
There might be one-time shifts in power between humans via mechanisms like states becoming more powerful. But ultimately these states will be controlled by humans or appointed successors of humans if alignment isn’t an issue. Mechanisms like competing over the quantity of bribery are zero-sum, since they just change the distribution of power, and this can be priced in as a one-time shift even without coordination to avoid a race to the bottom on bribes.
But, this still doesn’t seem to cause issues with humans retaining control via their AI representatives? Perhaps the distribution of power between humans is problematic and may be extremely unequal and the biosphere will physically be mostly destroyed (though humans will survive), but I thought you were making stronger claims.
Edit in response to your edit: If we align the AI to some arbitrary target which is seriously misaligned with humanity as a whole (due to infighting or other issues), I agree this can cause existential problems.
(I think I should read the paper in more detail before engaging more than this!)
[1] It’s unclear if boiling the oceans would result in substantial acceleration. This depends on how quickly you can develop industry in space and Dyson-sphere-style structures. I’d guess the speedup is much less than a year.
Curious what you think of these arguments, which offer objections to the strategy-stealing assumption in this setting, instead arguing that it’s difficult for capital owners to maintain their share of capital ownership as the economy grows and technology changes.
Thanks for this. Discussions of things like “one time shifts in power between humans via mechanisms like states becoming more powerful” and personal AI representatives are exactly the sort of thing I’d like to hear more about. I’m happy to have finally found someone who has something substantial to say about this transition!
But over the last 2 years I asked a lot of people at the major labs for any kind of details about a positive post-AGI future and almost no one had put anywhere close to as much thought into it as you have, and no one mentioned the things above. Most people clearly hadn’t put much thought into it at all. If anyone at the labs had much more of a plan than “we’ll solve alignment while avoiding an arms race”, I managed to fail to even hear about its existence despite many conversations, including with founders.
The closest thing to a plan was Sam Bowman’s checklist: https://sleepinyourhat.github.io/checklist/ which is exactly the sort of thing I was hoping for, except it’s almost silent on issues of power, the state, and the role of post-AGI humans.
If you have any more related reading for the main “things might go OK” plan in your eyes, I’m all ears.
Yeah, people at labs are generally not thoughtful about AI futurism IMO, though of course most people aren’t thoughtful about AI futurism. And labs don’t really have plans IMO. (TBC, I think careful futurism is hard, hard to check, and not clearly that useful given realistic levels of uncertainty.)
“If you have any more related reading for the main ‘things might go OK’ plan in your eyes, I’m all ears.”
I don’t have a ready-to-go list. You might be interested in this post and comments responding to it, though I’d note I disagree substantially with the post.