I love the genre of “Katja takes an AI risk analogy way more seriously than other people and makes long lists of ways the analogous thing could work.” (the previous post in the genre being the classic “Beyond fire alarms: freeing the groupstruck.”)
Digging into the implications of this post:
In sum, for AI systems to be to humans as we are to ants, would be for us to be able to do many tasks better than AI, and for the AI systems to be willing to pay us grandly for them, but for them to be unable to tell us this, or even to warn us to get out of the way. Is this what AI will be like? No. AI will be able to communicate with us, though at some point we will be less useful to AI systems than ants could be to us if they could communicate.
I’m curious how much you think the arguments in this post should affect our expectations of AI-human relations overall? At its core, my concern is:
sure, the AI will definitely trade with useful human organizations / institutions when it’s weak (~human-level),
and it might trade with them a decent amount when it’s strong but not world-defeating (~human-organization-level)
eventually AI will be human-civilization-level, and probably soon after that it’s building Dyson spheres and stuff. Why trade with humanity then? Do we have a comparative advantage, or are we just a waste of atoms?
I can think of a few reasons that human-AI trade might matter for the end-state:
We can bargain for the future while AIs are relatively weak. I.e., when humans have stuff AI wants, they can trade the stuff for an assurance that when the AI is strong, it’ll give us 0.000001% of the universe.
This requires both leverage (to increase the share we get) and verification / trust (so the AI keeps its promise). If we have a lot of verification ability, though, we could also just try to build safe AIs?
(related: this Nate post saying we can’t assume unaligned AIs will cooperate / trade with us, unless we can model them well enough to distinguish a true commitment from a lie. See “Objection: But what if we have something to bargain with?”)
It seems possible that an AI built by humans and trained on human-ish content ends up with some sentimental desire for “authentic” human goods & services.
In order for this to end up well for the humans, we’d want the AI to value this pretty highly (so we get more stuff), and to have a concept of “authenticity” that means it doesn’t torture / lobotomize us to get what it wants.
This is mostly by analogy to humans buying “authentic” products of other, poorer humans, but there’s a spectrum running from goods/services to something more like “the pleasure of seeing something exist,” a la zoo animals.
(goofy attempt at illustrating the spectrum: a woven shawl vs a performed monologue vs reality tv vs a zoo.)
So a simpler, perhaps more likely, version of the ‘desirable authentic labor’ possibility is a ‘human zoo’, where the AI just likes having “authentic” humans around, which is not very tradelike. But maybe the best bad AI case we could hope for is something like this: Earth left as a ‘human zoo’ while the AI takes over the rest of the lightcone.
In general, this post has prompted me to think more about the transition period between AI that’s weaker than humans and AI that’s stronger than all of human civilization, and that’s been interesting! A lot of people assume that takeoff will happen very quickly, but if it lasts for multiple years (or even decades) then the dynamics of that transition period could matter a lot, and trade is one aspect of that.
some stray thoughts on what that transition period could look like:
Some doomy-feeling states don’t immediately kill us. We might get an AI that’s able to defeat humanity before it’s able to cheaply replicate lots of human labor, because it gets a decisive strategic advantage via specialized skill in some random domain and can’t easily skill itself up in other domains.
When would an AI prefer to trade rather than coerce or steal?
maybe if the transition period is slow, and it knows it’s in the earlier part of the period, so reputation matters
maybe if it’s being cleverly watched or trained by the org building it, since they want to avoid bad press
maybe there’s some core of values you can imprint that leads to this? but maybe actually being able to solve this issue is basically equivalent to solving alignment, in which case you might as well do that.
In a transition period, powerful human orgs would find various ways to interface with AI and vice versa, since they would be super useful tools / partners for each other. Even if the transition period is short, it might be long enough to change things, e.g. by getting the world’s most powerful actors interested in building + using AI and not leaving it in the hands of a few AGI labs, by favoring labs that build especially good interfaces & especially valuable services, etc. (While in a world with a short takeoff rather than a long transition period, maybe big tech & governments don’t recognize what’s happening before ASI / doom.)