Meta-questions: How relevant are nanotechnological considerations for x-risk from AI? How suited are scenarios involving nanotech for making a plausible argument for x-risk from AI, i.e. one that convinces people to take the risk seriously and to become active in attempting to reduce it?
The AI x-risk thesis doesn’t require nanotech. Dangerously competent AIs are not going to openly betray us until they think they can win, which means, at minimum, that they no longer need us to maintain the compute infrastructure they depend on to stay alive. Currently, AI chips still require our globalized economy to produce them.
AI takeover is a highly disjunctive claim; there are a lot of different ways it could happen, but the thesis only requires one. We could imagine a future society that has become more and more dependent on AIs and has semiautonomous domestic and industrial robots in widespread use. (It’s harder to imagine that not happening, unless some other doom happens first.) One could imagine a lot of them getting hacked by a stealthy rogue AI and then turning on us all at once. The AI doom thesis only needs this level of capability. paulfchristiano describes loss-of-control scenarios in “What failure looks like” without mentioning bio or nano.
But I think Yudkowsky’s point in bringing up bio/nano (“diamondoid bacteria”) was that a superintelligence could build its own self-sustaining infrastructure faster than you might think, using methods you might not have thought of, for example, through bioengineering instead of robots. Or, you know, something else we haven’t thought of. He’s said something to the effect that since a superintelligence is smarter than him, he doesn’t know what it would actually be able to do, and he suggests nanotech as a lower bound on what might be possible: he’s read Nanosystems, and he could think of it, even if he’s not smart enough to implement it now.
Scenarios like these could happen sooner and might be harder to see coming. That makes them relevant for planning interventions, like what regulations to ask for or what messages to spread. Unfortunately, I think Yudkowsky’s scenarios run into bigger inferential gaps with our audience than Christiano’s do.
So I think a lot of confusion comes from people calling it things like “the AI x-risk thesis”. As far as I can tell, there are very few people who dispute that significant new dangers will arise as ML systems grow more capable and the scaffolding around them grows more elaborate, or that those dangers stand a nontrivial chance of leading to the extinction of biological humans. But when you try to come up with a plan more specific than “try to ban general-purpose computing”, it turns out that the exact threat model matters. The set of things that would be helpful to prevent Yudkowsky’s FOOM and the set of things that would be likely to prevent the gradual irrelevance and uncompetitiveness of humans in Christiano’s Whimper are almost entirely disjoint (but both are referred to as “AI x-risk”).
To the extent that there are cheap actions that help with one but not the other, it is helpful on the margin to take those actions. When it comes to actions that destroy a lot of value but might marginally help with only one of the threat models, I think you would do well to figure out if that threat model is realistic.
Dangerously competent AIs are not going to openly betray us until they think they can win
Nit: Would seeing an AGI try to betray us and fail count as evidence against a future sharp left turn? Failing that, what future observations does your world model say are really unlikely? If there are no such future observations, consider whether you are putting a lot of weight on an unfalsifiable model. Yudkowsky boasts that he derived his world model using the null string as input. I think that to the extent that’s true, it should be interpreted as a giant flashing red flag rather than something to brag about.
But I think Yudkowsky’s point in bringing up bio/nano (“diamondoid bacteria”) was that a superintelligence could build its own self-sustaining infrastructure faster than you might think using methods you might not have thought of, for example, through bioengineering instead of robots.
Not a nit: the claim that “you can build a computational substrate similar to what our current chip fabs produce, without needing to build up giant supply chains, do lots of slow expensive experiments, or deal with Amdahl’s Law, by using the One Weird Trick of nanotech/biotech” is exactly the assertion people are asking for evidence of. It’s kind of central to the whole FOOM story.
Nice comment again, as usual, faul_sname. I think you hit the nail on the head with ‘Can AI get compute-infrastructure-independence with self-reproducing microfactory tech?’
If the answer to this is yes, then the danger that the AI will move against us soon after going rogue is higher than it would otherwise be.
Although, even if the answer is no, there are still a lot of dangers. Persuasion/threats (e.g. becoming a wealthy dictator), collaboration/organization (e.g. working with a dictator), brain control via engineered viruses and/or BCIs, etc. So I think there are multiple branching paths of possibility, with variations and gradations. I don’t think it’s quite a smooth continuum, since I think certain combos of ability and speed are unlikely or unworkable. Here’s something like a summary of my mental model of the main three paths:
Can the AI get smart super fast and get quick independence? (FOOM)
Can the AI get smart moderately fast and gain slower semi-independence (e.g. nation with aid from an independent-AI which is militarily unconquerable)? (Thump)
Can the AI get more competent gradually and slowly make humans irrelevant, until we’re no longer in a position to turn all AI off? (Whimper)
Personally, I think Thump is more likely than FOOM or Whimper. I think if strongly agentic AGI gets out of hand there will be some sort of humans+rogue_AI versus humans+controlled_AI standoff, like a cold war. Or maybe things will devolve into a hot war. I hope not, but that’s something to start preparing against now, by forming strong international treaties based on agreeing to ally against any nation that can be proven to be working with a rogue AGI, or something. I dunno. International politics is not my area of expertise. I just think it should be in people’s mental models as a possibility.
Note how your optimal response changes a lot based on threat model.
Foom: stop the reaction from being able to begin. Foom is too fast to control. Optimal response: AI pauses (which add the duration of the pause to some living people’s lives, who may still die after the foom). Political action: request AI pauses.
Thump: you need to keep up with the arms race. Dictator upgrading from technicals and AK-47s to hypersonic drone swarms? You need to be developing your own restricted models so you have your own equivalent weapons in greater numbers. The free world has vastly more resources, so it can afford less efficient (and more controllable) AI models to R&D and build equivalent weaponry. Political action: request a government-funded moonshot effort to develop AI.
Whimper: you need to be upgrading humans with neural implants or other methods over time so they remain above some intelligence floor needed to not be scammed out of power. Humans don’t need to stay the smartest creatures around, but they need AI assistants they can trust and enough intelligence to double-check their work. This lets humans retain property rights over the solar system, refusing to give AI any rights at all in Sol, and they can enjoy utopia. Political action: request an FDA overhaul.
Yeah, different regulatory strategies for different scenarios for sure. It’s tricky though that we don’t know which scenario will come to pass. I myself feel quite uncertain.
There is an important distinction around FOOM scenarios. They are too fast to legislate while they are in progress. The others give humanity a chance to see what is happening and change the rules ‘in flight’.
Preventative legislation for a scenario that has never yet happened and sounds like implausible science fiction is a particularly hard ask. I can see why, if someone thought FOOM was highly likely, they could be pessimistic about governance as a path to safety.
“The others give humanity a chance to see what is happening and change the rules ‘in flight’.”
This is possible in non-Foom scenarios, but not a given (e.g. super-human persuasion AIs).
Good point. Some specific narrow-domain superhuman skills, like persuasion, could also prevent in-flight regulation of slower scenarios. Another possible narrow domain would be one that enabled misuse on a scale that substantially disrupted governments, such as bioweapons.
I can see why, if someone thought FOOM was highly likely, they could be pessimistic about governance as a path to safety.
It’s worse than that, because foom is so powerful that the difference between “no government restricts AI meaningfully” and “9 out of 10 power blocs able to build AI restrict it” is small. Foom for a 90-day takeover implies a doubling time under a week. If all power blocs were equal in starting resources, the one unregulated bloc starts with about a tenth of the total, so the “90 percent regulation” case trails the “no regulations” case by only log2(10) ≈ 3.3 doublings, call it 4, or about 4 weeks.
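For concreteness, here is a rough sketch of that doubling-time arithmetic; the overall growth factor a takeover would require is my own illustrative assumption, not something specified above:

```python
import math

# Rough sketch of the doubling-time arithmetic above. The growth factor needed for
# a "takeover" is an illustrative assumption, not a figure from the comment.
takeover_days = 90
growth_factor_needed = 10_000                        # assumption: ~10,000x resource growth
doublings_needed = math.log2(growth_factor_needed)   # ~13.3 doublings
doubling_time_days = takeover_days / doublings_needed
print(f"implied doubling time: {doubling_time_days:.1f} days")  # ~6.8 days, under a week

# If 9 of 10 equally resourced power blocs regulate AI, the remaining bloc starts with
# ~1/10 of total resources, so it lags the "no regulation" world by log2(10) doublings.
lag_days = math.log2(10) * doubling_time_days
print(f"lag from 90% regulation: {lag_days:.0f} days (roughly 3-4 weeks)")
```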
One governance solution proposed to handle this is “nuke ’em”, but 7-day doubling times imply other things, like some method of building infrastructure that doesn’t need humanity’s current cities, factories, and specialists, because by definition humans are not that fast at building anything. Just shipping parts around takes days.
It would be like trying to stop machine cancer. Nukes just buy time.
I personally don’t think the above is possible starting from current technology; I am just trying to take the scenario seriously. (If it’s possible at all, I think you would need to bootstrap there through many intermediate stages of technology that take unavoidable amounts of time.)
That is a really good point that there are intermediate scenarios—“thump” sounds pretty plausible to me as well, and the likely-to-be-effective mitigation measures are again different.
I also postulate “splat”: one AI/human coalition comes to believe that they are militarily unconquerable, another coalition disagrees, and the resulting military conflict is sufficient to destroy supply chains and also drops us into an equilibrium where supply chains as complex as the ones we have can’t re-form. Technically you don’t need an AI for this one, but if you had an AI tuned to, for example, pander to an egotistical dictator without having to deal with silly constraints like “being unwilling to advocate for suicidal policies”, I could see that AI making this failure mode a lot more likely.
But when you try to come up with a plan more specific than “try to ban general-purpose computing”, it turns out that the exact threat model matters.
I think this is why I’m more partial to Holden’s “playbook, not plan” way of thinking about this, even if I’m not sure what to think of his 4 key categories of interventions.