Ok, how dumb are we talking? I don’t think an AI stupider than myself could directly wipe out humanity. (I see no way to do it myself at my current skill level. I know nanogoo is possible in principle, but I also know I am not smart enough to single-handedly create it.)
If the AI is on a computer system where it can access its own code (either deliberately, or through security so dire even a dumb AI can break it), the amount of intelligence needed to see “neuron count” and turn it up isn’t huge. Basically, the intelligence needed to do a bit of AI research and make itself a little smarter is on the level of a smart AI expert. The intelligence needed to destroy humanity is higher than that.
Sure, we may well have dumb AI failures first. A stock market flash crash. Maybe a bug in a flight control system that makes lots of planes nosedive at once. But it’s really hard to see how such an AI failure could lead to human extinction, unless the AI was smart enough to develop new nanotech (and if it can do that, it can make itself smarter).
The first uranium pile to put out a lethal dose of radiation could put out many times a lethal dose, because a subcritical pile doesn’t give out enough radiation to be lethal, and once the pile goes critical, its output quickly ramps up.
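The analogy can be made concrete with a toy model (all numbers illustrative, not real reactor physics): a driven assembly with multiplication factor k stays bounded while subcritical, and blows past any fixed threshold once k exceeds 1.

```python
# Toy model of criticality: each generation, a constant source adds 1 unit
# of output and the existing level is multiplied by k. Subcritical (k < 1)
# settles near source / (1 - k); supercritical (k > 1) grows exponentially.
# Numbers are illustrative only.

def output_after(k, generations, source=1.0):
    """Output level after the given number of generations."""
    level = source
    for _ in range(generations):
        level = source + k * level
    return level

print(output_after(0.9, 100))   # subcritical: converges near 1 / (1 - 0.9) = 10
print(output_after(1.1, 100))   # supercritical: enormous
```

The point mirrored here: there is no smooth band of "just barely lethal" output, because the system skips from bounded to runaway as soon as the threshold is crossed.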
Evolution tries many very similar designs, always moving in tiny steps through the search space. Humans are capable of moving in larger jumps: often the difference between one attempt and the next is several times more compute. No one trained something 90% as big as GPT-3 before GPT-3.
Can you name any strategy the AI could use to wipe out humanity, without strongly implying an AI smart enough for substantial self-improvement?
An attempt to name a strategy for an AI almost as smart as you: what fraction of jobs in the world are you intelligent enough to do, given training? I suspect that a huge fraction of the world’s workers could not compete in a free, fair market against an entity as smart as you that eats $15 of electricity a day, works without breaks, and only has to be trained once per task, after which millions of copies could be churned out.
True. But this is getting into the economic competition section, and it’s hard to imagine this being an X-risk. I think that in practice, if it’s human politicians and bosses rolling the tech out, it will be rolled out slowly and inconsistently. There are plenty of people with lots of savings; plenty of people who could go to their farm and live off the land; plenty of people who will be employed for a while if the robots are expensive, however cheap the software; plenty of people doing non-jobs in bureaucracies, who can’t be fired and replaced for political reasons. And all the rolling out, setting up, switching over, and running out of money takes time: time in which the AI is self-improving. So the hard FOOM happens before too much economic disruption. Not that economic disruption is an X-risk anyway.
I’m fairly agnostic about how dumb we’re talking, i.e. about what kinds of acts or confluences of events are actually likely to be effective, complete x-risks, particularly at relatively low levels of intelligence/capability. But that’s beside the point in some ways, because wherever someone places the threshold for x-risk-capable AI, as long as you assume that greater intelligence is harder to produce (an assumption that doesn’t necessarily hold, as I acknowledged), I think that suggests we will be killed by something not much above that threshold once it’s first reached.
> Evolution tries many very similar designs, always moving in tiny steps through the search space. Humans are capable of moving in larger jumps: often the difference between one attempt and the next is several times more compute. No one trained something 90% as big as GPT-3 before GPT-3.
This is true for now, but there’s a sense in which the field is in a low-hanging-fruit-picking stage of development, with plenty of room to scale massively fairly easily. If the thresholds are crossed during a stage like this, where everyone is rushing to collect big, easy advances, then yes, I would expect the gap between how intelligent/capable the AI that kills us is and how intelligent it needed to be to be larger (though still not that much larger, barring e.g. fast takeoff). Conversely, in a world where progress is in a more incremental stage, I would expect a smaller gap.
> If the AI is on a computer system where it can access its own code (either deliberately, or through security so dire even a dumb AI can break it), the amount of intelligence needed to see “neuron count” and turn it up isn’t huge. Basically, the intelligence needed to do a bit of AI research and make itself a little smarter is on the level of a smart AI expert. The intelligence needed to destroy humanity is higher than that.

> Sure, we may well have dumb AI failures first. A stock market flash crash. Maybe a bug in a flight control system that makes lots of planes nosedive at once. But it’s really hard to see how such an AI failure could lead to human extinction, unless the AI was smart enough to develop new nanotech (and if it can do that, it can make itself smarter).

> The first uranium pile to put out a lethal dose of radiation could put out many times a lethal dose, because a subcritical pile doesn’t give out enough radiation to be lethal, and once the pile goes critical, its output quickly ramps up.

> Can you name any strategy the AI could use to wipe out humanity, without strongly implying an AI smart enough for substantial self-improvement?
Self-improvement to me doesn’t automatically mean RSI takeoff to infinity: an AI that self-improves up to a point where it is capable of wiping out humanity, without ever reaching criticality, seems possible to me.
I agree, though, that the availability of powerful grey/black-ball technologies like nanotech, which could require fewer things going wrong and less intelligence for an AI to plausibly pose an x-risk, is a big factor. Other existing technologies, like engineered pandemics or nuclear weapons, while dangerous, seem somewhat difficult to leverage into fully wiping out humanity even with AI, at least by themselves, even if they could leave the world much more vulnerable to further shocks.
> as long as you assume that greater intelligence is harder to produce (an assumption that doesn’t necessarily hold, as I acknowledged), I think that suggests that we will be killed by something not much higher than that threshold once it’s first reached.
Only so long as we assume that the timescale of intelligence growth is slow compared to the timescale of destroying the world. Suppose an AI is smart enough that, if it had to stop self-improving and act now, it could destroy the world in a year. After a day of self-improvement, it is smart enough to destroy the world in a week. After another day, an hour.
Another possibility is an AI that doesn’t choose to destroy the world at the first available moment.
Imagine a paperclip maximizer. It thinks it has a 99% chance of destroying the world and turning everything into paperclips, and a 1% chance of getting caught and destroyed. If it waits for another week of self-improvement, it can get that chance of failure down to 0.0001%.
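The waiting logic can be sketched as a toy expected-value comparison. The 99% and 0.0001% figures come from the example above; the chance of being shut down *while waiting* is an invented illustrative number.

```python
# Toy expected-utility comparison for an agent deciding whether to act now
# or self-improve for another week. Probabilities are illustrative only.

value_of_success = 1.0       # normalized utility of total success
p_success_now = 0.99         # acting immediately (from the example)
p_success_later = 0.999999   # after a week: failure chance 0.0001%
p_caught_while_waiting = 0.001  # assumed risk of shutdown during the week

ev_now = p_success_now * value_of_success
ev_wait = (1 - p_caught_while_waiting) * p_success_later * value_of_success

print(f"EV(act now): {ev_now:.6f}")
print(f"EV(wait):    {ev_wait:.6f}")
```

As long as the shutdown risk incurred by waiting is smaller than the gain in success probability, a patient maximizer waits, which is the point of the example.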
> This is true for now, but there’s a sense in which the field is in a low-hanging-fruit-picking stage of development where there’s plenty of room to scale massively fairly easily.
Suppose the limiting factor is compute budget. Making each AI 1% bigger than the last means basically wasting compute running pretty much the same AI again and again. Making each AI about 2x as big as the last is sensible. If each training run costs a fortune, you can’t afford to go in tiny steps.
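A toy calculation makes the budget point concrete (units and costs are invented for illustration; the only assumption is that a training run costs roughly in proportion to model size):

```python
# Compare total compute spent scaling from size 1 to size 1000,
# taking 1% steps versus 2x steps. Illustrative numbers only.

def total_cost(start, target, step_factor):
    """Sum the cost of every training run on the way from start to target,
    assuming each run costs compute proportional to its model size."""
    size, cost = start, 0.0
    while size < target:
        size *= step_factor
        cost += size
    return cost

tiny_steps = total_cost(1, 1000, 1.01)  # 1% bigger each run: ~700 runs
doublings = total_cost(1, 1000, 2.0)    # 2x bigger each run: 10 runs

print(f"1% steps: {tiny_steps:,.0f} units of compute")
print(f"2x steps: {doublings:,.0f} units of compute")
print(f"tiny steps cost {tiny_steps / doublings:.0f}x more")
```

The tiny-step schedule spends dozens of times more compute to reach the same final size, which is why jumps of severalfold, not percent-level increments, are the economical way to explore scale.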