I’m fairly agnostic about how dumb we’re talking—what kinds of acts or confluence of events are actually likely to be effective complete x-risks, particularly at relatively low levels of intelligence/capability. But that’s besides the point in some ways, because whereever someone might place the threshold for x-risk capable AI, as long as you assume that greater intelligence is harder to produce (an assumption that doesn’t necessarily hold, as I acknowledged), I think that suggests that we will be killed by something not much higher than that threshold once it’s first reached.
Evolution tries many very similar designs, always moving in tiny steps through the search space. Humans are capable of moving in larger jumps. Often the difference between one attempt and the next is several times more compute. No one trained something 90% as big as GPT3 before GPT3.
This is true for now, but there’s a sense in which the field is in a low hanging fruit picking stage of development where there’s plenty of room to scale massively fairly easily. If the thresholds are crossed during a stage like this where everyone is rushing to collect big, easy advances, then yes, I would expect the gap between how much more intelligent/capable the AI that kills us is, relative to how intelligent it needed to be, to be much higher (but still not that much higher, unless e.g. fast takeoff etc.). Conversely in a world where progress is in a more incremental stage, then I would expect a smaller gap.
If the AI is on a computer system where it can access its own code. (Either deliberately, or through dire security even a dumb AI can break) the amount of intelligence needed to see “neuron count” and turn it up isn’t huge. Basically, the amount of power needed to do a bit of AI research to make itself a little smarter is on the level of a smart AI expert. The intelligence needed to destroy humanity is higher than that.
Sure, we may well have dumb AI failures first. That stock market crash. Maybe some bug in a flight control system that makes lots of planes do a nosedive at once. But it’s really hard to see how an AI failure could lead to human extinction, unless the AI was smart enough to develop new nanotech (And if it can do that, it can make itself smarter).
The first uranium pile to put out a lethal dose of radiation can put out many times a lethal dose of radiation. Because a subcritical pile doesn’t give out enough radiation. And once the pile is critical, it quickly ramps up to loads of radiation.
Can you name any strategy the AI could use to wipe out humanity, without strongly implying an AI smart enough for substantial self improvement?
Self-improvement to me doesn’t automatically mean RSI takeoff to infinity—an AI that self-improves up to a certain point capable of wiping out humanity but not yet reaching criticality seems to me to be possible.
I agree though that availability of powerful grey/black ball technologies like nanotech that could require fewer variables going wrong and less intelligence to nevertheless enable an AI to plausibly represent an x-risk is a big factor. Other existing technologies like engineered pandemics or nuclear weapons, while dangerous, seem somewhat difficult even with AI to leverage into fully wiping out humanity at least by themselves, even if they could lead to worlds that are much more vulnerable to further shocks.
as long as you assume that greater intelligence is harder to produce (an assumption that doesn’t necessarily hold, as I acknowledged), I think that suggests that we will be killed by something not much higher than that threshold once it’s first reached.
So long as we assume the timescales of intelligence growth are slow compared to destroying the world timescales. If an AI is smart enough to destroy the world in a year (in the hypothetical where it had to stop self improving and do it now). A day of self improvement and they are smart enough to destroy the world in a week. Another day of self improvement and they can destroy the world in an hour.
Another possibility is an AI that doesn’t choose to destroy the world at the first available moment.
Imagine a paperclip maximizer. It thinks it has a 99% chance of destroying the world and turning everything into paperclips. And a 1% chance of getting caught and destroyed. If it waits for another week of self improvement, it can get that chance down to 0.0001%.
This is true for now, but there’s a sense in which the field is in a low hanging fruit picking stage of development where there’s plenty of room to scale massively fairly easily.
Suppose the limiting factor was compute budget. Making each AI 1% bigger than before means basically wasting compute running pretty much the same AI again and again. Making each AI about 2x as big as the last is sensible. If each training run costs a fortune, you can’t afford to go in tiny steps.
I’m fairly agnostic about how dumb we’re talking—what kinds of acts or confluence of events are actually likely to be effective complete x-risks, particularly at relatively low levels of intelligence/capability. But that’s besides the point in some ways, because whereever someone might place the threshold for x-risk capable AI, as long as you assume that greater intelligence is harder to produce (an assumption that doesn’t necessarily hold, as I acknowledged), I think that suggests that we will be killed by something not much higher than that threshold once it’s first reached.
This is true for now, but there’s a sense in which the field is in a low hanging fruit picking stage of development where there’s plenty of room to scale massively fairly easily. If the thresholds are crossed during a stage like this where everyone is rushing to collect big, easy advances, then yes, I would expect the gap between how much more intelligent/capable the AI that kills us is, relative to how intelligent it needed to be, to be much higher (but still not that much higher, unless e.g. fast takeoff etc.). Conversely in a world where progress is in a more incremental stage, then I would expect a smaller gap.
Self-improvement to me doesn’t automatically mean RSI takeoff to infinity—an AI that self-improves up to a certain point capable of wiping out humanity but not yet reaching criticality seems to me to be possible.
I agree though that availability of powerful grey/black ball technologies like nanotech that could require fewer variables going wrong and less intelligence to nevertheless enable an AI to plausibly represent an x-risk is a big factor. Other existing technologies like engineered pandemics or nuclear weapons, while dangerous, seem somewhat difficult even with AI to leverage into fully wiping out humanity at least by themselves, even if they could lead to worlds that are much more vulnerable to further shocks.
So long as we assume the timescales of intelligence growth are slow compared to destroying the world timescales. If an AI is smart enough to destroy the world in a year (in the hypothetical where it had to stop self improving and do it now). A day of self improvement and they are smart enough to destroy the world in a week. Another day of self improvement and they can destroy the world in an hour.
Another possibility is an AI that doesn’t choose to destroy the world at the first available moment.
Imagine a paperclip maximizer. It thinks it has a 99% chance of destroying the world and turning everything into paperclips. And a 1% chance of getting caught and destroyed. If it waits for another week of self improvement, it can get that chance down to 0.0001%.
Suppose the limiting factor was compute budget. Making each AI 1% bigger than before means basically wasting compute running pretty much the same AI again and again. Making each AI about 2x as big as the last is sensible. If each training run costs a fortune, you can’t afford to go in tiny steps.