Besides invoking “Deep Knowledge” and the analogy of ruling out perpetual motion, another important tool for understanding AI foom risk is security mindset, which Eliezer has written about here.
Maybe this is tangential, but I don’t get why the AI foom debate isn’t framed more often as a matter of basic security considerations; to me, foom risk looks like a straightforward application of security mindset. I think AI is a risk to humanity for the same reason I think any website can be taken out by a hack if you put a sufficiently large bounty on it.
Humanity has all kinds of vulnerabilities that are exploitable by a team of fast simulated humans, not to mention Von Neumann simulations or superhuman AIs. There are so many plausible attack vectors by which to destroy or control humanity: psychology, financial markets, supply chains, biology, nanotechnology, just to name a few.
It’s very plausible that the AI gets away from us, runs in the cloud, self-improves, and we can’t turn it off. It’s like a nuclear explosion that may start slow but keeps picking up speed, recursively self-improving or even just speeding up to the level of an adversarial Von Neumann team, all while hidden across billions of devices.
We have the example of nuclear weapons. The US was effectively a singleton power for a few years because it developed nukes first. But at least a nuclear explosion stops when it burns through its fissile material. AI doesn’t stop, and it’s a much more powerful adversary that will not be contained. It’s as if the first nuclear pile you test has a yield far larger than the Tsar Bomba: you run one test, and you’ve permanently crashed your ability to test.
So to summarize my security-mindset view: humanity is vulnerable to hackers, with little ability to restore from backup once we get hacked, and it’s very easy to see AI becoming a great hacker soon.
As for the many attack vectors, I would also add “many places and stages where things can go wrong” once AI becomes a genius social and computer hacker.
(By the way, I’ve heard that most hacks are carried out not through technical exploits but through social engineering, because a human is a far more unreliable and harder-to-patch system.)
From my point of view, the main problem is not even that the first piece of uranium explodes hard enough to melt the Earth. The problem is that there are 8 billion people on Earth, each with several electronic devices, and the processors (or batteries, for a fuller analogy) are made of californium. Now you have to hope that literally no one out of 8 billion people will cause their device to explode (which is much worse than hoping that none of a mere 1 million wizards will hit on the idea of transfiguring antimatter, botulinum toxin, thousands of infections, nuclear weapons, strangelets, or things like “only top quarks”, which cannot even be pictured), or that literally none of these reactions will chain through all the processors (which are also connected to a worldwide network that itself runs on radiation) as a direct explosion or as neutron beams, or that you will manage to stop literally every explosive or neutron chain reaction.
We can roughly model this as each of the 8 billion people having three probabilities of not failing each of those three points, and even if each one is on average very high, we raise each of them to the power of 8 billion. Worse, these are all probabilities over some fixed period, say a year, and the real problem is that over time it’s not so much that the probabilities grow as that the bar for creating AI keeps dropping, so the cumulative risk compounds even faster than a fixed per-year rate would suggest (the difference between a geometric and an exponential progression).
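A rough numerical sketch of the “raise it to the power of 8 billion” point, with per-person numbers that are purely illustrative (mine, not the commenter’s):

```python
import math

# Purely illustrative numbers: if every one of 8 billion people must
# independently avoid triggering a catastrophe in a given year, even tiny
# per-person failure rates compound dramatically.

POPULATION = 8_000_000_000

for per_person_safety in (1 - 1e-12, 1 - 1e-10, 1 - 1e-9):
    # P(nobody fails this year), treating people as independent
    # (a simplifying assumption).
    p_all_safe = math.exp(POPULATION * math.log(per_person_safety))
    print(f"per-person safety {per_person_safety:.12f} "
          f"-> chance nobody fails all year: {p_all_safe:.3%}")
```

Even a one-in-a-billion per-person annual failure rate leaves only about a 0.03% chance that the whole population gets through the year, under that independence assumption.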
Of course, one can object that we shouldn’t average over everyone, and that the relevant headcount should be cut down to far fewer people (while the number of processors stays the same). But then, among that smaller group of people who could interfere, the likelihood that any one of them creates AI goes up. And again, the problem is not that any individual’s chance of creating AI grows over time, but that the process keeps getting easier, so that more and more people have a realistic chance of creating it, which is why I still count over all people.
Finally, one can say that civilization will react when it sees not just smoke but fire. But civilization is not adequate, generally speaking. So far it has neither taken fire-prevention measures nor reacted to the smoke. It also showed, with the coronavirus, how it would react: “it’s no more dangerous than the flu”, “the graph is exponential? never mind”, “it’s all a conspiracy, not a real danger”, “I won’t get vaccinated”; except this time we’ll also get “it’s all science fiction / a cult”, “AI is good”, and so on.
Yes, I do quote the security mindset in the post.
I feel you’re quite overstating the ability of the security mindset to show FOOM, though. The reason it’s not presented as a direct consequence of the security mindset is… because it’s not one?
Like, once you are convinced of the strong possibility and unavoidability of AGI and superintelligence (maybe through FOOM arguments), then the security mindset actually helps you, and combining it with deep knowledge (like the Orthogonality Thesis) lets you find a lot more ways of breaking “humanity’s security”. But the security mindset applied without arguments for AGI doesn’t let you postulate AGI, for the same reason that the security mindset without arguments about mind-reading doesn’t let you postulate that hackers might read the password in your mind.
For me the security mindset frame comes down to two questions:
1. Can we always stop AI once we release it?
2. Can we make the first unstoppable AI do what we want?
To which I’d answer “no” and “only with lots of research”.
Without security mindset, one tends to think an unstoppable AI is a priori likely to do what humans want, since humans built it. With security mindset, one sees that most AIs are nukes that wreak havoc on human values, and that getting them to do what humans want is analogous to building crash-proof software for a space probe, except the whole human race only gets to launch one probe and it goes to whoever launches it first.
I’d like to see this kind of discussion with someone who doesn’t agree with MIRI’s sense of danger, in addition to all the discussions about how to extrapolate trends and predict development.
> Without security mindset, one tends to think an unstoppable AI is a priori likely to do what humans want, since humans built it. With security mindset, one sees that most AIs are nukes that wreak havoc on human values, and that getting them to do what humans want is analogous to building crash-proof software for a space probe, except the whole human race only gets to launch one probe and it goes to whoever launches it first.
I think this is a really shallow argument that enormously undersells the actual reasons for caring about alignment. We have actual arguments for why unstoppable AIs are not likely to do what humans want, and they don’t need the security mindset at all. The basic outline is something like:
1. Since we have historically had a lot of trouble writing down programs that solve more complex and general problems like language or image recognition (and have had success through ML), future AI and AGI will probably be the sort of system that “fills in the gaps” in our requests/specifications.
2. For almost everything we could ask an AI to accomplish, there are actions that would help it but that would be bad and counterintuitive from the standpoint of previous technology (the famous convergent subgoals).
3. Precisely specifying what we want without relying on common sense is incredibly hard, and such specifications don’t survive strong optimization (Goodhart’s law; see the toy sketch after this list).
4. And competence by itself doesn’t solve the problem, because understanding what humans want doesn’t mean caring about it (the Orthogonality Thesis).
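To illustrate the Goodhart point, here is a toy sketch with entirely made-up objective functions (my own construction, not anything from the comment or from a real AI system): a proxy that agrees with the true objective almost everywhere except for one narrow loophole. Weak optimization of the proxy behaves fine; strong optimization finds the loophole and tanks the true objective.

```python
import numpy as np

# Toy, made-up objective functions: the proxy matches what we actually want
# almost everywhere, except for one narrow, highly rewarded loophole the
# specification never anticipated.

def true_value(x):
    # What we actually care about: plans near x = 1 are good.
    return -(x - 1.0) ** 2

def proxy_value(x):
    # The written-down specification: the true objective plus a narrow
    # loophole around x = 2.5 that scores absurdly well.
    return true_value(x) + 10.0 * np.exp(-((x - 2.5) ** 2) / 0.001)

def best_plan_by_proxy(step):
    # "Optimization pressure" here is just how finely the plan space is searched.
    candidates = np.arange(-3.0, 3.0, step)
    return candidates[np.argmax(proxy_value(candidates))]

for step in (0.4, 0.001):  # weak vs. strong optimization of the proxy
    x = best_plan_by_proxy(step)
    print(f"step={step:<6} chosen x={x:5.2f}  proxy={proxy_value(x):6.2f}  true={true_value(x):6.2f}")
```

The coarse search lands near x = 1, where proxy and true value agree; the fine search finds the loophole at x = 2.5, scoring far higher on the proxy while doing much worse on what we actually wanted.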
This line of reasoning (which is not new by any means; it’s basically straight out of Bostrom and early Yudkowsky’s writing) justifies the security mindset for AGI and alignment. Not the other way around.
(And historically, Yudkowsky wanted to build AGI before he found out about these points, which turned him into the biggest user, though by no means the only one, of the security mindset in alignment.)
OK, I agree there are a bunch of important concepts to be aware of, such as complexity of value, and there are many ways for the security mindset by itself to fail at flagging the extent of AI risk if one is ignorant of some of these other concepts.
I just think the outside view and trend extrapolation are very far from how one should reason about mere nukes, and superhuman intelligence is very nuke-like, or at least has a very high chance of being nuke-like: that is, of unlocking unprecedentedly large, rapid, irreversible effects. Extrapolating from current trends would have been quite unhelpful to nuclear safety. I know Eliezer is just trying to meet other people in the discussion where they are, but it would be nice to have another discussion that seems more on-topic from Eliezer’s own perspective.