I think that for EY and a large fraction of the LW/alignment community, it must be frustrating to hear uneducated newcomers make what they see as obvious mistakes and repeat the same arguments they have heard for years. The fact that we are talking about doom does not help either: it must feel similar to the desperation of a pilot who knows his plane is on a collision course with a mountain while the crew keeps asking whether the inflatable slides are working.
So this comment is coming from one of those uneducated readers. I know the basics: I have read the Sequences (maybe my favourite book), The Road to Superintelligence, and many other articles on the topic, but there are many, many things that I am aware I don't fully grasp. Since I want to correct that, the best thing I can do from my position is post probably-silly opinions like this comment, which allows me to be educated by others.
To me, the weakest point in the chain of reasoning of the OP is 4.
The things I see as clearly obvious are (the numbering is mine):
1. Humans are not at the upper bound of intelligence.
2. Machines will eventually (probably within the next few years) reach superhuman intelligence.
3. The social and economic changes associated with this will be unprecedented.
The other important claims, which I don't see as obvious at all but which are very often taken for granted, are:
4. I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans. For example, I can envision a machine that can generate text describing, in great detail, how to damage the economy of some country X, without necessarily having the power to execute that plan unless humans implement those actions. Imagination and action are different things.
5. I don't see why a large fraction of the community assumes that extraordinary things like nanotechnology can be achieved very quickly and that no major hurdles will be found, even with AGI. Creating an entire industry for a new technology could be more complex than we think. The protein-folding problem would not have been solved without the decades of crystallography behind it. Intelligence by itself might not be a sufficient condition for developing something like advanced nanotechnology that can kill all humans at once.
6. I don't see why we take for granted that there are no limits to an AGI's capacity for knowledge and planning. There might be limits on what can be known or planned that we are not aware of, and that would dramatically reduce the effectiveness of a machine trying to take over the world. If this discussion about AGI had taken place before the discovery of deterministic chaos, someone could very well have argued something like: "the machine uses its infinite intelligence to predict the weather 10 years from now, when there will be a massive blizzard on the 10th of October, which is also the day that blah blah blah". Today we know there are systems that are unpredictable even with arbitrarily precise measurements. This is just one example of a limit on what can be known; there might be many others.
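The "unpredictable even with arbitrarily precise measurements" point can be made concrete with a standard toy model of deterministic chaos, the logistic map. This is just an illustrative sketch (the parameter r=4 and the 1e-10 perturbation are arbitrary choices for demonstration):

```python
# Two trajectories of the logistic map x_{n+1} = r*x*(1-x), started
# a mere 1e-10 apart, diverge to completely different values within a
# few dozen steps: a fully deterministic system whose long-range
# behaviour cannot be predicted from any realistic measurement.

def logistic_trajectory(x0, r=4.0, steps=60):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(0.4)
b = logistic_trajectory(0.4 + 1e-10)  # almost identical starting point
gap = [abs(x - y) for x, y in zip(a, b)]

print(f"gap at step 0:  {gap[0]:.1e}")   # ~1e-10
print(f"largest gap:    {max(gap):.3f}")  # grows to order ~0.1-1
```

No amount of extra intelligence fixes this: the initial error grows exponentially until the two futures are uncorrelated.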
Some other things that I think play a role in the overly pessimistic take of the LW community:
7. I think many people have fallen into a vicious circle: doom might be possible, so we talk about it because it is terrifying. Given that people are talking about it, availability bias makes others update towards higher estimates of p(doom), which makes the doom scenario even more terrifying.
8. EY has a disproportionate impact on the community (for obvious reasons), and the more moderate predictions are not discussed as much.
> I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans. For example, I can envision a machine that is able to generate the text describing with a lot of detail how to damage the economy of a country X and not necessarily having the power to execute it unless there are humans behind implementing those actions. Imagination and action are different things.
I suspect one of the generators of disagreements here is that MIRI folks don’t think imagination and action are (fundamentally) different things.
Like, there’s an intuitive human distinction between “events that happen inside your brain” and “events that happen outside your brain”. And there’s an intuitive human distinction between “controlling the direction of thoughts inside your brain so that you can reach useful conclusions” and “controlling the direction of events outside your brain so that you can reach useful outcomes”.
But it isn’t trivial to get an AGI system to robustly recognize and respect that exact distinction, so that it optimizes only ‘things inside its head’ (while nonetheless producing outputs that are useful for external events and are entangled with information about the external world). And it’s even less trivial to make an AGI system robustly incapable of acting on the physical world, while having all the machinery for doing amazing reasoning about the physical world, and for taking all the internal actions required to perform that reasoning.
Thanks for the excellent comment and further questions! A few of these I think I can answer partially, and I’ll try to remember to respond to this post later if I come across any other/better answers to your questions (and perhaps other readers can also answer some now).
> I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans.
My understanding is that while the two are different in principle, in practice ensuring that an AGI doesn't act on its knowledge is an extremely hard problem. Why is it so hard? I have no idea, lol. Probably relevant here is Yudkowsky's AI-box experiment, which purports (successfully, imo, though I know it's controversial) to show that even an AI which can only interface with the world via text can convince humans to act on its behalf, even when those humans are strongly incentivized not to. If you have an AI which dreams up an AGI, that AGI now exists, albeit heavily boxed. If it can convince the containing AI that releasing it would help fulfil its goal of predicting things properly, or whatever, then we're still doomed. However, this line of argument feels weak to me, especially if knowing how to build an AGI doesn't require already having one running (which I would assume to be the case). Your general point stands, and I don't know the technical reason why differentiating between "imagination" and "action" (as you excellently put it) is so hard.
> I don't see why a large fraction of the community assumes that extraordinary things like nanotechnology can be achieved very quickly and no major hurdles will be found, even with AGI.
A partial response may be that it doesn't need to be nanotechnology, or any one invention, that is achieved quickly. All we need for AGI to be existentially dangerous is for it to make a major breakthrough in some area that gives it the power to destroy us. See for example this story, where an AI was able to propose a whole bunch of extremely deadly chemical weapons with barely any modifications to its original code. This suggests that even if there are in fact hurdles for an AGI in nanotech and elsewhere, that won't matter much for world-ending purposes. The technology mostly exists already, and it would just be a matter of convincing the right people to take a fairly simple sequence of actions.
> I don't see why we take for granted that there are no limits to an AGI's capacity for knowledge and planning.
Do we take that for granted? I don't think we need to assume a FOOM scenario for an AGI to do tremendous damage. Just by ourselves, with human-level intelligence, we've come close to destroying the world a few too many times to be reassuring. Imagine if an Einstein-level human genius decided to devote themselves to killing humanity. They probably wouldn't succeed, but I sure wouldn't bet on it! I can personally think of a few things I could do, if I were marginally smarter/more resourceful, which could plausibly kill 1,000,000,000+ people (don't worry, I have no intention of doing anything nefarious). AGI doesn't need to be all that much smarter than us to be an x-risk-level threat if it's horrifically unaligned.
Hi Yitz, just a clarification. In my view p(doom) != 0. I can't give a meaningful number, but if you force me to estimate, it would probably be close to 1% over the next 50 years; maybe less, maybe a bit more, but in that ballpark. I find EY et al.'s arguments about what is possible compelling: I think extinction by AI is definitely a possibility. This means it makes a lot of sense to explore the subject as they are doing, and they have my most sincere admiration for carrying out their research outside conventional academia. What I most disagree with is their estimate of the likelihood of such an event: in most of the discussions I have read, doom is treated as a fait accompli; it is not so much a question of whether it will take place, but when. And they look into the future making a set of predictions that seem bizarrely precise, trying to say how things will happen step by step (I am thinking mostly of the conversations among the MIRI leaders a few months ago). The reasons stated above (plus the ones I added in my comment on your other post) are mostly reasons why things could go differently. So, for instance, yes, I can envision a machine that is able to imagine and act. But I can also envision the opposite, and that's what I am trying to convey: there are many reasons why things could go differently. For now, it seems to me that the doom predictions will fail, and fail badly. Bryan Caplan is getting that money.
Something else I want to raise is that we seem to have different definitions of doom.
> I can personally think of a few things I could do if I was marginally smarter/more resourceful which could plausibly kill 1,000,000,000+ people (don't worry, I have no intentions of doing anything nefarious). AGI doesn't need to be all that smarter than us to be an X-risk level threat, if it's too horrifically unaligned.
Oh yes, I totally agree with this (although maybe not within 10 years); that's why I think it makes a lot of sense to carry out research on alignment. But watch out: EY would tell you* that if an AGI decides to kill only 1 billion people, then you have solved the alignment problem! So it seems we have different versions of doom.
For me, a valid definition of doom is: everyone who could continue making significant technological progress dies, and the process is irreversible. If the whole Western world disappears and only China remains, that is a catastrophe, but the world keeps going. If the only people left alive are the inhabitants of the Andaman Islands, that is pretty much game over, and then we are talking about a doom scenario.
*I remember reading that sentence almost literally from EY; I think it was in the context of an AGI killing everyone in the world except China, or something similar. If someone can find the reference, that would be great; otherwise, I hope I am not misrepresenting what the big man himself said. If I am, I am happy to retract this.
I'm in the same situation as you re: education status. That said, my understanding of your 5th point is that "nanotechnology" doesn't necessarily mean literal nanotechnology. It's more of a placeholder for some generic, magic-seeming technology that can't be specifically foreseen, like gunpowder or the internet once were. It seems like this is obvious to you; I just wanted to make sure of it.
Gunpowder took a few centuries to totally transform the battlefield; the internet took a few decades. Looking at history, there are more and more revolutionary inventions, each taking less and less time to be developed. So it seems safer to be pessimistic and assume that a new disruptive technology could be invented on really short timescales, e.g. some super-bacterium via CRISPR or something. These inventions benefit from centuries of prior research (standing on shoulders, etc.), and there's also the fruitfulness of combining domains.
Next, there seems to be an assumption that research ability scales somehow with intelligence. Maybe not linearly, but still. This seems somewhat valid: humans have invented a lot more than killer whales, who in turn have invented a lot more than marmots. So if you manage to create something a lot more intelligent (or even just twice as intelligent, whatever that means), it seems reasonable to assume it could have corresponding speed-ups in research ability. This, of course, could be invalidated by your 6th point.
Also, a limiting factor in research can be that you have to run lots of experiments to see if things work out. Simulations can help a lot with this, and they don't even have to be very precise to be useful. So you could imagine an AI that wants to find a way to kill off humans and looks for something poisonous. It could build a model that classifies molecules by toxicity, then search for something [maximally toxic](https://www.theverge.com/2022/3/17/22983197/ai-new-possible-chemical-weapons-generative-models-vx), after which it would only have to test the top ten candidates.
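The general pattern here, "cheap surrogate model, rank many candidates, physically test only the top few", can be sketched in a few lines. Everything below is made up for illustration (the feature vectors and the `surrogate_score` function are stand-ins, not any real chemistry model):

```python
# Toy sketch of simulation-guided search: score a large pool of random
# candidates with a cheap surrogate model, then keep only the top ten
# for expensive real-world testing.
import random

random.seed(0)

def surrogate_score(candidate):
    # Stand-in for a learned model predicting some property of interest
    # from a candidate's features; the formula is arbitrary.
    return sum(candidate) - 0.5 * max(candidate)

# 10,000 random candidate "feature vectors" of length 5.
candidates = [[random.random() for _ in range(5)] for _ in range(10_000)]

# Rank every candidate by predicted score; keep only the best ten.
top_ten = sorted(candidates, key=surrogate_score, reverse=True)[:10]

print(len(top_ten))  # 10 candidates passed on for real testing
```

The point is the cost asymmetry: the simulation evaluates 10,000 candidates almost for free, so the expensive experimental step shrinks from 10,000 trials to 10.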
None of these assumptions is guaranteed to hold. But if they did, then Bad Things would happen Fast, which seems like something worth worrying about a lot. I also have the feeling that it depends on what kind of AI is posited:
If it's just a better Einstein, then it's unlikely to manage to kill everyone off too quickly.
If it's a better Einstein that thinks 1,000 times faster (human brains don't actually work all that fast), then we're in trouble.
If it's properly superhumanly intelligent (i.e. >400 IQ? dunno), then who knows what it could come up with, and that's before considering how fast it thinks.