The more I hear Eliezer discussing this, the more convinced I am he is wrong.
The interviewers on the other hand look pretty sane to me, and the objections they make are very reasonable.
One of Eliezer’s (many) assumptions is that the AGI we create will be some sort of djinni of unlimited power, and that we will reach that point mostly without any intermediate steps.
I think calling this an assumption is misleading. He’s written extensively about why he thinks this is true. It’s a result/output of his model.
He takes things that are possibilities (e.g. intelligent beings way more powerful than us) and treats them as inevitabilities, without any nuance. E.g. you have atoms, so the machine will want your atoms. Or, nanomachines are possible, so the (almighty) machine will build factories in record time and use them to control the planet. Etc., etc. The list is too long. There is too much magical thinking in there, and I am saddened that this doomerism has gained so much traction in a community as great as LW.
All of these are model outputs. He’s written extensively about all this stuff. You can disagree with his arguments, but your comments so far imply that he has no arguments, which is untrue.
Can you point out how I’m implying this? Honestly, I do think that EY has a ton of arguments (and I am a big, big fan of his work). I just think his arguments (on this topic) are wrong.
I think you implied it by calling them assumptions in your first comment, and magical thinking in your second. Arguments you disagree with aren’t really either of those things.
Fair enough
There are, of necessity, a fair number of assumptions in the arguments he makes. Similarly, counter-arguments to his views also make a fair number of assumptions. Given that we are talking about something that has never happened and which could happen in a number of different ways, this is inevitable.
You’re aware that Less Wrong (and the project of applied rationality) literally began as EY’s effort to produce a cohort of humans capable of clearly recognizing the AGI problem?
I don’t think this is a productive way to engage here. Notwithstanding the fact that LW was started for this purpose—the ultimate point is to think clearly and correctly. If it’s true that AI will cause doom, we want to believe that AI will cause doom. If not, then not.
So I don’t think LW should be an “AI doomerist” community in the sense that people who honestly disagree with AI doom are somehow outside the scope of LW or not worth engaging with. EY is the founder, not a divinely inspired prophet. Of course, LW is and can continue to be an “AI doomerist” community in the more limited sense that most people here are persuaded by the arguments that P(doom) is relatively high—but in that sense the kind of argument you have made is really beside the point. It works equally well regardless of the value of P(doom) and thus should not be credited.
One interpretation of XFrequentist’s comment is simply pointing out that mukashi’s “doomerism has gained so much traction” implies a wrong history. A corrected statement would be more like “doomerism hasn’t lost traction”.
A “way of engaging” shouldn’t go so far as to disincentivize factual correction.
Fair enough. I interpreted XFrequentist as presenting this as an argument that AI Doomerism is correct and/or that people skeptical of Doomerism shouldn’t post those skeptical views. But I see now how your interpretation is also plausible.
Indeed, as Vladimir gleaned, I just wanted to clarify that the historical roots of LW & AGI risk are deeper than might be immediately apparent, which could offer a better explanation for the prevalence of Doomerism than, like, EY enchanting us with his eyes or whatever.
If someone stabs you with a knife, there is a possibility that there will be no damage to major blood vessels or organs, so you survive. But when you are at risk of being stabbed, you don’t think “I won’t treat dying from a stabbing as an inevitability”; you think “I should avoid being stabbed, because otherwise I could die.”
Yes. But you don’t worry about him killing everyone in Washington DC, taking control of the White House, and enslaving the human race. That’s my criticism: he goes too easily from “a very intelligent machine can be built” to “this machine will inevitably be magically powerful and kill everyone”. I’m perfectly aware of instrumental convergence and the orthogonality thesis, by the way, and I still consider this view just wrong.
You don’t need to be magically powerful to kill everyone! I think that, at the current level of biotech, a medium-sized lab with no ethical constraints and average computational resources could develop a humanity-wiping virus within 10 years, and the only thing that saves us is that bioweapons are out of fashion. If we enter a new Cold War with the mentality “If you refuse to make bioweapons for Our Country, then you are Their spy!”, we are pretty doomed even without any AI.
Sorry, I don’t think that’s possible! To be specific, the bit we are disagreeing about is the “everyone”. Yes, it is possible to cause A LOT of damage this way.
I can increase my timeline from 10 years to 20 to get “kill everyone, including the entire eukaryotic biosphere”, using some prokaryotic intracellular parasite with incredible metabolic efficiency and a biochemistry alternative enough that modern organisms can’t eat it.
I work on prokaryotic evolution. Happy to do a Zoom call where you explain to me how that would work. If you are interested, just send me a DM! Otherwise, just ignore this :)
There is reasoning hiding behind the points that seem magical to you. The AI will want our matter as resources. Avoiding processing Earth and everything in it for negentropy would require that it cares about us, and nobody knows how to train an AI that wants that.
This is just wrong. Avoiding processing Earth doesn’t require that the AI cares for us. Other possibilities include:
(1) Earth is not worth it; the AI determines that getting off Earth fast is better;
(2) the AI determines that it is unsure it can process Earth without unacceptable risk to itself;
(3) the AI determines that humans are actually useful to it one way or another;
(4) other possibilities that a super-intelligent AI can think of, but we can’t.
What does the negentropy on other planets have that Earth’s doesn’t, such that the AI would quickly get off Earth without processing it first?
It can send a probe away from Earth and also undergo the 0.001% risk of being destroyed while trying to take over Earth.
In just the right way that would make it most profitable for the AI to use humans instead of some other solution?
The AI wouldn’t be oriented towards trying to find reasons to keep the planet it started on habitable (or in one piece) for one particular species. It’s true that it’s possible the AI will discover some reasons for not killing us all that we can’t currently see, but that sounds like motivated reasoning to me (it’s also possible it will discover extra reasons to process Earth).
Other planets have more mass, higher insolation, lower gravity, lower temperature, and/or rings and more (mass in) moons. I can think of reasons why any of those might be more or less desirable than the characteristics of Earth. It is also possible that the AI may determine it is better off not being on a planet at all. In addition, in a non-foom scenario, for defensive or conflict-avoidance reasons the AI may wind up leaving Earth, and once it does so it may choose not to return.
That depends a lot on how it views the probe. In particular, by doing this, is it setting up a more dangerous competitor than humanity or not? Does it regard the probe as itself? Has it solved the alignment problem, and how good does it think its solution is?
No. Humans aren’t going to be the best solution. The question is whether they will be good enough that it would be a better use of resources to continue using the humans and focus on other issues.
It’s definitely possible that it will discover extra reasons to process Earth (or destroy the humans even if it doesn’t process Earth).
1. So, the interesting part is that it’s not enough that they’re a better source of raw material (even if they were) and better for optimizing (even if they were), because travelling to those planets also costs something.
So, we would need specific evidence that would cut one way but not another. If we can explain the AI choosing another planet over Earth as well as we can explain it choosing Earth over another planet, we have zero knowledge (see the sketch after point 3 below).
2. This is an interesting point. I thought at first that it could simply set things up to keep synchronizing the probe with itself, so that it would be a single redundantly run process rather than another agent. But that would involve always having to shut down periodically (so that the other half could be active for a while). Still, it’s plausible it would be confident enough in simply creating its copy and choosing not to modify the relevant parts of its utility function without some sort of handshake or metaprocedure. It definitely doesn’t sound like something it would have to completely solve alignment for first.
3. That would give us a brief window during which humans would be tricked into or forced to work for an unaligned AI, after which it would kill us all.
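(To spell out the “zero knowledge” point in item 1 in standard Bayesian terms — a small sketch of my own, with the hypothetical labels H1 = “the AI processes Earth” and H2 = “the AI leaves for another planet”: if some consideration E is equally well explained by both hypotheses, then

\[
\frac{P(H_1 \mid E)}{P(H_2 \mid E)} \;=\; \frac{P(E \mid H_1)}{P(E \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)} \;=\; 1 \cdot \frac{P(H_1)}{P(H_2)},
\]

so the posterior odds equal the prior odds and the consideration hasn’t actually moved us either way.)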
If we expect there will be lots of intermediate steps—does this really change the analysis much?
How will we know once we’ve reached the point where there aren’t many intermediate steps left before crossing a critical threshold? How do you expect everyone’s behaviour to change once we do get close?
If we expect there will be lots of intermediate steps—does this really change the analysis much?
I think so, yes. One fundamental way is that you might develop machines that are intelligent enough to produce new knowledge at a speed and quality above the current capacity of humans, without those machines necessarily being agentic. Those machines could potentially work on the alignment problem themselves.
I think I know what EY’s objection would be (I might be wrong): a machine capable of doing that is already an AGI and hence already deadly. Well, I think this argument would be wrong too. I can envision a machine capable of doing science without necessarily being agentic.
How will we know once we’ve reached the point where there aren’t many intermediate steps left before crossing a critical threshold?
I don’t know if it is useful to think in terms of thresholds. A threshold to what? To an AGI? To an AGI of unlimited power? Before making a very intelligent machine there will be less intelligent machines. The leap can be very quick, but I don’t expect that at any point there will be one single entity so powerful that it will dominate all other life forms in a very short time (a window of time shorter than it takes other companies/groups to develop similar entities). How do I know that? I don’t, but when I hear all the possible scenarios in which a machine pulls off an “end of the world” scenario, they are all based on the assumption (and I think it is fair to call it that) that the machine will have almost unlimited power, e.g. that it is able to simulate nanomachines and then devise a plan to successfully deploy those nanomachines everywhere simultaneously while staying hidden. It is this part of the argument that I have problems with: it assumes that these things are possible in the first place. And some things are not, even if you have 100,000 von Neumanns thinking for 1,000,000 years. A machine that can play Go at a godlike level still can’t win a game against AlphaZero with a 20-stone handicap.
How do you expect everyone’s behaviour to change once we do get close?
Close to developing an AGI? I think we are close now. I just don’t think it will mean the end of the world.
While you can envision something, that doesn’t mean the thing envisioned is logically coherent/possible/trivial to achieve. In one fantasy novel, the protagonists travel to a world where the physical laws make it impossible to light matches. It’s very easy to imagine striking a match again and again and failing, but “impossibility of lighting matches” implies such drastic changes in physical law that Earth’s life probably couldn’t sustain itself there, because match heads contain phosphorus and phosphorus is vital for bodily processes (and I’m not even getting into the universe-wide consequences of different physical constants).
So it’s very easy to imagine a terminal where you type “how to solve alignment?”, press “enter”, get a solution after an hour, and everybody lives happily ever after. But I can’t imagine how this thing could work without developing agency, unless at some point I say “here happens Magic that prevents this system from developing agency”.
AFAIK, Eliezer Yudkowsky is a proponent of Everett’s many-worlds interpretation of QM. As such, he should combine the small, non-zero probability that everything is going to go well with AGI with this MWI thing. So there will be some branches where everything goes well, even if the majority of them get sterilized. Who cares about those! Thanks to Everett, everything will look just fine to the survivors.
I see this as a contradiction in his belief system, not necessarily that he is wrong about AGI.
I think this is a bad way to think about probabilities under the Everett interpretation, for two reasons.
First, it’s a fully general argument against caring about the possibility of your own death. If this were a good way of thinking, then if you offered me $1 to play Russian roulette with bullets in 5 of the 6 chambers, I should take it—because the only branches where I continue to exist are ones where I didn’t get killed. That’s obviously stupid: it cannot possibly be unreasonable to care whether or not one dies. If it were a necessary consequence of the Everett interpretation, then I might say “OK, this means that one can’t coherently accept the Everett interpretation” or “hmm, seems like I have to completely rethink my preferences”, but in fact it is not a necessary consequence of the Everett interpretation.
Second, it ignores the possibility of branches where we survive but horribly. In that Russian roulette game, there are cases where I do get shot through the head but survive with terrible brain damage. In the unfriendly-AI scenarios, there are cases where the human race survives but unhappily. In either case the probability is small, but maybe not so small as a fraction of survival cases.
I think the only reasonable attitude to one’s future branches, if one accepts the Everett interpretation, is to care about all those branches, including those where one doesn’t survive, with weight corresponding to |psi|^2. That is, to treat “quantum probabilities” the same way as “ordinary probabilities”. (This attitude seems perfectly reasonable to me conditional on Everett.)
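(To make the contrast concrete — a minimal sketch of my own, not something from the comments above: write U_i for the utility of branch i and |psi_i|^2 for its weight, and suppose, purely for illustration, a disutility of -L for the branches where one dies. Treating branch weights like ordinary probabilities, the Russian-roulette bet comes out as

\[
\mathbb{E}[U] \;=\; \sum_i |\psi_i|^2\, U_i
\qquad\Longrightarrow\qquad
\mathbb{E}[U_{\text{roulette}}] \;=\; \tfrac{1}{6}\,(+\$1) \;+\; \tfrac{5}{6}\,(-L),
\]

which is hugely negative for any serious disvalue of death L, whereas conditioning only on the surviving branches returns +$1 — which is exactly the mistake the roulette example is pointing at.)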
The alignment problem still has to get solved somehow in those branches, which almost all merely have slightly different versions of us doing mostly the same sorts of things.
What might be different in these branches is that world-ending AGIs have anomalously bad luck in getting started. But the vast majority of anthropic weight, even after selecting for winning branches, will be on branches that are pretty ordinary, and where the alignment problem still had to get solved the hard way, by people who were basically just luckier versions of us.
So even if we decide to stake our hope on those possibilities, it’s pretty much the same as staking hope on luckier versions of ourselves who still did the hard work. It doesn’t really change anything for us here and now; we still need to do the same sorts of things. It all adds up to normality.
Another consideration I thought of:
If anthropic stuff actually works out like this, then this is great news for values over experiences, which will still be about as satisfiable as they were before, despite our impending doom. But values over world-states will not be at all consoled.
I suspect human values are a complicated mix of the two, with things like male-libido being far on the experience end (since each additional experience of sexual pleasure would correspond in the ancestral environment to a roughly linear increase in reproductive fitness), and things like maternal-love being far on the world-state end (since it needs to actually track the well-being of the children, even in cases where no further experiences are expected), and most things lying somewhere in the middle.