Why is Eliezer such a downer? We simply don’t know how things are going to turn out. I believe he’s right about how we should approach AI, and on any technical point I’d guess he is more right than anyone else. But that doesn’t justify, in my opinion, going around and instilling a defeatist attitude in anyone who wants to take him seriously. Seriously, people are looking up to you. That’s not how you treat them.
EDIT: I rewatched part of the podcast. Previously, I had only seen the final snippets, in particular the one where he talks about the conference involving Elon Musk. The problem I have doesn’t pertain to the rest of the podcast, which makes up most of the episode, so I would weaken the indignant tone above quite a bit.
Still, I have an issue with that snippet, and I believe it is no coincidence that it was the part isolated for emotional effect. Concretely, Eliezer strikes this hurt tone and starts talking about how we couldn’t even react in a dignified way, and so on. That strikes me as unnecessary, and as bordering on an affront to the viewer and to society at large (setting aside whether it is intended, or might even be justified).
His model does say how things are going to turn out: with everyone dying.
What’s the preferred alternative? Lying or withholding relevant arguments, out of general principle, without even an expected benefit to anyone?
I’m sorry if I’m misunderstanding, but is your claim that Yudkowsky’s model actually does tell us for certain, or to some extremely close approximation of ‘certain’, what’s going to happen?
(This is of course just my understanding of his model, but) yes. The analogy he uses is that while you cannot predict Stockfish’s next move in chess, you can predict for ‘certain’ that it will win the game. I think the components of the model are roughly:
it is ‘certain’ that, given the fierce competition and the number of players and the incentives involved, somebody will build an AGI before we’ve solved alignment.
it is ‘certain’ that if one builds an AGI without solving alignment first, one gets basically a random draw from mindspace.
it is ‘certain’ that a random draw from mindspace doesn’t care about humans
it is ‘certain’ that, like Stockfish, this random draw AGI will ‘win the game’, and that since it doesn’t care about humans, a won gameboard does not have any humans on it (because those humans were made of atoms which could be used for whatever it does care about)
Why is it a necessary condition that human atoms must be allocated elsewhere? There are plenty of other atoms to work with. We humans dominate the globe but we don’t disassemble literally everything (although we do a lot) to use the atoms for other purposes. Isn’t it arguable that ASI or even AGI will have a better appreciation for systems ecology than we do…
There are a bunch of baked-in assumptions here from EY. Remember, he came up with many of these ideas years ago, before deep learning existed.
(1) The AGIs have to be agentic, with a global score. This is not true in general, but very early AI agents often did work this way. Take one of the simplest possible RL agents, the q-learner. All it does is pick the action that has the maximum discounted reward. The q-learner learns its environment, filling out an array in memory called the q-table, and then just does whatever its source code says has the max reward. (Some of the first successful deep learning papers just replaced that array with a neural network.)
You could imagine building an “embodied” robot that, from the moment you switch it on, always tries to make that “reward” number go ever higher, in the same way.
This kind of AGI is likely lethally dangerous.
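The tabular q-learner described above can be sketched in a few lines. This is a minimal illustration, not anyone’s production agent; the five-state “corridor” environment and all constants are made up for the example:

```python
import random
from collections import defaultdict

# Toy corridor environment (made up for illustration): states 0..4,
# actions -1/+1, reward 1.0 for reaching state 4.
def corridor_step(state, action):
    nxt = min(max(state + action, 0), 4)
    reward = 1.0 if nxt == 4 else 0.0
    return nxt, reward, nxt == 4

def train_q_learner(env_step, states, actions, episodes=500,
                    alpha=0.1, gamma=0.9, epsilon=0.1):
    # The q-table is the "array in memory": (state, action) -> value.
    q = defaultdict(float)
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(100):
            # Mostly pick the action the table says has max discounted reward.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            s2, r, done = env_step(s, a)
            # Standard Q-learning update: move toward reward plus
            # discounted best next value.
            best_next = max(q[(s2, act)] for act in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
            if done:
                break
    return q

random.seed(0)
q = train_q_learner(corridor_step, list(range(5)), [-1, 1])
# The greedy policy now walks right, toward the reward.
```

The whole agent reduces to filling in that table and then maximizing it, which is exactly the “global score” shape of agent being discussed.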
(2) Intelligence scales very high. In a simple game, intelligence has diminishing returns that collapse to zero. (Once you have enough intelligence to solve the task, you have zero error gradient, and no reason to develop any more.)
In more complex games (including reality), intelligence goes further, but there is always a limit. For example, think about a task like “primate locates and picks apples”: ever more intelligence can make the primate more efficient at searching for the apple, or at taking a more efficient path toward reaching and grasping it. But the returns diminish logarithmically, and no amount of intelligence will let the primate find an apple if it’s paralyzed or unable to explore at least some of the forest. Nor can it instruct another primate to find the apple for it if the paralyzed one has never seen the forest at all.
Note also that in reality, an agent’s reward equals (resource gain minus resource cost). One term in ‘resource cost’ is the cost of compute. Hence, for example, you would not want to make a copper-mining robot too smart: adding more and more cognitive capacity yields less and less incremental gain in how much copper it collects, but costs more and more compute to realize. Similarly, there is no reason to train the agent in simulation past a certain point, for the same cost reason. Intelligence stops adding marginal net utility.
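The cost argument above can be made concrete with a toy model. The functional forms and constants here are made up purely for illustration: efficiency gains logarithmic in compute, compute cost linear, so net utility peaks and then falls.

```python
import math

# Toy model (constants made up for illustration): resource gain grows
# logarithmically with added "intelligence" (compute), while the cost
# of that compute grows linearly.
def net_utility(compute, gain_scale=10.0, cost_per_unit=1.0):
    resource_gain = gain_scale * math.log(1.0 + compute)
    resource_cost = cost_per_unit * compute
    return resource_gain - resource_cost

utilities = [net_utility(c) for c in range(1, 51)]
best_compute = utilities.index(max(utilities)) + 1
# Net utility rises, peaks (at compute = 9 here), then declines:
# past the peak, extra intelligence costs more than it earns.
```

With these (arbitrary) constants, the optimum sits at a modest amount of compute, and the copper-mining robot gets strictly worse off the smarter you make it beyond that point.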
EY posits that technologies we think will take methodical improvements and careful large-scale experiments to develop could be “leapfrogged” by skipping directly to advanced capabilities. For example, diamondoid nanotechnology arrived at not by carefully studying small assemblies of diamond and methodically working up the tool chain at large scale, using many billions of dollars of equipment, but by hacking it directly via hijacked biology.
And this from an agent that has no direct experimental data on biology; EY gives examples where the AGI has done everything in sim. Note that per Wikipedia, EY never attended high school or college. He may be an extreme edge-of-the-bell-curve genius, but there may be small flaws in his knowledge base that lead to these faulty assumptions. Which is exactly the problem an AGI with infinite compute, but no empirical data beyond what it regurgitated from humans, would have. It would model biology and the nanoscale using all human papers, but small errors would cause the simulation to diverge from reality, leading the AGI to make plans based on nonsense. (See how RL agents exploit flaws in physics simulators for an example of this.)
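The divergence problem can be illustrated with any chaotic system. The logistic map below is a stand-in chosen for the example (it is not a model EY uses): a simulator whose parameter is wrong by one part in roughly forty thousand tracks “reality” closely at first, then decorrelates completely.

```python
# Logistic map: a standard minimal chaotic system, used here as a
# made-up stand-in for "reality" versus a slightly-wrong simulation.
def trajectory(x0, r, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

reality = trajectory(0.2, 3.9, 50)        # the "true" parameter
simulated = trajectory(0.2, 3.9001, 50)   # model off by 0.0001

early_error = abs(reality[5] - simulated[5])
late_error = max(abs(a - b) for a, b in zip(reality[40:], simulated[40:]))
# Near-perfect agreement early on; useless predictions 40+ steps out.
```

Plans whose success depends on the late part of the simulated trajectory are plans based on noise, no matter how much compute produced them.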
There are two reasons why we don’t disassemble everything:
We don’t have the resources or technology to. For example there are tons of metals in the ground and up in space that we’d love to get our hands on but don’t yet have the tech or the time to do so, and there are viruses we’d love to destroy but we don’t know how. The AGI is presumably much more capable than us, and it hardly even needs to be more capable than us to destroy us (the tech and resources for that already exist), so this reason will not stop it.
We don’t want to. For example, there are forests we could turn into useful wood and farmland, and yet we protect them for reasons such as “beauty”, “caring for the environment”, etc. Thing is, these are all very human-specific reasons.
As for whether an ASI or AGI might have a better appreciation for systems ecology than we do: no. Sure, it is possible, in the sense that it doesn’t have literally zero chance if you draw a mind at random. (Similarly, a rocket launched in a random direction could potentially land on the moon, or at least crash into it.) But there are so many possible things an AGI could be optimizing for, and there is no reason that human-centric things like “systems ecology” should be likely, as opposed to “number of paperclips”, “number of alternating 1s and 0s in its memory banks”, or an enormous host of things we can’t even comprehend because we haven’t discovered the relevant physics yet.
(My personal hope for humanity lies in the first bullet point above being wrong: given surprising innovations in the past, it seems plausible that someone will solve alignment before it’s too late; and given some semi-successful global coordination efforts in the past (avoiding nuclear war, banning CFCs), it seems plausible that a few scary pre-critical AIs might galvanize the world into delaying action for long enough that alignment could be solved.)
If an AI appreciates ecology more than we do, then among its goals is preventing human harm to ecosystems, and so among its early actions will be killing most or all humans. You didn’t think of this, because it’s such an inhuman course of action.
Almost every goal that is easy to specify leads to human disempowerment or extinction, if a superhuman entity tries hard enough to accomplish it. This regrettable fact takes a while to convince yourself of, because it is so strange and terrible. In my case, it was roughly 1997-2003. Hopefully humanity learns a bit faster.
Evolution favours organisms that grow as fast as possible. AGIs that expand aggressively are the ones that will become ubiquitous.
Computronium needs power and cooling. The only dense, reliable, and highly scalable form of power available on Earth is nuclear; why would an ASI care about ensuring no release of radioactivity into the environment?
Similarly with mineral extraction: at the huge scales needed for Vinge’s “aggressively hegemonizing” AI, it will inevitably be using low-grade ores, which makes it extremely energy-intensive and highly polluting. Why would an ASI care about the pollution?
If and when ASI power consumption rises to petawatt levels, the extra heat is going to have a major impact on climate: ice caps gone, etc. The oceans are probably the most attractive locations for power-hungry ASI, given their vast cooling potential.
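For a rough sense of scale (these are round public figures, order-of-magnitude only), one petawatt of waste heat can be compared against civilization’s total power use and against the sunlight Earth absorbs:

```python
# Round public figures, order-of-magnitude only.
solar_absorbed_w = 1.2e17   # ~120 PW of sunlight absorbed by Earth
human_power_w = 1.9e13      # ~19 TW, rough current world primary power use
asi_power_w = 1.0e15        # 1 PW, the hypothetical ASI draw

vs_humanity = asi_power_w / human_power_w     # ~50x all of civilization today
vs_sunlight = asi_power_w / solar_absorbed_w  # ~1% of absorbed sunlight
```

So a petawatt-scale ASI would dissipate tens of times humanity’s entire current energy budget, and about a percent of the planet’s whole solar input, as direct heat.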
Imagine fiancéespace (or fiancéspace): the space of romantic partners who would marry you (assuming you’re not married and want to be). You can imagine “drawing” from that space, but once you draw, nearly all of the work is still ahead of you. Someone who was initially “friendly” wouldn’t necessarily stay that way, and someone who was unfriendly wouldn’t necessarily stay that way either. It’s like asking, “how do you make sure a human mind stays friendly to you forever?” We can’t solve that with our lowly ape minds, and I’m not sure that we’d want to. The closest thing to a solution I know of in humans is Williams syndrome, and we probably wouldn’t want an AGI with an analogous handicap. The relationship cultivated over time with other minds matters more, in many respects, than the initial conditions of those minds.
Maybe dogs are the better metaphor. We want AGIs to be like very smart Labradors. Random, “feral” AGIs may be more like wolves. So what if we made them so they could be “selectively bred”, using something like a genetic algorithm? Select for more Lab-y and less wolf-y traits.
If a Labrador were 10 or 100 times smarter than its owner, would it still be mostly nice most of the time? I would hope so. Maybe the first AGI works like Garm->Fenrir in God of War (spoiler, sorry).
Just thinking out loud a bit...
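The “selective breeding” idea above can be sketched as a toy genetic algorithm. Everything here, including the scalar “docility” trait standing in for Lab-y-ness, is a made-up illustration, not a proposal for how one would actually score an AGI:

```python
import random

# Each candidate "AGI" is a genome of trait scores in [0, 1]; fitness is
# mean "docility" (how Lab-y rather than wolf-y it is). The trait and
# all constants are invented purely for illustration.
def fitness(genome):
    return sum(genome) / len(genome)

def evolve(pop_size=40, genome_len=8, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]
    initial_best = max(fitness(g) for g in pop)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # keep the most Lab-y half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)    # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(genome_len)         # small random mutation
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.05)))
            children.append(child)
        pop = parents + children
    return initial_best, max(fitness(g) for g in pop)

initial_best, final_best = evolve()
# Selection steadily shifts the population toward the docile end.
```

The hard part the sketch hides, of course, is the fitness function: measuring “wolf-y-ness” in a mind smarter than the evaluator is the alignment problem again.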
You can’t selectively breed labradors if the first wolf kills you and everyone else.
Of course you can, you just have to make the first set of wolves very small.
Well, he might be right, and I align with his views more than with many others’, but you still have to realize that you can’t literally predict the future.
I think there’s a difference between not wanting to elicit false hope and taking out your negative emotions on others (even if it’s about a reasonable expectation of the world). Of course, he has a right to experience these emotions, but I believe it would be more considerate to do that in private.
I should say that I have great respect for him and his efforts and insights. This is not a critique of the person, just of a concrete behavior.
Maybe making people realize the reality of the situation and telling the truth is more important than sparing their feelings.
There is a misalignment hazard to this framing: the person who decides to withhold the truth is not the audience who’d care to have their feelings spared. So the question of whether it’s “more important” might be ill-posed.
Thanks for bringing this up. Yes I think that is very important and is not what I’m trying to criticize. I will update the previous comment to clarify.
That doesn’t matter for the points I was responding to, which concern a matter of policy: what to claim, given what your own understanding of the world happens to be.
You can’t know the future with certainty, but you can predict it. The sun will rise tomorrow. I’m much less certain that it will rise in 20 years, and not because there is nobody to observe it.
There are claims about the facts of the world being made, apart from any emotions. Presence of emotional correlates doesn’t make corresponding events in the concrete physical world irrelevant.
I think there’s a kind of division of labor going on, and I’m going to use a software-industry metaphor. If you’re redteaming, auditing, or QAing at a large org, you should really be maxing out on paranoia, being glass-half-empty, etc., because you believe that elsewhere in the institution it is other people’s jobs to consider your advice and weigh it against the risk tolerance implied by the budget, or by regulation, or whatever. Whereas redteaming, auditing, or QAing at a small org, you take on some of the responsibility of weighing threat models against the given cost constraints. It’s not obvious that someone else in the org will rationally integrate the information you provide into the organization’s strategy and implementation; you want them to follow your recommendations in a way that makes business sense.
My guess is that being a downer comes from this large-org register, where the redteam’s job description is literally just redteaming, whereas it might make sense for other researchers or communicators to take a more small-org approach, where the redteam is probably multitasking in some way.
Intuition pump: I don’t have a citation, but I once saw a remark that the commercial airline crash rate in the late Soviet Union was plausibly more rational than the commercial airline crash rate in the US. Airplane risk intolerance in the US is great for QA jobs, but that doesn’t mean it reflects an optimal tradeoff between price and safety with respect to stakeholder preferences (if you could elicit them somehow). Economists make related remarks regarding nuclear energy.