I’m sorry if I’m misunderstanding, but is your claim that Yudkowsky’s model actually does tell us for certain, or some extremely close approximation of ‘certain’, about what’s going to happen?
(This is of course just my understanding of his model, but) yes. The analogy he uses is that while you cannot predict Stockfish’s next move in chess, you can predict for ‘certain’ that it will win the game. I think the components of the model are roughly:
- it is ‘certain’ that, given the fierce competition, the number of players, and the incentives involved, somebody will build an AGI before we’ve solved alignment.
- it is ‘certain’ that if one builds an AGI without solving alignment first, one gets basically a random draw from mindspace.
- it is ‘certain’ that a random draw from mindspace doesn’t care about humans.
- it is ‘certain’ that, like Stockfish, this random-draw AGI will ‘win the game’, and that since it doesn’t care about humans, a won gameboard does not have any humans on it (because those humans were made of atoms which could be used for whatever it does care about).
Why is it a necessary condition that human atoms must be allocated elsewhere? There are plenty of other atoms to work with. We humans dominate the globe but we don’t disassemble literally everything (although we do a lot) to use the atoms for other purposes. Isn’t it arguable that ASI or even AGI will have a better appreciation for systems ecology than we do…
There are a bunch of baked assumptions here from EY. Remember he came up with many of these ideas years ago, before deep learning existed.
(1) The AGI has to be agentic with a global score. This is not true in general, but very early AI agents often did work this way. Take one of the simplest possible RL agents, the Q-learner. All it does is pick the action with the maximum discounted reward. The Q-learner learns its environment by filling out an array in memory called the Q-table, and then just does whatever its source code says has the max reward. (Some of the first successful deep learning papers just replaced that array with a neural network.)
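A minimal sketch of the tabular Q-learner just described, on a toy 5-state corridor. The environment, reward, and hyperparameters here are all illustrative assumptions, not anything from EY or a specific paper:

```python
import random

# Toy corridor environment: states 0..4, actions 0 (left) / 1 (right).
# Reaching state 4 yields reward 1; every other step yields 0.
N_STATES, GOAL = 5, 4

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

# The Q-table: one row per state, one column per action.
q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

random.seed(0)
for _ in range(500):                      # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly take the max-Q action, sometimes explore
        a = random.randrange(2) if random.random() < epsilon else q[s].index(max(q[s]))
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2

# Greedy action per non-terminal state; after training, "right" everywhere.
print([row.index(max(row)) for row in q[:GOAL]])
```

The agent’s entire “mind” is the Q-table plus an argmax over it: there is nothing in the loop but “make the score go up”.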
You could imagine building an “embodied” robot that, from the moment you switch it on, always tries to make that “reward” number go ever higher, the same way.
This kind of AGI is likely lethally dangerous.
(2) Intelligence scales very high. In a simple game, intelligence has diminishing returns that collapse to zero. (once you have enough intelligence to solve a task, you have 0 error gradient or reason to develop any more)
In more complex games (including reality), intelligence goes further, but there is always a limit. For example, consider a task like “primate locates and picks apples”: ever more intelligence can make the primate more efficient at searching for the apple, or let it take a more efficient path toward reaching and grasping the apple. But the returns diminish logarithmically, and no amount of intelligence will let the primate find an apple if it is paralyzed or unable to explore at least some of the forest. Nor can it instruct another primate to find the apple for it if the paralyzed one has never seen the forest at all.
Note also that in reality, an agent’s net reward equals resource gain minus resource cost. One term in ‘resource cost’ is the cost of compute. Hence, for example, you would not want to make a copper-mining robot too smart: adding more and more cognitive capacity yields less and less incremental gain in how much copper it collects, but costs more and more compute to realize. Similarly, there is no reason to train the agent in simulation past a certain point, for the same cost reason. Intelligence stops adding marginal net utility.
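This cost argument can be made concrete with a toy model. The functional forms (logarithmic gain, linear compute cost) and the constants are assumptions for illustration, not measurements:

```python
import math

def net_utility(capacity, gain_scale=100.0, cost_per_unit=1.0):
    """Toy model: resource gain grows logarithmically with cognitive
    capacity, while compute cost grows linearly in it. Both functional
    forms are illustrative assumptions."""
    gain = gain_scale * math.log(1.0 + capacity)
    cost = cost_per_unit * capacity
    return gain - cost

# Net utility rises, peaks, then falls as capacity keeps growing.
for c in [1, 10, 99, 1000, 10000]:
    print(c, round(net_utility(c), 1))
```

With these parameters the analytic optimum is at capacity 99; beyond it, net utility falls and eventually goes negative. Past that point, a smarter copper-mining robot is a worse copper-mining robot.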
EY posits that technologies we think will take methodical improvements and careful large-scale experiments to develop could be “leapfrogged” by skipping directly to advanced capabilities. For example, diamondoid nanotechnology would come not from carefully studying small assemblies of diamond and methodically working up the tool chain with many billions of dollars of equipment, but from hacking it directly by hijacking biology.
And hacked together by an agent that has no direct experimental data on biology: EY gives examples where the AGI has done everything in sim. Note that per Wikipedia, EY never attended high school or college. He may be an extreme edge-of-the-bell-curve genius, but there may be small flaws in his knowledge base that lead to these faulty assumptions. Which is exactly the problem an AGI with infinite compute, but no empirical data beyond what humans have already written down, would have. It would model biology and the nanoscale using all human papers, but small errors would cause the simulation to diverge from reality, causing the AGI to make plans based on nonsense. (See how RL agents exploit flaws in physics simulators for an example of this.)
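This sim-divergence failure mode can be illustrated with the logistic map, a textbook chaotic system standing in here for any imperfectly-modeled physics (the 0.1% parameter error is an arbitrary stand-in for a small gap in the human literature):

```python
def logistic_trajectory(r, x0=0.2, steps=50):
    """Iterate the logistic map x -> r * x * (1 - x), a standard
    example of a chaotic dynamical system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

reality = logistic_trajectory(r=3.9)          # the "true" dynamics
model = logistic_trajectory(r=3.9 * 1.001)    # simulator with a 0.1% parameter error

divergence = [abs(a - b) for a, b in zip(reality, model)]
print(f"error at step 5: {divergence[5]:.4f}")   # still small
print(f"worst error:     {max(divergence):.4f}")
```

The simulation tracks reality for a handful of steps and then decorrelates completely, which is the same mechanism behind RL agents exploiting physics-sim bugs: plans optimized against the model stop corresponding to anything in the territory.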
We humans dominate the globe but we don’t disassemble literally everything (although we do a lot) to use the atoms for other purposes.
There are two reasons why we don’t:
(1) We don’t have the resources or technology to. For example, there are tons of metals in the ground and up in space that we’d love to get our hands on but don’t yet have the tech or the time to reach, and there are viruses we’d love to destroy but don’t know how to. The AGI is presumably much more capable than us, and it hardly even needs to be more capable than us to destroy us (the tech and resources for that already exist), so this reason will not stop it.
(2) We don’t want to. For example, there are some forests we could turn into useful wood and farmland, and yet we protect them for reasons such as “beauty”, “caring for the environment”, etc. Thing is, these are all very human-specific reasons, and:
Isn’t it arguable that ASI or even AGI will have a better appreciation for systems ecology than we do…
No. Sure it is possible, as in it doesn’t have literally zero chance if you draw a mind at random. (Similarly a rocket launched in a random direction could potentially land on the moon, or at least crash into it.) But there are so many possible things an AGI could be optimizing for, and there is no reason that human-centric things like “systems ecology” should be likely, as opposed to “number of paperclips”, “number of alternating 1s and 0s in its memory banks”, or an enormous host of things we can’t even comprehend because we haven’t discovered the relevant physics yet.
(My personal hope for humanity lies in the first bullet point above being wrong: given surprising innovations in the past, it seems plausible that someone will solve alignment before it’s too late; and given some semi-successful global coordination in the past (avoiding nuclear war, banning CFCs), it seems plausible that a few scary pre-critical AIs might galvanize the world into delaying action for long enough that alignment could be solved.)
If an AI appreciates ecology more than we do, then among its goals is preventing human harm to ecosystems, and so among its early actions will be to kill most or all humans. You didn’t think of this, because it’s such an inhuman course of action.
Almost every goal that is easy to specify leads to human disempowerment or extinction, if a superhuman entity tries hard enough to accomplish it. This regrettable fact takes a while to convince yourself of, because it is so strange and terrible. In my case, it took roughly from 1997 to 2003. Hopefully humanity learns a bit faster.
Evolution favours organisms that grow as fast as possible. AGIs that expand aggressively are the ones that will become ubiquitous.
Computronium needs power and cooling. The only dense, reliable, and highly scalable form of power available on Earth is nuclear; why would an ASI care about ensuring no release of radioactivity into the environment?
Similarly for mineral extraction: at the huge scales needed by Vinge’s “aggressively hegemonizing” AI, it will inevitably be working low-grade ores, which is extremely energy-intensive and highly polluting. Why would an ASI care about the pollution?
If/when ASI power consumption rises to petawatt levels, the extra heat is going to start having a major impact on climate: icecaps gone, etc. The oceans are probably the most attractive locations for high-power-density ASI, given their vast cooling potential.
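The petawatt claim checks out on a back-of-the-envelope basis. The 1 PW figure is the hypothetical from above; the astronomical constants are standard approximate values:

```python
import math

SOLAR_CONSTANT = 1361.0   # W/m^2 at top of atmosphere (standard value)
R_EARTH = 6.371e6         # Earth radius in meters
ALBEDO = 0.30             # approximate fraction of sunlight reflected

intercepted = SOLAR_CONSTANT * math.pi * R_EARTH**2   # sunlight hitting Earth's disk
absorbed = intercepted * (1 - ALBEDO)                  # ~1.2e17 W actually absorbed

waste_heat = 1e15                       # 1 PW of hypothetical ASI power use
surface = 4 * math.pi * R_EARTH**2      # Earth's surface area

print(f"absorbed solar: {absorbed:.2e} W")
print(f"1 PW as fraction of absorbed solar: {waste_heat / absorbed:.1%}")
print(f"1 PW as surface forcing: {waste_heat / surface:.2f} W/m^2")
```

Roughly 2 W/m² of surface forcing is comparable in magnitude to present-day anthropogenic greenhouse forcing, so “major impact on climate” is, if anything, understated.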
Imagine fiancéespace (or fiancéspace): the space of romantic partners that would marry you (assuming you’re not married and you want to be). You can imagine “drawing” from that space, but once you draw, nearly all of the work is still ahead of you. Someone who was initially “friendly” wouldn’t necessarily stay that way, and someone who was unfriendly wouldn’t necessarily stay that way. It’s like asking “how do you make sure a human mind stays friendly to you forever?” We can’t solve that with our lowly ape minds, and I’m not sure that we’d want to. The closest thing to a solution that I know of in humans is Williams syndrome, and we probably wouldn’t want an AGI with an analogous handicap. The relationship cultivated over time with other minds matters more in many respects than the initial conditions of those minds.
Maybe dogs are the better metaphor. We want AGIs to be like very smart Labradors. Random, “feral” AGIs may be more like wolves. So what if we made them so they could be “selectively bred”, using something like a genetic algorithm? Select for more Lab-y and less Wolf-y traits.
If a Labrador were like 10 or 100 times smarter than its owner, would it still be mostly nice most of the time? I would hope so. Maybe the first AGI works like Garm->Fenrir in God of War (spoiler, sorry).
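The “selective breeding” idea can be sketched as a bare-bones genetic algorithm. Everything here is an illustrative assumption: real AGI traits are not a bit vector, and a scoreable “friendliness” fitness function is exactly the thing alignment does not yet have.

```python
import random

GENOME_LEN, POP, GENERATIONS = 20, 30, 40
random.seed(1)

def friendliness(genome):
    """Stand-in fitness: count of 'Lab-y' (1) genes. In reality no such
    scalar metric is known, which is the hard part of the problem."""
    return sum(genome)

def mutate(genome, rate=0.05):
    # Flip each gene independently with a small probability.
    return [1 - g if random.random() < rate else g for g in genome]

# Start from random "feral" genomes.
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP)]

for _ in range(GENERATIONS):
    # Keep the friendliest half, refill with mutated copies of them.
    population.sort(key=friendliness, reverse=True)
    parents = population[: POP // 2]
    population = parents + [mutate(random.choice(parents))
                            for _ in range(POP - len(parents))]

best = max(population, key=friendliness)
print(friendliness(best))  # climbs toward GENOME_LEN as selection proceeds
```

Selection with elitism drives the population toward all-“Lab-y” genomes; the open question the thread raises is whether you survive evaluating the early, wolf-like candidates.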
Just thinking out loud a bit...
You can’t selectively breed labradors if the first wolf kills you and everyone else.
Of course you can, you just have to make the first set of wolves very small.