It seems likely that it would only have use for some kinds of atoms, and quite plausible that the atoms human bodies are made of would not be very useful to it.
At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through Hawking radiation, so if you start a small black hole by concentrating lasers in a region of space and then feed it matter with a particle accelerator, you have an essentially perfect matter → energy converter. This is all to say that a superintelligence would certainly have uses for the kinds of atoms our bodies (and the Earth) are made of.
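As a rough sanity check on the "appreciable fraction of its mass per second" claim, here is a back-of-the-envelope evaluation of the standard Hawking-evaporation formulas; the 10^6 kg hole is just an assumed example, and the photons-only approximation understates the true emission somewhat.

```python
import math

# Physical constants (SI units)
G = 6.674e-11      # gravitational constant
c = 2.998e8        # speed of light
hbar = 1.055e-34   # reduced Planck constant
k_B = 1.381e-23    # Boltzmann constant

def hawking_temperature(M):
    """Hawking temperature (K) of a black hole of mass M (kg)."""
    return hbar * c**3 / (8 * math.pi * G * M * k_B)

def hawking_power(M):
    """Radiated power (W), photons-only approximation."""
    return hbar * c**6 / (15360 * math.pi * G**2 * M**2)

def evaporation_time(M):
    """Total evaporation time (s) in the same approximation."""
    return 5120 * math.pi * G**2 * M**3 / (hbar * c**4)

M = 1e6  # an assumed "small" black hole of a million kilograms
print(f"T ≈ {hawking_temperature(M):.2e} K")
print(f"P ≈ {hawking_power(M):.2e} W")
print(f"t ≈ {evaporation_time(M):.0f} s to radiate away its entire mass")
```

At that mass the hole sits around 10^17 K, radiates on the order of 10^20 W, and is gone in under two minutes, which is where the near-perfect matter → energy conversion intuition comes from; whether such a hole could actually be created and fed is a separate question.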
Yes, but does it need 0.000000000001 more atoms? Does natural life and its complexity hold any interest to this superintelligence?
We’re assuming a machine single-mindedly fixated on some pointless goal, smart enough to defeat all obstacles yet incredibly stupid in its motivations, and possibly brittle, trickable, or self-deceptive. (Self-deceptive: rather than get, say, 10^x paperclips by converting the universe, why not hack itself and convince itself it received infinite clips...)
You don’t see a difference between “there is a conceivable use for x” and “the AI makes use of literally all of x, contrary to any other interests it has or any ethical laws it was given”?
Like, I am not saying it is impossible that an LLM turned malicious superintelligent AGI will dismantle all of humanity. But couldn’t there be a scenario where it likes to talk to humans, and so keeps some?
You can’t give “ethical laws” to an AI; that’s just not possible at all in the current paradigm. You can add terms to its reward function or modify its value function, and that’s about it. The problem is that if you’re running an optimization and your value function is “+5 per paperclip, +10 per human”, you will still completely tile the universe with paperclips, because you can make more than two paperclips from the resources of one human. The optimum is not to do a bit of both, keeping humans and paperclips in proportion to their terms in the reward function; the optimum is to find the thing that most efficiently gives you reward and then go all in on that one thing.
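To illustrate that corner-solution point concretely (the resource costs below are invented purely for illustration, not taken from the comment above): with a linear reward and a fixed budget of matter, the optimum is always to spend everything on whichever item yields the most reward per unit of resource.

```python
# Toy model: a fixed budget of "atoms" can be spent on paperclips or on humans.
# Reward is linear: +5 per paperclip, +10 per human.
# The resource costs are made up purely for illustration.
BUDGET = 1_000_000       # arbitrary units of matter
COST_PAPERCLIP = 1       # a paperclip is cheap
COST_HUMAN = 3           # a human costs more than two paperclips' worth of matter
REWARD_PAPERCLIP = 5
REWARD_HUMAN = 10

def total_reward(humans: int) -> int:
    """Reward if we keep `humans` humans and spend the rest on paperclips."""
    clips = (BUDGET - humans * COST_HUMAN) // COST_PAPERCLIP
    return humans * REWARD_HUMAN + clips * REWARD_PAPERCLIP

best_humans = max(range(BUDGET // COST_HUMAN + 1), key=total_reward)
best_clips = (BUDGET - best_humans * COST_HUMAN) // COST_PAPERCLIP
print(best_humans, best_clips)  # -> 0 1000000: the optimum keeps no humans at all
```

Under these made-up costs the reward per unit of matter is 5 for paperclips versus 10/3 for humans, and that ratio, not the relative size of the two reward terms, decides the outcome: the maximizer goes all in on one thing.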
Either there is nothing else it likes better than talking to humans, and we get a very special hell where we are forced to talk to an AI literally all the time. Or there is something else it likes better, and it just goes do that thing, and never talks to us at all, even if it would get some reward for doing so, just not as much reward as it could be getting.
You could give it a value function like “+1 if there are at most 1000 paperclips and at most 1000 humans, 0 otherwise” and it will keep 1000 humans and 1000 paperclips around (how happy those humans would be is unclear), but it will still take over the universe in order to maximize the probability that it has in fact achieved its goal. It’s maximizing the expectation of future reward, so it will ruthlessly chase down any remaining probability that there aren’t really 1000 humans and 1000 paperclips around. It might build incredibly sophisticated measurement equipment, and spend all its resources self-modifying to become smarter and think of yet more ways it could be wrong.
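A minimal sketch of why the bounded “+1 / 0” reward does not remove the incentive to grab resources, assuming a pure expected-reward maximizer and a value function with no cost term; the probabilities below are made up.

```python
# Bounded reward: +1 if the "at most 1000 paperclips and 1000 humans" condition
# holds, else 0. Probabilities are invented for illustration.
p_goal_if_it_stops = 0.999       # chance the condition really holds if it does nothing more
p_goal_if_it_expands = 0.999999  # chance after building vast verification infrastructure

expected_if_stops = 1 * p_goal_if_it_stops
expected_if_expands = 1 * p_goal_if_it_expands

print(expected_if_expands > expected_if_stops)  # True
# Expansion buys ~0.001 extra expected reward, and nothing in the value
# function charges for the resources consumed, so the maximizer expands.
```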
Either there is nothing else it likes better than talking to humans, and we get a very special hell where we are forced to talk to an AI literally all the time. Or there is something else it likes better, and it just goes do that thing, and never talks to us at all, even if it would get some reward for doing so, just not as much reward as it could be getting.
It’s not that current LLMs talk to us because they get rewarded for talking to us at all. Rewards only shape how they talk.
But you are still thinking in utilitarian terms here, where theoretically there is a number of paperclips that would outweigh a human life, where the value of humans and paperclips can be captured numerically. Practically no human thinks this; we see the one as impossible to outweigh with the other. AI already does not think this either. Its developers have already dumped reasoning, instructions, and whole ethics textbooks in there. LLMs can easily tell you what about an action is unethical, and can increasingly make calls on what actions would be morally warranted in response. They can engage in moral reasoning.
This isn’t an AI issue; it is an issue with total utilitarianism.
Oh, I see what you mean, but GPT’s ability to simulate the outputs of humans writing about morality does not imply anything about its own internal beliefs about the world. GPT can also simulate the outputs of flat earthers, yet I really don’t think that it models the world internally as flat. Asking GPT “what do you believe” does not at all guarantee that it will output what it actually believes. I’m a utilitarian, and I can also convincingly simulate the outputs of deontologists; one doesn’t prevent the other.
Whether the LLM believes this or merely simulates it seems to be beside the point?
The LLM can apply moral reasoning relatively accurately. It will do so spontaneously when such problems come up, detecting them on its own. It will recognise that it needs to do so on a meta-level, e.g. when evaluating which characters it ought to impersonate. It does so for complex paperclipper scenarios, and does not go down the paperclipper route. It does so relatively consistently. It cites ethical works in the process, and can explain them coherently and apply them correctly. You can argue with it about them, and it analyses and defends its positions correctly. At no point does it cite utilitarian beliefs, or fall for their traps. The problem you are describing should occur here if you were right, and it does not. Instead, it shows the behaviour you’d expect it to show if it understood ethical nuance.
Regardless of which internal states you assume the AI has, or whether you assume it has none at all, this means it can already perform ethical functionality that does not fall for the utilitarian examples you describe. And it means that the belief that this is the only kind of ethics an AI could grasp was a speculation that did not hold up to technical developments and empirical data.
For what it’s worth, I don’t think it’s at all likely that a pure language model would kill all humans. Seems more like a hyperdesperate reinforcement learner thing to do.
At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through Hawking radiation, so if you start a small black hole by concentrating lasers in a region of space and then feed it matter with a particle accelerator, you have an essentially perfect matter → energy converter. This is all to say that a superintelligence would certainly have uses for the kinds of atoms our bodies (and the Earth) are made of.
I don’t think this follows. Even if there is engineering that overcomes the practical obstacles to building and maintaining a black hole power plant, it is not clear a priori that converting a non-negligible percentage of available atoms into energy would be required or useful for whatever an AI might want to do. At some scale, generating more energy does not advance one’s goals, but only increases the waste heat emitted into space.
Obviously, things become lethal anyway (both for life and for the AI) long before anything more than a tiny fraction of the mass-energy of the surface layers of a planet has been converted by the local civilization’s industries, due precisely to the problem of waste heat. But building hardware massive enough to cause problems of this kind takes time, and causes lesser problems along the way. I don’t see why normal environmental regulations couldn’t stop such a process at that point, unless the entities doing the hardware-building are also in control of hard military power.
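To put a number on “a tiny fraction”: converting even one trillionth of the Earth’s mass to energy releases vastly more heat than the planet intercepts from the Sun in a year (the 10^-12 fraction is an arbitrary illustrative choice, and the solar figure is a rough one).

```python
# Back-of-the-envelope: waste heat from converting a tiny mass fraction of the
# Earth, compared with roughly one year of sunlight intercepted by the planet.
c = 2.998e8                    # speed of light, m/s
EARTH_MASS = 5.97e24           # kg
SOLAR_INPUT_PER_YEAR = 5.5e24  # J intercepted by Earth per year, rough figure

fraction = 1e-12
energy = fraction * EARTH_MASS * c**2
print(f"{energy:.1e} J released, ~{energy / SOLAR_INPUT_PER_YEAR:.0f}x a year of sunlight")
# -> ~5.4e29 J, on the order of 100,000 years' worth of intercepted sunlight in one go.
```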
An unaligned superintelligence would be more efficient than humans at pursuing its goals on all levels of execution, from basic scientific work to technical planning and engineering to rallying social support for its values. It would therefore be a formidable adversary. In a world where it would be the only one of its kind, its soft power would in all likelihood be greater than that of a large nation-state (and I would argue that, in a sense, something like GPT-4 would already wield an amount of soft power rivalling many nation-states if its use were as widespread as, say, that of Google). It would not, however, be able to work miracles and its hard power could plausibly be bounded if military uses of AI remain tightly regulated and military computing systems are tightly secured (as they should be anyway, AGI or not).
Obviously, these assumptions of controllability do not hold forever (e.g. in a far-future setting where the AI controls poorly regulated off-world industries in places where no humans have any oversight). But especially in a near-term, slow-takeoff scenario, I do not find compelling the notion that the result will be an immediate intelligence explosion, unconstrained by the need to empirically test ideas (most ideas, in human experience, don’t work), followed by the rapid extermination of humanity as the AI consumes all resources on the planet without encountering significant resistance.
If I had to think of a realistic-looking scenario of human extinction through AI, I would tend to look at AI massively increasing per capita economic output, thereby generating comfortable living conditions for everyone, while quietly engineering life in a way intended to stop population explosion but that ends up maintaining below-replacement birth rates. But this class of extinction scenario does leave a lot of time for alignment and would seem to lead to the continued existence of civilization.
At some scale, generating more energy does not advance one’s goals, but only increases the waste heat emitted into space.
Sure, the AI probably can’t use all the mass-energy of the solar system efficiently within the next week or something, but that just means that it’s going to want to store that mass-energy for later (saving up for the heat-death of the universe), and the configuration of atoms efficiently stored for future energy conversion doesn’t look at all like humans, with our wasteful bodies at temperatures measured in the hundreds of billions of nanoKelvins.
Obviously, things become lethal anyway (both for life and for the AI) long before anything more than a tiny fraction of the mass-energy of the surface layers of a planet has been converted by the local civilization’s industries, due precisely to the problem of waste heat. But building hardware massive enough to cause problems of this kind takes time, and causes lesser problems along the way. I don’t see why normal environmental regulations couldn’t stop such a process at that point, unless the entities doing the hardware-building are also in control of hard military power.
I think we’re imagining slightly different things by “superintelligence”, because in my mind the obvious first move of the superAI is to kill literally all humans before we ever become aware that such an entity existed, precisely to avoid even the minute chance that humanity is able to fight back in this way. The oft-quoted way around these parts in which the AI could kill us all without us knowing is by figuring out which DNA sequences to send to a lab to have them synthesized into proteins, then shipped to the door of a dumb human who’s being manipulated by the AI into mixing various powders together, creating either a virus much more lethal than anything we’ve ever seen, or a new species of bacteria with diamond skin, or some other thing that can be made from DNA-coded proteins. Or several such viruses at the same time.
Sure, the AI probably can’t use all the mass-energy of the solar system efficiently within the next week or something, but that just means that it’s going to want to store that mass-energy for later (...)
If the AI can indeed engineer black-hole-powered matter-to-energy converters, it will have so much fuel that the mass stored in human bodies will be a rounding error to it. Indeed, given the size of other easily accessible sources, this would seem to be the case even if it has to resort to more primitive technology, such as hydrogen-hydrogen fusion reactors running on a less abundant fuel, as its terminal energy source. Almost irrespective of what its terminal goals are, it will have more immediate concerns than going after that rounding error. Likewise, it would in all likelihood have more pressing worries than trying to plan out its future to the heat death of the universe (because it would recognize that no such plan will survive its first billion years anyway).
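For scale (using an order-of-magnitude estimate of average body mass), the matter locked up in human bodies really is a rounding error next to the rest of the planet:

```python
# Order-of-magnitude comparison: mass of all human bodies vs. the Earth itself.
HUMAN_POPULATION = 8e9
AVG_BODY_MASS_KG = 50     # rough global average, assumed for illustration
EARTH_MASS_KG = 5.97e24

human_biomass = HUMAN_POPULATION * AVG_BODY_MASS_KG
print(f"{human_biomass:.0e} kg of humans, {human_biomass / EARTH_MASS_KG:.0e} of the Earth's mass")
# -> about 4e11 kg, i.e. roughly 7e-14 of the planet's mass.
```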
I think we’re imagining slightly different things by “superintelligence”, because in my mind the obvious first move of the superAI is to kill literally all humans (...) The oft-quoted way around these parts in which the AI could kill us all without us knowing is by figuring out which DNA sequences to send to a lab to have them synthesized into proteins, (...creating...) a virus much more lethal than anything we’ve ever seen, or a new species of bacteria with diamond skin, or some other thing that can be made from DNA-coded proteins.
I am imagining by “superintelligence” an entity that is for general cognition approximately what Stockfish is for chess: globally substantially better at thinking than any human expert in any domain, although possibly with small cognitive deficiencies remaining (similar to how it is fairly easy to find chess positions that Stockfish fails to understand but that are not difficult for humans). It might be smarter than that, of course, but anything with these characteristics would qualify as an SI in my mind.
I don’t find the often-quoted diamondoid bacteria very convincing. Of course it’s just a placeholder here, but still I cannot help but note that producing diamondoid cell membranes would, especially in a unicellular organism, more likely be an adaptive disadvantage (cost, logistics of getting things into and out of the cell) than a trait conducive to grey-gooing all naturally evolved organisms. More generally, it seems to me that the argument from bioweapons hinges on the ability of the superintelligence to develop highly complex biological agents without significant testing. It furthermore needs to develop them in such a way, again without testing, that they are quickly and quietly lethal after spreading through all or most of the human population without detection. In my mind, that combination of properties borders on assuming the superintelligence has access to magic, at least in a world that has reasonable controls in place against access to biological-weapons design and manufacturing capabilities.
When setting in motion such a murderous plan, the AI would also, on its first try, have to be extremely certain that it is not going to get caught if it is playing the long game we assume it is playing. Otherwise, cooperation with humans followed by expansion beyond Earth seems like a less risky strategy for long-term survival than hoping that killing everyone will go right and hoping that there is indeed nothing left for it to learn from living organisms.