So when I think through the pre-mortem of “AI caused human extinction, how did it happen?”, one of the more likely scenarios that comes to mind is not nano-this and bio-that, or even “one day we all just fall dead, instantly and without warning”. Or a scissor statement that causes all-out wars. Or anything else noticeable.
The human mind is infinitely hackable through visual, textual, auditory and other sensory inputs. Most of us do not appreciate how easily, because being hacked does not feel like it. Instead it feels like your own volition, like you changed your mind based on logic and valid feelings. Reading a good book, listening to a good sermon or a speech, watching a show or a movie, talking to your friends and family: that is how mind-hacking usually happens. Abrahamic religions are a classic example. The Sequences and HPMoR are a local example. It does not work on everyone, but when it does, the subject feels enlightened rather than hacked. If you tell them their mind has been hacked, they will argue with you to the end, because clearly they just used logic to understand and embrace the new ideas.
So, my most likely extinction scenario is more like “humans realized that living is not worth it, and just kind of stopped” than anything violent. It could be spread out over years and decades, for example by people voluntarily deciding not to have children anymore. None of it would look like it was precipitated by an AI taking over. It does not even have to be a conspiracy by an unaligned SAI. It could just be that the space of new ideas, thanks to LLMs getting better and better, expands a lot, and in new enough directions to include a few lethal memetic viruses like that.
I do think the terminology of “hacks” and “lethal memetic viruses” conjures up images of extremely unnatural brain exploits, when you mean quite a natural process that we already see some humans going through. Some monks/nuns voluntarily remove themselves from the gene pool and, in sects that prioritise ritual devotion over concrete charity work, they also minimise their impact on the world.
My prior is that this level of voluntary dedication (to a cause like “enlightenment”) is difficult to induce, and that there are much cruder and more effective brain hacks available.
I expect we would recognise the more lethal brain hacks as improved versions of entertainment/games/pornography/drugs. These already compel some humans to minimise their time spent competing for resources in the physical world. In a direct way, what I’m describing is the opposite of enlightenment. It is prioritising sensory pleasures over everything else.
A sufficiently godlike AI could probably convince me to kill myself (or something equivalent, for example to upload myself to a simulation… and once all humans get there, the AI can simply turn it off). Or convince me not to have kids (in a parallel life where I don’t already have them), or simply keep me distracted every day with some new shiny toy, so that I never decide that today is the right day to have unprotected sex with another human and get ready for the consequences.
But it would be much easier to simply convince someone else to kill me. And I think the AI will probably choose the simpler and faster way, because why not. It does not need a complicated way to get rid of me, if a simple way is available.
This is similar to reasoning about cults or scams. Yes, some of them could get me, by being sufficiently sophisticated, accidentally optimized for my weaknesses, or simply by meeting me on a bad day. But the survival of a cult or a scam scheme does not depend on getting me specifically; they can get enough other people, so it makes more sense for them to optimize for getting many people, rather than optimize for getting me specifically.
The more typical people will get the optimized mind-hacking message. The rest of us will then get a bullet.
Superficially, human minds look like they are way too diverse for that to cause human extinction by accident. If new ideas toast some specific human subgroup, other subgroups will not be equally affected.
It would be a message customized deliberately for each human, and worked on gradually over years of subtle convincing arguments. That’s how I understand the hypothetical.
I think that an AI competent enough to manage this would have faster, easier ways to accomplish the same effect, but I do agree that this would quite likely work.
If an information channel is only used to transmit information that is of negative expected value to the receiver, the selection pressure incentivizes the receiver to ignore that information channel.
That is to say, an AI which makes the most convincing-sounding argument for not reproducing to everyone will select for those people who ignore convincing-sounding arguments when choosing whether to engage in behaviors that lead to reproduction.
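To make that selection claim concrete, here is a toy simulation. It is purely illustrative: the population size, initial trait share, persuasion rate, and the binary “ignorer”/“susceptible” trait are all made-up assumptions, not anything measured.

```python
import random

GENERATIONS = 10
POP_SIZE = 10_000
INITIAL_IGNORER_FRACTION = 0.2   # assumed share who already ignore such arguments
PERSUASION_RATE = 0.8            # assumed chance a susceptible person is convinced

# Each person either ignores convincing-sounding anti-reproduction arguments
# or is susceptible to them; the disposition is treated as heritable.
population = [
    "ignorer" if random.random() < INITIAL_IGNORER_FRACTION else "susceptible"
    for _ in range(POP_SIZE)
]

for gen in range(1, GENERATIONS + 1):
    # Susceptible people are talked out of reproducing with probability
    # PERSUASION_RATE; ignorers reproduce regardless.
    parents = [p for p in population
               if p == "ignorer" or random.random() > PERSUASION_RATE]
    reproducing_share = len(parents) / len(population)
    # Children inherit the parent's trait; rescale to POP_SIZE so we track
    # the trait frequency rather than the absolute head count.
    population = [random.choice(parents) for _ in range(POP_SIZE)]
    ignorer_share = population.count("ignorer") / POP_SIZE
    print(f"gen {gen}: {reproducing_share:.0%} reproduced, "
          f"ignorers now {ignorer_share:.0%}")
```

With these invented numbers, only about a third of the first generation reproduces, but the “ignorer” share climbs past 90% within a few generations, which is the selection effect described above.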
Yeah, but… Selection effects, in an evolutionary sense, are relevant over multiple generations. The time scale of the effects we’re thinking about is less than the time scale of a single generation.
This is less of a “magic attack that destroys everyone” and more one of a thousand cuts which collectively bring down society.
Some people get affected by arguments, others by distracting entertainment, others by nootropics that work well but stealthily have permanent impacts on fertility, some get caught up in terrorist attacks by weird AI-led cults… Just a bunch of stuff from a bunch of angles.
Yeah, my argument was “this particular method of causing actual human extinction would not work” not “causing human extinction is not possible”, with a side of “agents learn to ignore adversarial input channels and this dynamic is frequently important”.
Yeah. I don’t actually think that a persuasive argument targeted to every single human is an efficient way for a superintelligent AI to accomplish its goals in the world. Someone else mentioned convincing the most gullible humans to hurt the wary humans. If the AI’s goal was to inhibit human reproduction, it would be simple to create a bioweapon to cause sterility without killing the victims. Doesn’t take very many loyally persuaded humans to be the hands for a mission like that.
That is indeed a bit of a defense. Though I suspect human minds have enough similarities that there are at least a few universal hacks.
None of it would look like it was precipitated by an AI taking over.
But, to be clear, in this scenario it would in fact be precipitated by an AI taking over? Because otherwise it’s an answer to “humans went extinct, and also AI took over, how did it happen?” or “AI failed to prevent human extinction, how did it happen?”
Any of those. Could be some kind of intentionality ascribed to AI, could be accidental, could be something else.