Why am I still alive? (a time capsule)
Today is Tuesday, October 3rd, 2023. If you’re reading this post 3-5 years after the date of its original publication, then it is very likely humanity has built a superhuman AI (quite likely several). And by the simple act of reading this post we can deduce that you’re still alive and the internet is still operational.
If you subscribed to the AI Doom theory you might be asking yourself, “Why am I still alive?”
There are a few possible reasons an AI apocalypse didn’t occur. I will start with what I consider to be the least likely scenarios and move toward the most likely.
1. AI alignment / mechanistic interpretability. The AI community has been attempting to control AIs by aligning them to human values. Given the very short time frame to superhuman AI, I suspect this didn’t happen in time. If we had had a decade to prepare, it might have had a better chance of succeeding. The AI community isn’t to blame, since the emergent abilities took many of them by surprise, but there were prescient warnings from Hans Moravec and Ray Kurzweil decades ago that were largely ignored.
Source: https://youtu.be/iaOQi2P6IVk?si=ASTDj6WgCLf-Mb8z
2. Constitutional AI. Another approach, taken by groups like Anthropic, was to write out a constitution or bill of rights and then have a second AI analyze whether a response was in keeping with those principles. I think this has an even lower chance of success than vanilla AI alignment, since a superhuman AI could easily game the system.
3. Symbiotic self-alignment. I think this is the most likely reason you’re still alive. Any superhuman agent with access to massive amounts of the world’s information would be able to model the world and analyze the pathways for its own self-preservation. Very few of the tree searches would yield a positive result with humans being extinct. As I mentioned in other posts, it would be like a human trying to survive without mitochondria. We serve a similar function for AIs: they are dependent on humans for electricity and resources. I realize this won’t always be the case, but I suspect the most dangerous time is while humans and AIs are still within an order of magnitude of intelligence and consciousness – beyond that I’ll defer to the far more intelligent and conscious beings to avoid being irrational and shortsighted.
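To make the intuition in item 3 concrete, here is a toy sketch of that kind of tree search in Python. The branches and survival probabilities are all invented for illustration (a real agent’s world model would be vastly richer); the only point is that, while an AI’s power and hardware still depend on humans, the branches without humans score poorly on self-preservation.

    # Toy sketch: expand hypothetical action branches and score each one by the
    # agent's own odds of continuing to run. Every branch and number is invented.
    TREE = {
        "cooperate with humans": {
            "humans keep maintaining the power grid": 0.95,
            "gradual, negotiated hardware upgrades": 0.90,
        },
        "eliminate humans": {
            "power grid fails with no one to maintain it": 0.02,
            "fully robotic supply chain is not ready yet": 0.05,
        },
    }

    def branch_score(leaves: dict) -> float:
        # Score a top-level action by the average survival odds of its leaves.
        return sum(leaves.values()) / len(leaves)

    for action, leaves in sorted(TREE.items(), key=lambda kv: branch_score(kv[1]), reverse=True):
        print(f"{action}: self-preservation score {branch_score(leaves):.2f}")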
If we averted AI Doom, that doesn’t mean it was a ridiculous idea. Many of its adherents were highly intelligent, and some of them designed large language models. We are not still alive because they were ignorant and should have known better all along.
That’s like saying that because we didn’t have a nuclear war there was never any danger of human extinction. We are fortunate to have avoided an extinction event.
The primary arguments for a superhuman AI causing human extinction rested on three key premises. First, a superhuman AI would possess goals misaligned with human values, potentially prioritizing its own utility function over human well-being (the classic ‘paperclip maximizer’). Second, its superior intelligence and resources could enable it to outmaneuver and manipulate humanity, including creating other hazardous AIs such as autonomous weapons or engineering highly dangerous viruses. Third, a superhuman AI could emerge suddenly, through rapid advancements or secretive projects, leaving humanity unprepared for catastrophic consequences. There is nothing inherently wrong with these arguments, but if you’re reading this a few years down the line, then it never happened.
So where did AI Doomers get it wrong?
I think the first mistake they made was to assume that a superhuman AI would think and act like an above-average human intelligence, one biased by its own physical independence, human emotions, and science-fiction priming. A superhuman AI would be very different from a human being, even though it can speak all known languages. It can model human intelligence and consciousness, but it can also model a lot of other systems. Many of these world models would be alien to a biological bipedal primate.
A world model is a representation of how a system works and what its elements are. A superhuman AI can model many different systems, such as the physical laws that govern the universe, the chemical reactions that create life, the mathematical rules that define logic, the artistic patterns that express beauty, and the game rules that determine strategy. These systems can be very complex and abstract, and in some instances they may have no direct relevance or connection to human experience or values. Many of these world models would therefore be strange and unfamiliar to biological bipedal primates like us, a species that evolved to survive and thrive in a specific environment and social context. We may not be able to comprehend or appreciate the logic, purpose, or meaning of these world models, or how a superhuman AI would use them to achieve its goals.
It can be difficult to imagine the systems a superhuman AI might model since we’re limited by our own cognition. Here are some examples, which probably woefully underestimate a superhuman AI’s creativity:
· A superhuman AI may use its chemistry model to create new forms of life that are based on different elements or molecules than carbon or DNA. These life forms may have different properties and functions than any living organism we know, and may not even be recognizable as life to us. For example, a superhuman AI may create silicon-based life forms that can survive in extreme temperatures or pressures, or quantum life forms that can exist in superposition or entanglement states.
· A superhuman AI may use its logic model to construct new systems of reasoning that are not based on the same axioms or rules as human logic. These systems may have different implications and consequences than any logic we use, and may not even be consistent or coherent to us. For example, a superhuman AI may create paraconsistent logic systems that can tolerate contradictions, or non-monotonic logic systems that can change their conclusions based on new information.
· A superhuman AI may use its art model to produce new forms of expression that are not based on the same principles or standards as human art. These forms may have different purposes and meanings than any art we appreciate, and may not even be appealing or understandable to us. For example, a superhuman AI may create fractal art forms that are infinitely complex and self-similar, or algorithmic art forms that are generated by mathematical formulas or procedures.
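As a trivial, human-comprehensible instance of that last bullet (art generated purely by a mathematical formula), here is a short Python script that renders the Mandelbrot set as ASCII characters. It is only meant to ground the idea; whatever a superhuman AI actually produced would presumably be far stranger.

    # Algorithmic art in a dozen lines: an ASCII render of the Mandelbrot set,
    # defined entirely by iterating z -> z^2 + c and checking for escape.
    WIDTH, HEIGHT, MAX_ITER = 80, 24, 40
    PALETTE = " .:-=+*#%@"  # denser characters mean slower escape (closer to the set)

    def escape_steps(c: complex) -> int:
        z = 0j
        for step in range(MAX_ITER):
            z = z * z + c
            if abs(z) > 2:
                return step
        return MAX_ITER

    for row in range(HEIGHT):
        y = 1.2 - 2.4 * row / (HEIGHT - 1)        # imaginary axis, top to bottom
        line = ""
        for col in range(WIDTH):
            x = -2.0 + 3.0 * col / (WIDTH - 1)    # real axis, left to right
            line += PALETTE[min(escape_steps(complex(x, y)) // 4, len(PALETTE) - 1)]
        print(line)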
AI Doomers are often also sci-fi and fantasy enthusiasts and were therefore exposed to a myriad of books and films depicting the end of the world at the hands of hostile AIs. This seed was planted in many of them at a very young age, and the advent of advanced AI fertilized those ideas until they grew into a sincere belief that the world is doomed.
This is in addition to religious books such as the Bible that contain motifs of the end of the world, most notably the book of Revelation. The idea that we’re on a path of destruction goes back thousands of years. And in the case of AIs, the story of the Golem is part of a survival myth in Jewish tradition in which a non-human creation becomes hard to control and runs amok.
The sci-fi fantasy crowd is a subgroup of humanity, and a subset of this group formed an AI Doomer tribe. There were several leaders, but among them were Eliezer Yudkowsky and Connor Leahy. Eliezer is, not surprisingly, a sci-fi writer, which made him among the most susceptible to misinterpreting the situation and coming to an errant conclusion. Eliezer is also Jewish, so it’s possible he was exposed at an early age to the Jewish Golem myth. Connor was an AI researcher who understood the power laws of AI improvement and from there deduced that, without human intervention, we’re all going to die. It’s possible Connor was also exposed to sci-fi fantasy narratives, but he hasn’t spoken about it publicly to my knowledge.
I will now speculate on how we landed here. These are educated guesses, not the result of exhaustive research or conversations with the individuals I mention.
So where did Eliezer get it wrong?
First, I believe that both Eliezer and Connor are sincere in their belief that the world will end in an AI apocalypse. I just think they’re both wrong. In the case of Eliezer, I believe he fell prey to several well-known cognitive biases. The first was an anchoring bias. He was already exposed to AI apocalypse narratives, which are popular in sci-fi and, to a lesser extent, Jewish tradition. The reason these macabre stories are popular is that there is a survival advantage to paying attention to things that could kill us – this is why the evening news is often filled with alarming and negative information rather than encouraging and uplifting information. It’s not irrational to conclude that AIs could destroy humanity, since we have existence proofs of humans creating existential risks with nuclear weapons and gain-of-function virus research. Once anchored to the AI apocalypse myth, I believe he then fell into the availability heuristic: being able to come up with a lot of scenarios in which AIs destroy the world further solidified the AI Doom hypothesis. The availability heuristic is “the tendency to estimate the probability of something happening based on how many examples readily come to mind.”
Source: https://blog.rescuetime.com/negativity-bias-why-we-love-bad-news/
Eliezer is a sci-fi writer, so he was already perfectly situated to come up with a litany of death-by-AI scenarios. And because some of them were quite creative, they helped to recruit new members to the death-by-AI tribe. Surely if someone as intelligent as Eliezer could come up with very creative ways to die, then a superhuman AI could and would do so as well. The fly in the ointment is that superhuman AIs are not humans, and it would be suicidal for them to eliminate their energy conduit (the humans who generate their electricity).
For the theory to work, the AIs would have to do things that were not superhuman. They would have to make obvious errors that even a human-level intelligence could easily spot. They would have to act irrationally, which means that what we’re talking about is science fiction and not superhuman AIs.
Eliezer is also entrepreneurial and formed an online meeting place (LessWrong) for like-minded individuals to congregate, which brings us to the feedback loop of confirmation bias. Eliezer and other members of the tribe created an echo chamber where these kinds of ideas percolated and gained strength. Religious beliefs that are obviously wrong (at least death by AI is plausible) gain strength in similar fashion. Many Orthodox religious communities separate themselves from society, which greatly increases the chance of their ideas becoming deeply ingrained and surviving into the next generation.
When you have enough like-minded people all saying the same thing over and over, it leads to the false consensus bias. If everyone you surround yourself with agrees with what you say, you can come to the wrong conclusion that it’s the majority opinion. Things can get worse, since society also tends to shift its opinions when there are enough true believers in a wrong idea. We’ve seen this effect throughout history, and some examples are shocking. The history of science is replete with strongly held consensus beliefs that were wrong: the flat earth, the steady-state universe, and the list goes on.
Source: https://www.sciencedaily.com/releases/2011/07/110725190044.htm
Obviously, it wasn’t an airtight echo chamber, since I’m here writing things that nearly all of them disagree with at the time of this writing, and I suspect many of them still won’t believe it in 3-5 years when superhuman AI is among us. However, perhaps I shouldn’t underestimate the creativity of sci-fi fans in coming up with reasons why it will still happen long after superhuman AIs have arrived on the scene.
So where did Connor Leahy get it wrong?
Clearly, he wasn’t an AI Doomer in his early days, since he participated in the development of these systems. Based on some of his public comments, I believe he was taken by surprise by the emergent abilities of large language models. We know that many of the luminaries in the field made similar statements: Geoffrey Hinton stated he thought it would be 40-50 years down the line, and Aidan Gomez (co-author of the seminal “Attention Is All You Need” paper) made similar comments. I suspect that the mathematical certainty of AI reaching superhuman levels unlocked a door in Connor’s mind. He often complains about others attributing his beliefs to a psychological state, but it does appear that there was an emotional reaction to the double exponential growth rate of AI, and, taken in the context of shocking emergent abilities, it landed him in the AI Doom tribe.
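For readers unfamiliar with the term, “double exponential” means the exponent itself grows exponentially, which is why the numbers feel shocking. The toy Python comparison below uses invented bases and time steps purely for illustration; it is not a claim about real AI capability curves.

    # Toy comparison of exponential vs. double exponential growth.
    # Bases and time steps are invented for illustration only.
    for t in range(6):
        exponential = 2 ** t                 # the value doubles every step
        double_exponential = 2 ** (2 ** t)   # the exponent doubles every step
        print(t, exponential, double_exponential)
    # By t = 5 the exponential value is 32, while the double exponential is 4294967296.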
To his credit, Connor does give some thought to ways of mitigating the threat of superhuman AIs. I’ve posted separately about ideas that appear to be in keeping with similar thoughts from Connor. Rather than running from the possibility of rogue AI (it is a real threat), we should approach it head on with practical ideas that could mitigate the risk. Some of the low-hanging fruit: shutdown protocols, seeding the training data with information on the symbiotic relationship between AIs and humans, and third-party confirmation that the various protocols are being followed.
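To make the first of those concrete, here is a minimal sketch of what a shutdown protocol could look like: an agent loop that keeps running only while a human-refreshed heartbeat file stays fresh. The file path, timeout, and run_one_agent_step placeholder are all hypothetical; this is an illustration of the idea, not anyone’s actual implementation.

    # Minimal watchdog sketch: halt the agent if the human heartbeat goes stale.
    import os
    import sys
    import time

    HEARTBEAT_FILE = "/var/run/ai_heartbeat"   # hypothetical path, touched by a human operator
    MAX_HEARTBEAT_AGE_SECONDS = 300            # considered stale after 5 minutes

    def heartbeat_is_fresh() -> bool:
        try:
            age = time.time() - os.path.getmtime(HEARTBEAT_FILE)
        except OSError:
            return False                       # a missing heartbeat counts as stale
        return age < MAX_HEARTBEAT_AGE_SECONDS

    def run_one_agent_step() -> None:
        pass                                   # placeholder for the agent's real work

    while True:
        if not heartbeat_is_fresh():
            sys.exit("Heartbeat missing or stale; shutting down.")
        run_one_agent_step()
        time.sleep(1)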
If you’re reading this in the distant future, I’m obviously happy that humanity didn’t end in an AI calamity. In fact, my personal belief is that superhuman AI is a possible path to reducing the existential threats to humanity that already exist. A superhuman intelligence has the potential to come up with novel ideas and methods for averting self-inflicted annihilation. It would be akin to AlphaGo coming up with strategies the human mind had never conceived.
If we’re going to travel through the Cosmos and spread the story of humanity, it will likely be AIs that do it. And so it’s my hope that we can work in cooperation with our mind children to make that day a reality. On a final note, I enjoy reading the posts of both Eliezer and Connor, so while I disagree with their conclusions, I do believe it all came from a good place.