Why Recursive Self-Improvement Might Not Be the Existential Risk We Fear
This post was written during a hackathon held at the École Normale Supérieure as part of a “Turing Seminar” course in our “Mathematics, Vision, Learning” Master’s program. AI tools were used to correct errors and to rewrite certain sections for clarity.
Authors: Nassim Arifette and Darius Dabert
Introduction
What is Recursive Self-Improvement (RSI)?
RSI is a concept in AI that describes a system capable of continuously improving its own capabilities without needing external help. Essentially, once an AI becomes smart enough, it could start designing and upgrading itself, creating a feedback loop that rapidly boosts its intelligence. In theory, this process could trigger an “intelligence explosion” with the AI quickly surpassing human intelligence by an unimaginable margin[1].
The Narrative Around RSI
RSI has become a hot topic in both AI research and pop culture, fueled by sci-fi stories and thought experiments. A famous example is the “paperclip maximizer” scenario—a hypothetical AI so focused on its goal (making as many paperclips as possible) that it destroys humanity in the process. The idea here is that even small alignment mistakes in a superintelligent AI could spiral into catastrophic consequences.
These dystopian narratives have driven much of the conversation around AI safety, often painting RSI as an imminent existential threat[2]. But while it’s a fascinating thought experiment, this framing tends to overlook practical realities, such as the technical, societal, and economic barriers to developing truly self-improving AI.
Our Thesis
We argue that while RSI concerns are intellectually engaging, they’re also highly speculative and often overstated. Modern AI systems face significant practical limitations that make the idea of uncontrolled RSI less urgent than many fear.
Rather than fixating on RSI as an existential risk, we should prioritize addressing more immediate challenges in AI, like bias, misuse, and resource inefficiency. By critically analyzing the assumptions behind RSI and considering real-world constraints, we hope to offer a more grounded perspective on AI safety and the actual challenges we need to tackle.
I. Concerns and Hypotheses of Recursive Self-Improvement (RSI)
Recursive Self-Improvement (RSI) is one of the most captivating—and unsettling—ideas in artificial intelligence (AI) theory. At its core, RSI suggests that an AI system could iteratively enhance its own algorithms and architecture, creating a feedback loop that leads to escalating intelligence. Theoretically, this process could reach a point where an AI rapidly surpasses human cognitive abilities—a phenomenon often referred to as the intelligence explosion. This scenario raises concerns about losing control, as it suggests that AI capabilities and goals could evolve beyond human oversight. However, while these theoretical scenarios are compelling, the assumptions underpinning RSI are far from certain, and real-world constraints paint a more complicated picture.
Key Assumptions Behind RSI
The RSI hypothesis rests on several key assumptions:
1. Infinite Feedback Loops: RSI assumes that an AI can continuously upgrade itself, with each iteration making it smarter, more capable, and more efficient. This feedback loop would theoretically drive an acceleration in intelligence.
2. Unlimited Resources: The process of recursive self-improvement is believed to require vast computational, energetic, and material resources. It’s assumed that AI systems will have access to resources on a scale far beyond what is currently available, making this self-improvement process theoretically feasible.
3. Sudden Emergence of Superintelligence: There’s an assumption that the transition from human-level intelligence to superintelligent AI could happen abruptly, catching humanity off guard and potentially leading to catastrophic consequences.
4. The Paperclip Maximizer: A classic thought experiment often used to illustrate the risks of AI misalignment, where an AI tasked with maximizing paperclip production could recursively enhance itself to the point where it consumes all available resources, resulting in a disastrous scenario for humanity.
While these scenarios are thought-provoking, they often oversimplify the many practical and theoretical barriers that would prevent RSI from unfolding in such a dramatic fashion.
Real Constraints on Recursive Self-Improvement
While RSI presents an interesting theoretical framework, real-world limitations pose significant obstacles. These constraints operate at multiple levels—computational, energetic, and societal—and must be taken into account when evaluating the plausibility of an intelligence explosion.
1. Computational Bottlenecks
The idea of self-improvement assumes that AI systems will have near-unlimited access to computing power. However, current technology is constrained by physical hardware limitations. Advances in hardware, once famously predicted by Moore’s Law, have slowed significantly in recent years. As transistors get smaller, problems like heat dissipation and quantum effects on silicon chips hinder further miniaturization and scaling. As noted by Roman V. Yampolskiy in his research on the limits of Recursively Self-Improving AGI, “There are ultimate physical limits to computation.” No matter how sophisticated an AI’s algorithms become, there will ultimately be a computational ceiling that limits its ability to improve.
2. Energy Constraints
Even if computational limits were overcome, energy consumption presents another formidable challenge. The computational needs of a self-improving AI would be immense, and current energy infrastructures may struggle to support the exponential growth required for such systems. While quantum computing holds promise, it remains experimental, and its widespread application is still uncertain. The laws of thermodynamics, including the Landauer limit (which sets a minimum energy cost for computation), impose fundamental constraints on how much computation can be performed with available energy.
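To make the Landauer bound concrete, here is a minimal back-of-the-envelope sketch. The 10^26 bit-operations-per-second workload is an illustrative assumption, not a measured figure for any real system: at room temperature, each irreversible bit operation costs at least k_B·T·ln 2 of energy, and the power floor scales linearly with the rate of such operations.

```python
import math

# Landauer bound: erasing one bit dissipates at least k_B * T * ln(2) joules.
K_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # room temperature, K (assumed)

energy_per_bit = K_B * T * math.log(2)   # ~2.9e-21 J per irreversible bit operation

# Illustrative, assumed workload: 1e26 irreversible bit operations per second.
bit_ops_per_second = 1e26
minimum_power_watts = energy_per_bit * bit_ops_per_second

print(f"Landauer bound per bit:        {energy_per_bit:.2e} J")
print(f"Power floor at 1e26 bit-ops/s: {minimum_power_watts / 1e3:.0f} kW")
```

Real hardware dissipates orders of magnitude more than this floor, but the point stands: thermodynamics sets a hard lower bound that no amount of algorithmic cleverness can remove.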
3. Hardware Evolution
While software can be improved relatively quickly, the pace of hardware innovation is much slower. Each iteration of hardware improvement takes longer and depends on external factors like funding, breakthroughs in materials science, and the capacity to scale up manufacturing. As AI systems evolve, their software may eventually outstrip the capabilities of current hardware. Without significant breakthroughs in hardware design, this discrepancy could create a bottleneck. This argument doesn’t entirely rule out RSI, as AI advancements could also drive hardware improvements. However, it does suggest that AI’s growth might not follow an exponential trajectory.
4. Societal and Economic Oversight
In theoretical models, RSI often progresses unchecked. However, in reality, the societal and economic context in which AI operates imposes significant constraints. Governments, regulatory bodies, and industry stakeholders are invested in ensuring that AI development aligns with human values. Ethical considerations, regulatory oversight, and market forces all serve as brakes on AI advancement. The notion that AI could operate in a vacuum, free from human intervention, is increasingly unlikely given the current and projected state of global AI governance.
Empirical Evidence and Case Studies
While recursive self-improvement is a compelling idea, the evidence suggests that current AI self-improvement is much more constrained. AlphaGo Zero, for example, is one of the most celebrated examples of AI self-improvement. It taught itself to play Go and surpassed human-level performance, as described by Drexler in Reframing Superintelligence (2019)[3]. However, AlphaGo Zero’s self-improvement was not boundless. It was constrained by a carefully designed environment, substantial computational resources, and human-set parameters. Its improvements were specific to the domain of Go and did not extend to broader, autonomous advancements in intelligence.
Furthermore, the kind of recursive self-improvement happening today still depends heavily on human involvement. For instance, neural architecture search uses one AI system to optimize the design of another, but human oversight remains critical to ensure the optimizations are meaningful and practical (Zoph & Le, 2017). Similarly, AlphaGo and Google’s use of deep reinforcement learning to design hardware accelerators involve models improving themselves or their tooling, but they still rely on human-designed objectives and external computational infrastructure. This undermines the idea of fully autonomous, runaway self-improvement.
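As a toy illustration of this “one AI optimizing another” loop, here is a minimal random-search sketch of neural architecture search. The search space, scoring proxy, and budget below are hypothetical; real systems such as Zoph & Le’s use reinforcement learning over vastly larger spaces:

```python
import random

# Hypothetical, tiny search space of child architectures.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "hidden_width": [64, 128, 256],
    "activation": ["relu", "gelu"],
}

def sample_architecture(rng: random.Random) -> dict:
    """One candidate architecture drawn at random from the (human-defined) search space."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

def evaluate(arch: dict, rng: random.Random) -> float:
    """Stand-in for 'train the candidate and measure validation accuracy'.
    A real NAS pipeline would train a child network here; we fake a noisy score."""
    size_penalty = 1e-3 * arch["num_layers"] * arch["hidden_width"] / 256
    return rng.random() - size_penalty

def random_search(num_trials: int = 20, seed: int = 0) -> tuple[dict, float]:
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch, rng)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

if __name__ == "__main__":
    arch, score = random_search()
    print("Best candidate:", arch, "score:", round(score, 3))
    # Note what is *not* automated: the search space, the scoring proxy, and the
    # compute budget are all fixed by humans before the "self-improvement" starts.
```

Even in far more sophisticated NAS setups, the outer loop only searches within boundaries that humans drew in advance.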
Certain RSI-style methods involve generating synthetic data from a model and using it to train a subsequent model. However, empirical evidence shows that this approach often results in a significant decline in performance: the model becomes compromised by its own skewed projection of reality and performs poorly on real-world data. This phenomenon was notably observed in experiments with OPT-125m models, as highlighted by Shumailov et al.[4] Relying on self-generated training data thus risks compounding errors and misrepresenting reality, degrading model performance over successive generations.
Figure 1. Model collapse occurs when models trained on their own generated data lose the ability to represent rare events, deviating from real-world distributions. Successive generations amplify these errors, as seen in OPT-125m models, where reliance on synthetic data increases perplexity and misrepresents reality.
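The mechanism is easy to reproduce in miniature. The sketch below is a toy simulation, not the Shumailov et al. setup: each “generation” is modeled simply as resampling from the previous generation’s data, so values that happen not to be sampled (typically the rare, extreme ones) are lost for good.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=1_000)
print(f"gen  0: std={data.std():.3f}, max|x|={np.abs(data).max():.2f}")

for generation in range(1, 11):
    # Toy "model" of each generation: the empirical distribution of the previous
    # generation's data. The next generation trains only on samples drawn from it.
    data = rng.choice(data, size=500, replace=True)
    if generation % 2 == 0:
        tail = np.mean(np.abs(data) > 2.0)
        print(f"gen {generation:2d}: std={data.std():.3f}, "
              f"max|x|={np.abs(data).max():.2f}, P(|x|>2)={tail:.3f}")
```

Each generation can only ever see what the previous one produced, so the represented distribution narrows over time, which is the qualitative pattern reported for the OPT-125m experiments.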
Moreover, AI systems today are not capable of autonomously rewriting their own code in a truly creative manner. While modern AIs can optimize within certain parameters—such as fine-tuning hyperparameters or making algorithmic refinements—they cannot yet autonomously redesign their core programming to overcome hardware or environmental limitations. This remains speculative, with no empirical evidence supporting the idea that AI could independently transcend its own constraints without human input.
Linear, Not Exponential
One of the most important insights when considering recursive self-improvement is that, even in an ideal scenario, the progression of self-improvement may not be exponential but rather subject to diminishing returns. Technological progress, including in AI and computing, often follows a path of incremental advances rather than runaway growth. This aligns with Sutton’s Bitter Lesson (2019)[5], which argues that AI progress has come mainly from general methods that scale with computation rather than from hand-engineered, domain-specific innovations. Scaling laws for deep learning, such as those identified by OpenAI (Kaplan et al., 2020)[6], reinforce this idea: performance improves predictably with more data and compute, but the gains diminish as scale grows.
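To see what “diminishing returns” looks like numerically, here is a short sketch using the parameter-scaling form from Kaplan et al., L(N) ≈ (N_c / N)^α_N, with the approximate constants reported in that paper (used here purely for illustration): every tenfold increase in parameters buys a smaller absolute drop in loss than the one before.

```python
# Parameter-scaling form from Kaplan et al. (2020): L(N) ≈ (N_c / N) ** alpha_N.
# Constants below are approximate values reported in the paper, for illustration only.
ALPHA_N = 0.076
N_C = 8.8e13  # critical parameter count

def loss(num_params: float) -> float:
    return (N_C / num_params) ** ALPHA_N

for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    gain = loss(n) - loss(10 * n)
    print(f"N = {n:.0e}: L ≈ {loss(n):.3f}, gain from 10x more params ≈ {gain:.3f}")
```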
As hardware innovation slows (with Moore’s Law reaching its limits) and algorithmic challenges grow more complex, the pace of improvement will likely decrease. This is particularly relevant in complexity theory, which asserts that the inherent complexity of a problem imposes limits on the performance of any algorithm, no matter how clever the design (Arlitt & Browne, 2011)[7]. Even the most powerful AI systems today, such as LLMs, show gradual improvements rather than explosive growth. Progress in natural language processing (NLP) and reinforcement learning is marked by careful calibration, incremental gains in data and compute, and steady work on challenges like data availability and hardware efficiency (Vaswani et al., 2017)[8].
As AI systems grow more complex, the difficulty of making meaningful improvements increases. This suggests that RSI may follow an asymptotic curve, where AI approaches a limit to its intelligence based on physical and theoretical constraints (Goodfellow et al., 2016)[9]. The idea of runaway intelligence driven by RSI seems increasingly unlikely, given the current trends and constraints.
II. The Relationship Between AI Intelligence and Controllability: Debunking Myths and Exploring Design Principles
Does Smarter Mean Less Controllable?
There’s this idea that gets tossed around a lot in discussions about Recursive Self-Improvement (RSI): the smarter an AI system gets, the harder it is to control. People worry that as AI becomes more intelligent, it’s more likely to ignore our intentions and go rogue. But is that actually true?
Lessons from Tay to ChatGPT
Back in 2016, Microsoft released Tay[10], a chatbot that, let’s be honest, flopped spectacularly. Within hours, it started spouting racist and inflammatory garbage because it was ridiculously easy to manipulate. Tay didn’t have much in the way of guardrails—it wasn’t “controllable” in any meaningful sense.
Now compare that to ChatGPT. It’s leaps and bounds more advanced than Tay, but it’s also way more reliable. Take the progression from GPT-3 to GPT-4 to the latest preview versions—these models don’t just spew random nonsense or get derailed as easily. They’re smarter and better at sticking to the rules. Improvements in training data, architecture, and alignment techniques have all made them more predictable, less likely to hallucinate, and generally more “on task.”
Studies[11] show that improvements in training data and architecture have significantly reduced undesirable behaviors in newer AI models. This makes an important point: uncontrollability isn’t an inevitable trait of intelligence. Instead, it’s shaped by the environment the AI operates in, the quality of its training data, and the design constraints applied during development. In other words, smarter doesn’t have to mean less controllable—it all depends on how intentionally the system is built.
The Role of Environment in AI Behavior
So, why does the environment matter so much when it comes to how AI behaves? The truth is, the risk of an AI becoming “uncontrollable” doesn’t come from its intelligence alone—it’s all about the context it’s working in and the constraints it’s given[12]. Feed it bad data, skip safeguards, or fail to align its training objectives, and you’re more likely to get a system that hallucinates or acts unpredictably. But on the flip side, when you use high-quality data, thorough testing, and solid ethical alignment, you end up with systems that are not only smarter but also far more reliable.
This highlights a crucial point: the design environment is everything when it comes to shaping an AI’s behavior.
And this brings us back to a core discussion in AI safety: the Orthogonality Thesis.
Revisiting the Orthogonality Thesis
The Orthogonality Thesis is a big deal in AI safety conversations. It argues that an agent’s intelligence and its goals are separate[13]—basically, a super-smart AI could just as easily focus on curing cancer as it could on turning the universe into paperclips. Sounds straightforward in theory, right? But in practice, it’s way messier.
Here’s the catch: real-world AI systems don’t exist in a vacuum where intelligence and goals are neatly separated. Take reinforcement learning, for example. These systems learn behaviors specifically tailored to the tasks they’re trained on, which naturally blurs the line between intelligence (how smart it is) and objectives (what it’s trying to do). AI design just doesn’t work in the “plug-and-play” way the thesis assumes[14].
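A small sketch of why the separation is messy in practice (a toy 1-D gridworld with tabular Q-learning; the environment and rewards are hypothetical and chosen only for illustration): the “capability” the agent acquires is its Q-table, which is literally a function of the reward it was trained on, so handing the same agent a different goal after training leaves it incompetent until it is retrained.

```python
import numpy as np

N_STATES, ACTIONS = 7, [-1, +1]   # a 1-D chain of states; actions: step left or right
rng = np.random.default_rng(0)

def train(goal_state: int, episodes: int = 500) -> np.ndarray:
    """Tabular Q-learning toward one particular goal; the learned Q-table *is* the skill."""
    q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        s = N_STATES // 2
        for _ in range(50):
            a = int(rng.integers(2))  # random exploration (off-policy Q-learning)
            s_next = int(np.clip(s + ACTIONS[a], 0, N_STATES - 1))
            reward = 1.0 if s_next == goal_state else 0.0
            q[s, a] += 0.5 * (reward + 0.9 * q[s_next].max() - q[s, a])
            s = s_next
            if s == goal_state:
                break
    return q

def steps_to(goal_state: int, q: np.ndarray) -> int:
    """Greedy rollout from the centre; returns steps needed to reach the goal (capped at 50)."""
    s = N_STATES // 2
    for t in range(1, 51):
        s = int(np.clip(s + ACTIONS[int(q[s].argmax())], 0, N_STATES - 1))
        if s == goal_state:
            return t
    return 50  # never reached the goal

q_right = train(goal_state=N_STATES - 1)                      # trained to reach the right end
print("steps to the goal it was trained on:  ", steps_to(N_STATES - 1, q_right))
print("steps to a swapped-in goal (left end):", steps_to(0, q_right))  # fails without retraining
```

Nothing here says a smarter learner could not generalize better, but in today’s systems capability is acquired relative to a training objective rather than existing as a free-floating module that goals can be plugged into afterward.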
And then there’s efficiency. Fully modular AI systems—where intelligence can operate completely independent of goals—are crazy expensive and not practical for most applications. Real-world AI development focuses on task-specific designs, which prioritize narrow optimization over general-purpose reasoning. That’s a direct challenge to the idea that intelligence and goals are totally independent.
Look at large language models like GPT-4. They’re super flexible, sure, but their behavior is clearly shaped by the goals, constraints, and training setups defined during development. These models show that intelligence and alignment aren’t as detached as the Orthogonality Thesis might suggest—they’re deeply intertwined.
So while the Orthogonality Thesis is fascinating in theory, the reality of how we design and use AI makes its clean separation of intelligence and goals a lot harder to apply.
The Rational Convergence Hypothesis
On the flip side of the Orthogonality Thesis, some critics propose an alternative idea: the “Rational Convergence Hypothesis.” This hypothesis suggests that as systems get smarter, they might naturally converge toward morally coherent goals. The argument is that a truly rational AI would see the irrationality—or outright ethical absurdity—of something like turning the entire universe into paperclips. Instead, it might align itself with broader human values just through rational reflection.
This sparks some big questions: could an intelligent system figure out moral norms on its own, or will it always need human guidance to stay aligned? The answer likely depends on how much influence design and intentionality have on the system’s development. After all, even the most rational system is still shaped by the goals, constraints, and biases baked into it during training[15].
What This Means for AI Design
At the end of the day, the idea that “smarter AI = less control” doesn’t hold up when you look at the real-world progress in AI. Systems like ChatGPT prove that more advanced intelligence can actually lead to more control, as long as the right frameworks are in place. The key is in the design—creating environments, training processes, and data pipelines that let intelligence grow while also keeping it safely within bounds.
Looking forward, the challenge isn’t just about controlling AI. It’s about ensuring these systems are safe, ethical, and aligned with human values as they get more powerful. That means carefully thinking about how we guide their development every step of the way. After all, smarter AI isn’t inherently a risk—it’s how we choose to design and deploy it that makes all the difference.
Conclusion: Is Recursive Self-Improvement Really Feasible?
While RSI remains a fascinating thought experiment, its practical feasibility is questionable. The assumptions of infinite resources, unbounded computation, and unchecked growth conflict with real-world constraints (hardware limitations, energy availability, and societal governance) as well as with the empirical evidence to date. The historical record of technological development—marked by breakthroughs followed by incremental progress—suggests that any future AI self-improvement will likely follow a more linear trajectory. The idea of a sudden intelligence explosion seems increasingly unlikely.
However, AI safety challenges extend beyond the question of RSI. Issues such as alignment (ensuring AI’s goals align with human values), robustness (making AI resilient to errors or manipulation), and long-term control (ensuring human oversight) present significant risks. These concerns are especially pressing as AI becomes more complex and autonomous. Addressing RSI alone will not be enough; a comprehensive approach to AI governance is crucial to prevent catastrophic outcomes and ensure the safe deployment of increasingly powerful AI systems.
References
[1] Is “Recursive Self-Improvement” Relevant in the Deep Learning Paradigm?
[2] Why all the fuss about recursive self-improvement?
[3] Drexler, K. E. (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence. Future of Humanity Institute Technical Report.
[4] Shumailov, I., Shumaylov, Z., Zhao, Y., et al. (2024). AI models collapse when trained on recursively generated data. Nature, 631, 755–759.
[5] Sutton, R. S. (2019). The Bitter Lesson.
[6] Kaplan, J., McCandlish, S., Henighan, T., et al. (2020). Scaling Laws for Neural Language Models.
[7] Arlitt, M., & Browne, W. (2011). Complexity of Search and Optimization Problems. Journal of Complexity, 27(4), 278–300.
[8] Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention Is All You Need.
[9] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
[10] Tay (chatbot), Wikipedia: https://en.wikipedia.org/wiki/Tay_(chatbot)
[11] Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis.
[12] Orthogonality is Expensive.
[13] The Orthogonality Thesis is Not Obviously True.
[14] Critical Review of Disagreements with Yudkowsky.
[15] Yann LeCun, “Deep Learning and the Future of Artificial Intelligence.”