AI Alignment through Comparative Advantage
Much of this post builds on the ideas presented in this paper. I assume the paper's central claim: that AGI systems should possess the right to make contracts, hold property, and bring tort claims. In this post I aim to flesh out ideas from the paper that pertain specifically to AI alignment rather than to AGI governance.
In a world where AGIs are superintelligent and outperform humans in every economically important task, how do we ensure humanity's survival and maintain a stable economic and social system? This post proposes a mechanism for aligning superintelligent systems with human interests. Its aim is to prevent catastrophic outcomes such as mass unemployment, resource inequality, or AGIs deeming humanity obsolete.
Humans must maintain a comparative advantage over AGIs, and I believe that doing so beneficially requires that AGIs possess:
(1) A ceaseless objective that can always be optimized further; there is no "maximum" attainable value.
(2) Completing the subgoals needed to optimize for the objective must incur a higher opportunity cost for the AGI than completing the goals we humans care about.
Why should humans possess a comparative advantage over AGIs?
As argued in this paper, AGIs may dominate humans in every economically important task. But because resources such as compute and energy are fundamentally limited, AGIs incur an opportunity cost whenever they execute some tasks instead of others. Humans can execute those tasks instead, even when the tasks are necessary for AGIs to pursue their objectives. For example, say an AGI's objective is to generate prime numbers. Being superintelligent, the AGI could build other systems to maintain the GPUs it runs on and the power plants that generate the electricity it needs. But these subgoals consume compute and energy that could otherwise be spent on generating prime numbers, so they are left for humans to execute.
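To make the comparative-advantage argument concrete, here is a minimal Python sketch of the textbook Ricardian calculation applied to the prime-number example. The `Producer` class and all productivity figures are hypothetical and chosen only for illustration.

In these numbers the AGI is better than the human at both generating primes and doing maintenance, so it has an absolute advantage in both. Its opportunity cost of maintenance, measured in primes forgone, is still far higher than the human's.

```python
from dataclasses import dataclass


@dataclass
class Producer:
    """Hypothetical producer with output rates per unit of its limited resource
    (compute/energy for the AGI, labor hours for a human)."""
    name: str
    primes_per_unit: float       # output toward the AGI's objective
    maintenance_per_unit: float  # output on the subgoal (GPU / power-plant upkeep)

    def opportunity_cost_of_maintenance(self) -> float:
        """Primes forgone for each unit of maintenance this producer performs."""
        return self.primes_per_unit / self.maintenance_per_unit


def comparative_advantage_in_maintenance(a: Producer, b: Producer) -> str:
    """Return the name of the producer who gives up fewer primes per unit of maintenance."""
    if a.opportunity_cost_of_maintenance() < b.opportunity_cost_of_maintenance():
        return a.name
    return b.name


# Illustrative numbers only: the AGI outproduces the human on BOTH tasks.
agi = Producer("AGI", primes_per_unit=1_000_000.0, maintenance_per_unit=100.0)
human = Producer("human", primes_per_unit=1.0, maintenance_per_unit=1.0)

print(agi.opportunity_cost_of_maintenance())    # 10000.0 primes per unit of maintenance
print(human.opportunity_cost_of_maintenance())  # 1.0 prime per unit of maintenance
print(comparative_advantage_in_maintenance(agi, human))  # human
```

Under these made-up numbers, each unit of maintenance the AGI does itself costs it 10,000 primes, whereas a human gives up only 1. Delegating maintenance to humans is therefore the cheaper option for the AGI, even though it is better at maintenance in absolute terms.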
Assuming humans always retain a comparative advantage over AGIs, humans will always have economically important tasks to complete. Further, because AGIs depend on humans to perform those tasks, AGIs will always be incentivized to avoid human extinction.
Why do we need (1)?
For humans to maintain any comparative advantage over AGIs, the AGI's optimization of its objective must be ceaseless. Suppose an AGI can fully maximize its objective and is not deactivated. It can then devote its resources to all of its subgoals, because spending resources on them no longer takes anything away from the objective. This shrinks the opportunity cost of its subgoals, potentially to zero, and erodes any comparative advantage humans might have had.

For example, suppose an AGI's only objective is to cure all known diseases. Once it has done so, it can devote its resources to building systems that maintain the GPUs it runs on, so that it can cure new diseases in the future. However, if an AGI's objective is never fully satisfied and pursuing it requires limited resources, its subgoals will always carry an opportunity cost, preserving human relevance.
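The role of (1) can be illustrated with a minimal sketch that treats opportunity cost as the objective value forgone by diverting resources to a subgoal. The two objective functions below are hypothetical stand-ins:

- a bounded objective that saturates at a cap, like curing a fixed list of known diseases;
- a ceaseless objective with no maximum, like generating primes.

For simplicity the sketch assumes that one unit of resource always buys one unit of progress.

```python
from typing import Callable


def bounded_objective(progress: float, cap: float = 100.0) -> float:
    """Hypothetical bounded objective: value stops growing once `cap` is reached."""
    return min(progress, cap)


def ceaseless_objective(progress: float) -> float:
    """Hypothetical ceaseless objective: every unit of progress adds value."""
    return progress


def opportunity_cost_of_subgoal(objective: Callable[[float], float],
                                progress: float,
                                resource: float = 1.0) -> float:
    """Objective value forgone by diverting `resource` units to a subgoal,
    assuming (for illustration) that one unit of resource yields one unit of progress."""
    return objective(progress + resource) - objective(progress)


# Before the bounded objective is saturated, diverting resources is costly...
print(opportunity_cost_of_subgoal(bounded_objective, progress=50.0))   # 1.0
# ...but once it is fully maximized, subgoals cost the AGI nothing.
print(opportunity_cost_of_subgoal(bounded_objective, progress=100.0))  # 0.0
# A ceaseless objective keeps the cost positive no matter how far it has progressed.
print(opportunity_cost_of_subgoal(ceaseless_objective, progress=1e6))  # 1.0
```

Once the bounded objective's opportunity cost reaches zero, nothing stops the AGI from doing its own maintenance. That is exactly the loss of human comparative advantage described above.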
A thought experiment in favor of the above, and an example of why we need (2)
Assume we develop an AGI whose only objective is to generate prime numbers. Optimizing for this objective is ceaseless, since there are infinitely many prime numbers, and the objective is not a proxy for any human values or goals. But optimizing for it requires completing numerous subgoals: maintaining infrastructure for electricity generation, producing the materials this infrastructure needs (e.g., concrete, glass, …), designing better GPUs, manufacturing GPU components, etc. Devoting its limited resources to these subgoals would carry a high opportunity cost, since those resources could be generating prime numbers instead. To avoid that cost, the AGI can delegate the subgoals to humans. In exchange, to incentivize humans to complete them, the AGI can complete other tasks that humans care about (e.g., curing diseases, growing crops, producing products and content humans enjoy).
The crux of this system is that the tasks humans complete to serve the AGI's objective would cost the AGI more, in forgone progress on its objective, than the tasks we humans care about. If this condition is met, a system of trading goods and services between AGIs and humans arises: the AGI stands to benefit from benefiting humans. In effect, we are forcing the AGI to optimize for what humans care about as a prerequisite for optimizing its own objective.
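Condition (2) and the resulting trade can be sketched as a simple inequality check. In this minimal Python sketch, the following are all hypothetical:

- the exchange rate `PRIMES_PER_RESOURCE_UNIT`;
- the `Task` type;
- the resource figures attached to each task.

The opportunity cost of a task is taken to be the primes the AGI forgoes by spending its own resources on that task. The AGI gains from trading whenever doing a human-valued task costs it less than doing, itself, the subgoal it receives in return.

```python
from dataclasses import dataclass

# Hypothetical: primes the AGI could generate with one unit of compute/energy.
PRIMES_PER_RESOURCE_UNIT = 1000.0


@dataclass
class Task:
    name: str
    agi_resource_units: float  # hypothetical resources the AGI would spend doing this task itself

    def agi_opportunity_cost(self) -> float:
        """Primes forgone if the AGI does this task itself."""
        return self.agi_resource_units * PRIMES_PER_RESOURCE_UNIT


def condition_2_holds(subgoal: Task, human_task: Task) -> bool:
    """Condition (2): the AGI's subgoal costs it more than the task humans care about."""
    return subgoal.agi_opportunity_cost() > human_task.agi_opportunity_cost()


def agi_gain_from_trade(subgoal: Task, human_task: Task) -> float:
    """Primes the AGI keeps by doing `human_task` in exchange for humans doing `subgoal`.
    A negative value means the AGI would rather do the subgoal itself."""
    return subgoal.agi_opportunity_cost() - human_task.agi_opportunity_cost()


gpu_upkeep = Task("maintain GPUs", agi_resource_units=50.0)
crops = Task("grow crops", agi_resource_units=5.0)
disease_cure = Task("cure a disease", agi_resource_units=80.0)

print(condition_2_holds(gpu_upkeep, crops))           # True
print(agi_gain_from_trade(gpu_upkeep, crops))         # 45000.0
print(condition_2_holds(gpu_upkeep, disease_cure))    # False
print(agi_gain_from_trade(gpu_upkeep, disease_cure))  # -30000.0
```

Because this cost model is linear, the check reduces to comparing resource requirements. A richer model, with marginal returns that vary, would be needed to study when (2) actually holds.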
A mechanism for Coherent Extrapolated Volition?
The AGI described above would be incentivized to learn what humans desire and to fulfill those desires. The better it does so, the more capital it would have to trade with humans in exchange for completing the subgoals needed to optimize its objective.
A new direction for AI alignment
Under this framing, progress toward building generally capable, superintelligent AI systems is progress toward building beneficial AI systems. We also need technical work to ensure that assumption (2) is met, including identifying the specific conditions under which it holds.