I thought this was quite a good operationalisation of how hard aligning advanced AI systems might be; I've taken it from the conclusion of the overview of the technical landscape. (All of what follows is a direct quote, but it's not in quotes because the editor can't do that at the same time as bullet points.)
---
There is a broad range of implicit views about how technically hard it will be to make safe advanced AI systems. They differ on the technical difficulty of safe advanced AI systems, as well as the risks of catastrophe and the rationality of regulatory systems. We might characterize them as follows:
Easy: We can, with high reliability, prevent catastrophic risks with modest effort, say 1-10% of the costs of developing the system.
Medium: Reliably building safe powerful systems, whether they be nuclear power plants or advanced AI systems, is challenging. Doing so costs perhaps 10% to 100% of the cost of the system (measured in the most appropriate metric, such as money, time, etc.).
But incentives are aligned. Economic incentives are aligned so that companies or organizations will have correct incentives to build sufficiently safe systems. Companies don’t want to build bridges that fall down, or nuclear power plants that experience a meltdown.
But incentives will be aligned. Economic incentives are not perfectly aligned today, as we have seen with various scandals (oil spills, emissions fraud, financial fraud), but they will be after a few accidents lead to consumer pressure, litigation, or regulatory or other responses.[85]
But we will muddle through. Incentives are not aligned, and will never be fully. However, we will probably muddle through (get the risks small enough), as humanity has done with nuclear weapons and nuclear energy.
And other factors will strongly work against safety. Strong profit and power incentives, misperception, heterogeneous theories of safety, overconfidence and rationalization, and other pathologies conspire to deprive us of the necessary patience and humility to get it right. This view is most likely if there will not be evidence (such as recoverable accidents) from reckless development, and if the safety function is steep over medium levels of inputs ("This would not be a hard problem if we had two years to work on it, once we have the system. It will be almost impossible if we don't.").
Hard or Near Impossible: Building a safe superintelligence is like building a rocket and spacecraft for a moon-landing, without ever having done a test launch.[86] It costs greater than, or much greater than, 100% of development costs.
We don’t know.
[85] This assumes that recoverable accidents occur with sufficient probability before non-recoverable accidents.
[86] Yudkowsky, Eliezer. “So Far: Unfriendly AI Edition.” EconLog | Library of Economics and Liberty, 2016. http://econlog.econlib.org/archives/2016/03/so_far_unfriend.html.