Note: The absence of a catastrophe is also still hard to specify and will take a lot of effort, but the hardness is concentrated on bridging between high-level human concepts and the causal mechanisms in the world by which an AI system can intervene. For that...
Is the lack of a catastrophe intended to last forever, or only for a fixed amount of time (e.g. 10 years, or until the system is turned off)?
For all time.
Say this AI looks to the future and sees everything disassembled by nanobots. Self-replicating bots build computers. Lots of detail about how the world was is being recorded. Those recordings are used in some complicated calculation. Is this a catastrophe?
The answer depends sensitively on the exact moral valence of these computations, which is not something easy to specify. If the catastrophe-prevention AI bans this class of scenarios, it significantly reduces future value; if it permits them, it lets through all sorts of catastrophes.
For a while.
If the catastrophe prevention is only designed to hold while other AI is built, then we can wait for the uploading. But then an unfriendly AI can wait too. Unless the anti-catastrophe AI is supposed to ban all powerful AI systems that haven't been greenlit somehow? (With a greenlighting process set up by human experts, and the AI only considering something greenlit if it sees it signed with a particular cryptographic key.) And the supposedly omnipotent catastrophe-prevention AI has been programmed to stop all other AIs from exerting excess optimization on us (in some way that lets us experiment while shielding us from harm).
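One concrete reading of that greenlighting check: the anti-catastrophe system treats a proposal as permitted only if it carries a valid signature from a key held by the human review process. Here is a minimal sketch under that assumption, using Ed25519 signatures via the Python `cryptography` package; the function name, the proposal strings, and the idea of a single board key are hypothetical illustrations, not anything specified in this exchange.

```python
# Hypothetical sketch of a cryptographic "greenlighting" check: a proposal
# counts as greenlit only if it is signed by the human review board's key.
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.exceptions import InvalidSignature


def is_greenlit(proposal: bytes, signature: bytes,
                board_public_key: Ed25519PublicKey) -> bool:
    """Return True only if the signature on the proposal verifies under the board's key."""
    try:
        board_public_key.verify(signature, proposal)
        return True
    except InvalidSignature:
        return False


# Usage: the human greenlighting process signs an approved proposal offline;
# the catastrophe-prevention system only checks the signature, never the key.
board_key = Ed25519PrivateKey.generate()   # held by the human review process
proposal = b"deploy: narrow protein-folding model v3"
signature = board_key.sign(proposal)

assert is_greenlit(proposal, signature, board_key.public_key())
assert not is_greenlit(b"deploy: unvetted agentic system", signature,
                       board_key.public_key())
```

The design choice this illustrates is that the enforcement system never needs to understand what was approved or why; it only needs to recognize a valid signature, which pushes all the judgment back onto the human greenlighting process.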
Tricky. But maybe doable.
Yes, it’s the latter. See also the Open Agency Keyholder Prize.