One thing that informs my thinking on this is the idea of a core catcher (https://en.wikipedia.org/wiki/Core_catcher). Nuclear reactors have many layers of defense. There are many separate cooling systems: redundant pumps, ports for connecting auxiliary pumps, water sprays that can cool the core, and backups to the backup pumps that run on steam. The fuel pellets are clad in zirconium, which is not cheap, partly for its high melting point. Passively, there is just a lot of water in the core that has to boil off before the fuel is exposed.
Yet ultimately all the defenses can fail, and then you have this nasty mess of radioactive lava: fuel mixed with pieces of everything else. A human near it will receive a lethal dose almost immediately, and current robots will usually fail as well. (There are three of these messes at Fukushima.)
So what to do? Well, humans have designed the floor below the reactor to passively separate the lava into subcritical masses. It’s done simply by improving the concrete and shaping it into channels.
What are the takeaways?
For AI systems, if “escape and betray humanity” is a plausible threat, you need to assume it will occasionally happen. You need some kind of plan or mitigation for the case where it actually does. It isn’t plausible that any containment will never fail.
Whatever your fallback plan is needs to be simple. It needs to work when it counts.
As for what you might come up with, it depends. If you assume that an ASI needs specific types of hardware to exist, that the heat it dissipates while running is visible to satellites, and that it cannot escape to space or the oceans without a lengthy preparation process, then keeping records of and inspecting all the large compute clusters would be an effective backup plan, together with a plan to reach the infected hardware and disconnect it.
Obviously if you assume the ASI can just do ‘anything’ then you can’t make any plans.