I think it misframes the problem.
Humans actually solved a similar problem long ago: fire safety. Every electrical junction is enclosed in a box. Many big buildings are divided into separate concrete compartments with sprinklers. Buildings are separated from one another by gaps, with fire lanes and water ports. The city itself is divided into districts with gaps between them, and the country into cities with gaps between those.
Basically, every box is surrounded by another box, so it is unlikely that any single fire will burn down the earth.
You need a “box” of some kind around all the possible high-performance computers an AI could escape to. And, as with fire, you have to assume that careless humans will let AI out of boxes all the time (software bugs, air gaps crossed by an errant network cable, recreational arson/deliberate releases).
You need to have plans, monitoring, a building code for compute clusters, an agency that tests the equipment (obviously by red-teaming with ASI in isolated labs), and so on.
An escaped AI isn’t hot and glowing and a visible threat. It isn’t even obvious that an escape has occurred, or where to draw the quarantine lines.
An “escaped” AI means that compute clusters in the world which were not assigned to the AI are no longer running the same digitally signed software their operators ordered.
This is falsifiable/measurable. While it is true that there is nothing glowing and no smoke to smell, it is checkable: for example, by using a separate computer hosted on the same mainboard to connect to the drive and hash the files containing the model, or something similar. It is obviously crucially important to use a tool that cannot be hacked into lying to you.
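To make that concrete, here is a minimal sketch of such a check, with everything hypothetical beyond what the text above states: the paths, the manifest format, and the assumption that the operator's manifest has already had its digital signature verified out of band. The genuinely hard requirement is the last point, that this runs from a separate trusted computer and storage that the checked machine cannot tamper with.

```python
# Minimal sketch: compare the model files on a cluster's drive against the
# operator's signed manifest of expected SHA-256 hashes. Assumes this runs on
# a separate verification computer with read-only access to the drive, and
# that the manifest's signature was already checked out of band.
import hashlib
import json
from pathlib import Path

MANIFEST_PATH = Path("/mnt/cluster_drive/signed_manifest.json")  # hypothetical path
MODEL_DIR = Path("/mnt/cluster_drive/models")                    # hypothetical path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large model weights never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(manifest_path: Path, model_dir: Path) -> bool:
    """Return True only if the drive contents exactly match the manifest."""
    # Manifest format assumed here: {"relative/path.bin": "expected sha256 hex"}
    manifest = json.loads(manifest_path.read_text())
    ok = True
    for rel_path, expected in manifest.items():
        file_path = model_dir / rel_path
        if not file_path.is_file() or sha256_of(file_path) != expected:
            print(f"MISMATCH OR MISSING: {rel_path}")
            ok = False
    # Files not listed in the manifest are as suspicious as modified ones.
    listed = set(manifest)
    for f in model_dir.rglob("*"):
        if f.is_file() and str(f.relative_to(model_dir)) not in listed:
            print(f"UNEXPECTED FILE: {f}")
            ok = False
    return ok


if __name__ == "__main__":
    print("clean" if verify(MANIFEST_PATH, MODEL_DIR) else "quarantine this cluster")
```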