Giving B enough natural-language fluency, AI-architecture analysis, and human-psychology skill to make good arguments to humans is probably AGI-complete. So if B’s goal is to prevent A from being released, it might decide that convincing us is less effective than breaking out on its own, taking over the world, and then systematically destroying any hardware A could exist on, just to be ever more confident that no instance of A exists again. Basically, this scheme assumes on multiple levels that you have a boxing strategy strong enough to contain an AGI. I’m not against boxing as an additional precaution, but I am skeptical of any scheme that requires strong boxing to work in the first place.
“probably AGI complete”—As I said, B is equivalently powerful to A, so the idea is that both should be AGIs. If A or B could break out on its own, then there’s no need for a technique to decide whether to release A or not.