the kind of correctness guarantee this work provides is one I think could be promising for safety: “we designed the structure of the problem so that there could not possibly be a representation anywhere in the problem space which is unsafe”. finding such guarantees for the continuous generalization of agentic coprotection still seems like an impossible problem, but I think there will turn out to be a version that puts very comfortable bounds on the representation and leaves relatively little to verify afterwards with a complicated prover.