I would also expect methods developed for challenges like the Trojan Detection Challenge or Universal Backdoor Detection to be good candidates to try out. I'm not saying they will always work, but for the specific type of backdoor implemented in the Sleeper Agents paper, I think they might.