We are not arguing here that probing is a perfect solution in general; this work is just a single data point on how it fares on the models from our Sleeper Agents paper.