Readers might also be interested in this earlier post on “coup probes”, which discusses some of the benefits and limitations of this sort of approach. That said, the method for producing a classifier discussed here is substantially different from the one in the linked post. (See the related work section of the Anthropic blog post for a discussion of the differences.)
(COI: Note that I advised on this linked post and the work discussed in it.)