I agree that not needing labels is interesting from a data generation perspective, but I expect this to be useful mostly when you have clean pairs of concepts for which labels are hard to get, and I think this will not be the case for takeover-attempt datasets.
About the performance of LAT: for monitoring, we mostly care about correlation rather than causation, and on that criterion LAT is worse IID, and it’s unclear whether LAT is better OOD. And if causality does lead to better generalization properties, then LAT is dominated by mean difference probing (see the screenshot of Zou’s paper below), which is just regular probing with high enough L2 regularization (as shown in the first Appendix of this post).
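To make the last claim concrete, here is a minimal sketch of why a heavily L2-regularized logistic probe ends up pointing along the mean difference. It is not the Appendix's derivation or code: the data is synthetic, the helper names (`make_data`, `fit_logistic_probe`, etc.) are hypothetical, and plain gradient descent stands in for whatever solver one would normally use. The intuition is that with a large penalty λ the weights stay close to zero, so the sigmoid outputs stay close to 1/2, and the stationarity condition reduces to roughly w ≈ (1/λ) Σᵢ (yᵢ − 1/2) xᵢ, which for balanced classes is proportional to the difference of class means.

```python
import math
import random


def make_data(n_per_class=100, seed=0):
    """Synthetic 'activations': two balanced classes, anisotropic noise (an assumption)."""
    rng = random.Random(seed)
    stds = [3.0, 1.0, 1.0, 0.5, 0.5]  # per-feature noise scale, chosen for illustration
    X, y = [], []
    for label in (0, 1):
        shift = 0.5 if label == 1 else -0.5
        for _ in range(n_per_class):
            X.append([rng.gauss(shift, s) for s in stds])
            y.append(label)
    return X, y


def mean_difference(X, y):
    """Mean of the positive class minus mean of the negative class."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return [sum(col_p) / len(pos) - sum(col_n) / len(neg)
            for col_p, col_n in zip(zip(*pos), zip(*neg))]


def fit_logistic_probe(X, y, l2, steps=500):
    """Gradient descent on summed logistic loss + (l2 / 2) * ||w||^2 (bias unpenalized)."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    # Step size 1/L, where L upper-bounds the curvature of the objective.
    lr = 1.0 / (l2 + 0.25 * sum(sum(v * v for v in x) + 1.0 for x in X))
    for _ in range(steps):
        grad_w, grad_b = [l2 * wj for wj in w], 0.0
        for x, label in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - label  # sigmoid(z) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


X, y = make_data()
direction = mean_difference(X, y)
for l2 in (1e-2, 1e4):  # weak vs. strong regularization
    w = fit_logistic_probe(X, y, l2)
    print(f"l2={l2:g}  cosine(probe, mean difference)={cosine(w, direction):.4f}")
```

On data like this, the strongly regularized probe should line up almost exactly with the mean difference direction, while the weakly regularized probe is pulled toward the low-variance features and drifts away from it. Only the direction is compared here; the norm of the strongly regularized probe shrinks with λ, so in practice one would rescale or re-fit a threshold along that direction.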