abhatt349 comments on What’s up with LLMs representing XORs of arbitrary features?

abhatt349 4 Jan 2024 6:12 UTC
3 points
−2
If it’s easy enough to run, it seems worth re-training the probes exactly the same way, except sampling both your train and test sets with replacement from the full dataset. This should avoid that issue. It has the downside of allowing some train/test leakage, but that seems pretty fine, especially if you only sample around 500 examples for train and 100 for test (from each of cities and neg_cities).
I’d strongly hope that after doing this, none of your probes would be significantly below 50%.
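In case it helps, here is a rough sketch of the resampling I have in mind. The function name `bootstrap_split`, the seeds, and the dataset sizes are all made up for illustration (I don't know your actual loading code or how many examples each dataset has); the only part that matters is that train and test indices are each drawn independently, with replacement, from the full dataset.

```python
import numpy as np


def bootstrap_split(n_examples, n_train=500, n_test=100, seed=0):
    """Draw train and test indices independently, with replacement,
    from the full range of example indices (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    train_idx = rng.choice(n_examples, size=n_train, replace=True)
    test_idx = rng.choice(n_examples, size=n_test, replace=True)
    return train_idx, test_idx


# Placeholder sizes: substitute the real lengths of cities and neg_cities.
dataset_sizes = {"cities": 1500, "neg_cities": 1500}

splits = {}
for i, (name, n_examples) in enumerate(dataset_sizes.items()):
    train_idx, test_idx = bootstrap_split(n_examples, seed=i)
    splits[name] = (train_idx, test_idx)
    # Fraction of test draws that also appear in train (the leakage mentioned above).
    leakage = np.isin(test_idx, train_idx).mean()
    print(f"{name}: {len(train_idx)} train, {len(test_idx)} test, leakage {leakage:.2f}")
```

You would then index your activations and labels with `train_idx` and `test_idx` for each dataset and fit the probes as before.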