Fabien Roger comments on What Discovering Latent Knowledge Did and Did Not Find

Fabien Roger 15 Mar 2023 11:45 UTC
3 points
2
I think that a (linear) ensemble of linear probes (trained with logistic regression) should never be better than a single linear probe (otherwise the optimizer would have just found this combined linear probe instead). Therefore, I don’t expect that ensembling 20 linear CCS probes will increase performance much (and especially not beyond the performance of supervised linear regression).
Feel free to run the experiment if you’re interested in it!
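
A minimal sketch of why a linear ensemble of linear probes collapses into a single linear probe. It assumes the ensemble averages the probes' logits (pre-sigmoid outputs) with equal weights; the sizes, the random "activations" `X`, and the random probe weights `W` and `b` are all illustrative stand-ins, not taken from the CCS setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (hypothetical, not from the thread or the paper).
n_probes, n_examples, dim = 20, 100, 16
X = rng.normal(size=(n_examples, dim))   # stand-in for hidden activations
W = rng.normal(size=(n_probes, dim))     # one weight vector per probe
b = rng.normal(size=n_probes)            # one bias per probe

# Ensemble: compute every probe's logits, then average across probes.
ensemble_logits = (X @ W.T + b).mean(axis=1)

# Single probe: average the weights and biases first, then apply once.
w_bar, b_bar = W.mean(axis=0), b.mean()
single_logits = X @ w_bar + b_bar

# The two agree up to floating-point error.
max_gap = np.abs(ensemble_logits - single_logits).max()
print(max_gap)
```

Note that this equivalence only holds for averaging logits. Averaging the post-sigmoid probabilities is not a linear operation on the probes, so it would not reduce to a single linear probe in the same way.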
- Seb Farquhar 3 Apr 2023 10:50 UTC
  3 points
  1
  It will often be better on the test set (because of averaging uncorrelated errors).
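
The point about averaging uncorrelated errors can be illustrated with a toy model. This is only a sketch under a strong assumption: each trained probe is modeled as a shared "ideal" direction `w_true` plus independent Gaussian noise. The names and values (`w_true`, `noise_scale`, the sizes) are hypothetical, and this is not a CCS experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values only (hypothetical, not from the thread).
n_probes, dim, noise_scale = 20, 16, 0.5
w_true = rng.normal(size=dim)  # hypothetical "ideal" probe direction

# Assumption: each trained probe = ideal direction + independent noise.
probes = w_true + noise_scale * rng.normal(size=(n_probes, dim))

# Distance of each individual probe from the ideal direction.
individual_errors = np.linalg.norm(probes - w_true, axis=1)

# Distance of the averaged probe from the ideal direction.
ensemble_error = np.linalg.norm(probes.mean(axis=0) - w_true)

print(individual_errors.mean(), ensemble_error)
```

Under this independence assumption the averaged probe's error shrinks roughly like 1/sqrt(n_probes). The averaged probe is still a single linear probe, just one that no individual training run happened to find. If the probes' errors are correlated, the gain is smaller.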