JanB comments on I don’t find the lie detection results that surprising (by an author of the paper)

JanB 8 Oct 2023 15:04 UTC
2 points

Your AUCs aren’t great for the Turpin et al. datasets. Did you try explicitly selecting questions or tuning weights for those datasets to see if the same lie detector technique would work?

We didn’t try this.

I am preregistering that it’s possible, and that further sycophancy-style follow-up questions would work well (the model is more sycophantic if it has previously been sycophantic).

This is also my prediction.