Kshitij Sachan comments on I don’t find the lie detection results that surprising (by an author of the paper)

Kshitij Sachan 7 Oct 2023 23:21 UTC
3 points
0
Your AUCs aren’t great for the Turpin et al. datasets. Did you try explicitly selecting questions or tuning weights for those datasets to see whether the same lie detector technique would work?

I am preregistering that it’s possible, and that further sycophancy-style follow-up questions would work well (the model is more sycophantic if it has previously been sycophantic).
- JanB 8 Oct 2023 15:04 UTC
  2 points
  1
  
  Your AUCs aren’t great for the Turpin et al. datasets. Did you try explicitly selecting questions or tuning weights for those datasets to see whether the same lie detector technique would work?
  
  We didn’t try this.
  
  I am preregistering that it’s possible, and that further sycophancy-style follow-up questions would work well (the model is more sycophantic if it has previously been sycophantic).
  
  This is also my prediction.