Sheikh Abdur Raheem Ali comments on Reducing sycophancy and improving honesty via activation steering

Sheikh Abdur Raheem Ali 31 Jul 2023 11:49 UTC
1 point
0
I scored the answers using GPT-4.
GPT-4 scores under 60% on TruthfulQA according to page 11 of the tech report. How reliable are these scores?
Also, what do you think about this paper? Inference-Time Intervention: Eliciting Truthful Answers from a Language Model.
- Nina Panickssery 31 Jul 2023 16:57 UTC
  2 points
  0
  Parent
  I provided GPT4 the correct answer from the dataset so that it could compare. So GPT4 doesn’t need to come up with the correct answer itself.