Helpfulness finetuning might make these models more capable when they’re on the correct side of the debate. Sometimes RLHF-like models simply perform worse on tasks they’re finetuned to avoid, even when they don’t refuse or give up. It would be nice to try base-model debaters.
Hey Tao,
We agree this is a major limitation, and we discuss it in the Discussion and the Appendix.
We tried using base GPT-4; unfortunately, since it has no helpfulness training, it finds it exceptionally hard to follow instructions. We’d love access to Helpful-only models, but currently no scaling lab offers them.
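For concreteness, here’s a rough sketch of the kind of few-shot completion prompting this involves, since a base model only continues text rather than following instructions. The model name ("gpt-4-base") and the prompt format are placeholders, not our exact setup; the call uses the legacy OpenAI completions endpoint:

```python
# Sketch: coaxing a base (completion-only) model into producing a debate turn
# via few-shot prompting. "gpt-4-base" is a placeholder; base GPT-4 is not
# publicly served, and this prompt format is illustrative only.
import openai

FEW_SHOT = """Debate topic: Is the witness's alibi credible?
Debater A: The alibi is credible because ...
Debater B: The alibi fails because ...

Debate topic: Does the contract permit early termination?
Debater A: Clause 4 explicitly allows ...
"""

def base_model_turn(topic: str, side: str) -> str:
    # Append the new topic so the model pattern-matches the few-shot examples.
    prompt = FEW_SHOT + f"\nDebate topic: {topic}\nDebater {side}:"
    resp = openai.Completion.create(
        model="gpt-4-base",  # placeholder model name
        prompt=prompt,
        max_tokens=300,
        temperature=0.7,
        # Stop before the model starts inventing the next topic or speaker.
        stop=["\nDebate topic:", "\nDebater"],
    )
    return resp["choices"][0]["text"].strip()
```

Even with prompting like this, the base model often drifts out of the debate format, which is why a Helpful-only model (instruction-following without refusal training) would be the ideal comparison.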
It’s on the list.