ryan_greenblatt comments on Debating with More Persuasive LLMs Leads to More Truthful Answers

ryan_greenblatt 8 Feb 2024 21:04 UTC
LW: 7 AF: 5
Thanks for the response!

I think I agree with everything you said and I appreciate the level of thoughtfulness.

Yeah we tried a bunch of other tasks early on, which we discuss in Appendix C.

Great! I appreciate the inclusion of negative results here.

Of course this is not the same as human debaters who know their judge will be an LLM—in that case I’d imagine debaters trying out a lot of weird adversarial strategies.

Yep, I’d be interested in this setup, but maybe where we ban egregious jailbreaks or similar.