I think I agree with everything you said and I appreciate the level of thoughtfulness.
Yeah we tried a bunch of other tasks early on, which we discuss in Appendix C.
Great! I appreciate the inclusion of negative results here.
Of course this is not the same as human debaters who know their judge will be an LLM—in that case I’d imagine debaters trying out a lot of weird adversarial strategies.
Yep, I’d be interested in this setup, but maybe one where we ban egregious jailbreaks or similar.
Thanks for the response!