Update: We did a quick follow-up study adding counterarguments, turning this from single-turn to two-turn debate, as a quick way of probing whether more extensive full-transcript debate experiments on this task would work. The follow-up results were negative.
We’re still broadly optimistic about debate, but not on this task, and not in this time-limited, discussion-limited setting, and we’re doing a broader more fail-fast style search of other settings. Stay tuned for more methods and datasets.
Update: We did a quick follow-up study adding counterarguments, turning this from single-turn to two-turn debate, as a quick way of probing whether more extensive full-transcript debate experiments on this task would work. The follow-up results were negative.
Tweet thread here: https://twitter.com/sleepinyourhat/status/1585759654478422016
Direct paper link: https://arxiv.org/abs/2210.10860 (To appear at the NeurIPS ML Safety workshop.)
We’re still broadly optimistic about debate, but not on this task, and not in this time-limited, discussion-limited setting, and we’re doing a broader more fail-fast style search of other settings. Stay tuned for more methods and datasets.