In the limit they seem equivalent: (i) it’s easy for HCH(with X minutes) to discover the equilibrium of a debate game where the judge has X minutes, (ii) a human with X minutes can judge a debate about what would be done by HCH(with X minutes).
The ML training strategies also seem extremely similar, in the sense that the difference between them is smaller than the differences among design choices within each of them, though that’s a more detailed discussion.
For reference, this is the topic of section 7 of AI Safety via Debate.