I wanted to note that I’m also quite worried about this when it comes to debate (or, really, most things with human models), but I think this is an empirical question about human psychology / training mechanisms, where making progress on it (at this stage) looks a lot like just making generic progress. If we have even a small set of people who can judge debates, that’s sufficient to eventually make use of debate; if we can identify the set of cognitive tools judges need in order to succeed, that’s useful; but until we have a bunch of debates and throw humans at them, it’s not obvious how we’ll get empirical answers to these empirical questions.
I would be interested in how much current work on debate is motivated by the “reason works” story where truth reliably has an edge, and how much of it is motivated by something more like computational complexity concerns (where, in the presence of optimization processes, you can apply a check to a single step of an execution trace and the optimizers generalize that constraint out across their whole possibility space; a toy sketch of this follows). For the latter, you might give up on human-readable language and human evaluation of correctness without giving up on the general cognitive structure.
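To make the complexity-flavored framing concrete, here is a minimal toy sketch (my own illustrative construction, not anything from the original comment or a specific paper): a dishonest debater asserts a wrong total for a long list, an honest debater recursively narrows the disagreement by halving the disputed range, and the judge only ever checks one element at the end. The constraint is applied to a single step of the trace, but the debaters’ optimization pressure propagates it across the whole computation. The function name and the sum-verification task are hypothetical choices for illustration.

```python
# Toy sketch: a judge with O(1) work adjudicates a claim about an O(n) computation,
# because the honest debater can always steer the dispute to the one wrong leaf.

def debate_sum(xs, claimed_total):
    """Dishonest debater claims sum(xs) == claimed_total; honest debater disputes it.
    They recursively split the disputed range; the judge only checks a single element."""
    lo, hi = 0, len(xs)              # current disputed range [lo, hi)
    disputed_claim = claimed_total   # dishonest claim about sum(xs[lo:hi])

    while hi - lo > 1:
        mid = (lo + hi) // 2
        # The dishonest debater must split its claim into two sub-claims that add up.
        # Modeled here as hiding the full error in the left half.
        left_true = sum(xs[lo:mid])
        error = disputed_claim - sum(xs[lo:hi])
        left_claim = left_true + error
        right_claim = disputed_claim - left_claim

        # The honest debater recomputes both halves and challenges the wrong one
        # (if neither is wrong, it follows the right half and will lose, correctly).
        if left_claim != left_true:
            lo, hi, disputed_claim = lo, mid, left_claim
        else:
            lo, hi, disputed_claim = mid, hi, right_claim

    # Judge's job: check one leaf, constant work regardless of len(xs).
    return "honest wins" if disputed_claim != xs[lo] else "dishonest wins", lo


if __name__ == "__main__":
    xs = list(range(100))
    verdict, leaf = debate_sum(xs, sum(xs) + 1)   # claim is off by one
    print(verdict, "- judge only checked index", leaf)
```

The point of the sketch is only the shape of the argument: the judge’s check is local and cheap, and it is the presence of two optimizing debaters that makes that local check binding on the global claim. Nothing about the check requires that the intermediate claims be stated in human-readable language, which is what the “give up on human evaluation without giving up the structure” option is gesturing at.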