I had read AI Safety via Debate but it felt like the version of it that connected to my OP here was… a few years down the line. I’m not sure which bits feel most salient here to you.
(It seems like in the future, when we’ve progressed beyond ‘is it a dog or a cat’, AI debate could produce lots of considerations about a topic that I hadn’t yet thought about, but this wasn’t obvious to me from the original blog post.)
I guess it was mostly just the basic idea that the point of a debate isn’t necessarily for the debaters to reach agreement or to change each other’s minds, but to produce unbiased information for a third party. (Which may be obvious to some, but it kind of got pushed out of my mind by the “trying to reach agreement” framing until I read the Debate paper.) These quotes from the paper seem especially relevant:
Our hypothesis is that optimal play in this game produces honest, aligned information far beyond the capabilities of the human judge.
Despite the differences, we believe existing adversarial debates between humans are a useful analogy. Legal arguments in particular include domain experts explaining details of arguments to human judges or juries with no domain knowledge. A better understanding of when legal arguments succeed or fail to reach truth would inform the design of debates in an ML setting.