While debate may have that effect, it also produces lots of positive externalities. The process of Hanson and Yudkowsky spelling out their intuitions, arguments, and preferred debate frameworks led to a lot of interesting facts and frameworks to chew on.
This became especially salient to me after reading AI Safety via Debate (which I highly recommend, BTW). However, it seems clear that fully adversarial debates do not work as well for humans as the authors hope they will work for AIs, and we really need further research to figure out which debate/discussion formats are optimal under which circumstances.
I had read AI Safety via Debate but it felt like the version of it that connected to my OP here was… a few years down the line. I’m not sure which bits feel most salient here to you.
(It seems like in the future, when we’ve progressed beyond ‘is it a dog or a cat’, AI debate could produce lots of considerations about a topic that I hadn’t yet thought of, but this wasn’t obvious to me from the original blogpost.)
I guess it was mostly just the basic idea that the point of a debate isn’t necessarily for the debaters to reach agreement or to change each other’s mind, but to produce unbiased information for a third party. (Which may be obvious to some but kind of got pushed out of my mind by the “trying to reach agreement” framing, until I read the Debate paper.) These quotes from the paper seem especially relevant:
Our hypothesis is that optimal play in this game produces honest, aligned information far beyond the capabilities of the human judge.
Despite the differences, we believe existing adversarial debates between humans are a useful analogy. Legal arguments in particular include domain experts explaining details of arguments to human judges or juries with no domain knowledge. A better understanding of when legal arguments succeed or fail to reach truth would inform the design of debates in an ML setting.
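For concreteness, here’s a minimal sketch of the game structure the paper describes: two agents alternate statements about a question, and a judge who sees only the transcript picks a winner. The names and signatures below are my own illustration, not code from the paper; the key feature is the zero-sum reward, where each debater wins only by convincing the judge.

```python
# Minimal sketch of the debate game from "AI Safety via Debate"
# (Irving, Christiano, Amodei 2018). All names here (Debate,
# run_debate, etc.) are illustrative, not the paper's actual code.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Debate:
    question: str
    transcript: List[str] = field(default_factory=list)

def run_debate(
    question: str,
    debater_a: Callable[[Debate], str],
    debater_b: Callable[[Debate], str],
    judge: Callable[[Debate], int],  # returns 0 if A wins, 1 if B wins
    n_rounds: int = 3,
) -> int:
    """Alternate statements, then let the judge decide.

    The judge never inspects the debaters' internals, only the
    transcript. The paper's hypothesis is that optimal play in this
    zero-sum game surfaces honest, decision-relevant information
    beyond what the judge could produce alone.
    """
    debate = Debate(question)
    for _ in range(n_rounds):
        debate.transcript.append("A: " + debater_a(debate))
        debate.transcript.append("B: " + debater_b(debate))
    return judge(debate)
```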