I’ve lately been talking a lot about doublecrux. It seemed good to note some updates I’d also made over the past few months about debate.
For the past few years I’ve been sort of annoyed at debate because it seems like it doesn’t lead people to change their opinions – instead, the entire debate framework seems more likely to prompt people to try to win, meanwhile treating arguments as soldiers and digging in their heels. I felt some frustration at the Hanson/Yudkowsky Foom Debate because huge amounts of digital ink were spilled, and neither party changed their mind much.
The counterpoint that’s been pointed out to me lately is:
While debate may have that effect, it also produces lots of positive externalities. The process of Hanson and Yudkowsky spelling out their intuitions, arguments, and preferred debate frameworks led to a lot of interesting facts and frameworks to chew on.
This became especially salient to me after reading AI Safety via Debate (which I highly recommend, BTW). However, it seems clear that fully adversarial debates do not work as well for humans as the authors hope they will work for AIs, and we really need further research to figure out which debate/discussion formats are optimal under which circumstances.
I had read AI Safety via Debate but it felt like the version of it that connected to my OP here was… a few years down the line. I’m not sure which bits feel most salient here to you.
(It seems like in the future, once we've progressed beyond 'is it a dog or a cat', AI debate could produce lots of considerations about a topic that I hadn't yet thought about, but this wasn't obvious to me from the original blogpost.)
I guess it was mostly just the basic idea that the point of a debate isn’t necessarily for the debaters to reach agreement or to change each other’s mind, but to produce unbiased information for a third party. (Which may be obvious to some but kind of got pushed out of my mind by the “trying to reach agreement” framing, until I read the Debate paper.) These quotes from the paper seem especially relevant:
Our hypothesis is that optimal play in this game produces honest, aligned information far beyond the capabilities of the human judge.
Despite the differences, we believe existing adversarial debates between humans are a useful analogy. Legal arguments in particular include domain experts explaining details of arguments to human judges or juries with no domain knowledge. A better understanding of when legal arguments succeed or fail to reach truth would inform the design of debates in an ML setting.
The fact that such debates can go on for 500 pages without significant updates from either side points towards a failure to 1) systematically determine which arguments are strong and which ones are distractions, and 2) restrict the scope of the debate so that opponents have to engage directly rather than shift to more comfortable ground.
There are also many simpler topics that could have meaningful progress made on them with current debating technology, but they just don’t happen because most people have an aversion to debating.