The standard argument against making the debate game non-zero-sum is that you may then incentivise your debaters to collude.
I don’t know if you’ve seen our most recent debate rules and attempt at analysis of whether they provide the desired behavior—seems somewhat relevant to what you’re thinking about here.
I took a look, and it was indeed helpful. However, I left a comment there about a concern I have. The argument at the end only establishes what you call D-acceptability: that no answer is judged better after D steps of debate. My concern is that even if debaters are D-acceptable for every D, that does not mean they are honest: they can instead use non-well-founded argument trees which never bottom out.
That is a concern, but only in the case where no answer has an argument tree that bottoms out in depth < D. As long as some answer is supported by a depth < D tree, that answer will beat the answers supported only by argument trees that never bottom out within D steps.
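To make the well-foundedness condition concrete, here is a toy sketch of what “bottoms out in depth < D” means and why such an answer wins (the ArgumentNode structure and judge model here are purely illustrative, not the actual debate setup):

```python
# Toy sketch of the well-foundedness condition; ArgumentNode and the
# "judge-checkable leaf" notion are hypothetical stand-ins.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ArgumentNode:
    claim: str
    judge_checkable: bool = False           # a leaf the human judge can verify directly
    children: List["ArgumentNode"] = field(default_factory=list)

def bottoms_out(node: ArgumentNode, depth_budget: int) -> bool:
    """True iff every branch under `node` reaches a judge-checkable leaf
    within `depth_budget` steps, i.e. the tree is well-founded at depth < D."""
    if node.judge_checkable:
        return True
    if depth_budget == 0:
        return False                         # ran out of depth before grounding out
    return all(bottoms_out(child, depth_budget - 1) for child in node.children)

def preferred_answer(tree_a: ArgumentNode, tree_b: ArgumentNode, D: int) -> str:
    """An answer supported by a depth<D tree beats an answer supported only
    by trees that never ground out within D steps."""
    a_ok, b_ok = bottoms_out(tree_a, D), bottoms_out(tree_b, D)
    if a_ok and not b_ok:
        return tree_a.claim
    if b_ok and not a_ok:
        return tree_b.claim
    return "tie (both or neither ground out within D)"
```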
So there is a case where the debaters are not incentivised to be honest: the case where the debaters know something but there’s no human-understandable argument for it that bottoms out in fewer than D steps. This is where the PSPACE constraint comes from.
If we include cross-examination (which the analysis there did not cover), then we can get rid of this constraint: each debater commits to an argument tree, then each debater points out the weakest node in the opponent’s tree (or points out that some part of it doesn’t bottom out).
(We can only handle really large trees if we assume the debaters are computationally unbounded, though. If we don’t assume this, then even if they still have oracles for some specific problems, we probably can’t supervise anything that’s not in NP, because of the obfuscated argument problem.)
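A rough sketch of that commit-then-challenge structure, just to pin down the shape of the protocol (weakest_node and judge are placeholder policies, not the actual cross-examination rules):

```python
# Rough sketch of the commit-then-challenge structure described above.
# `weakest_node` and `judge` are hypothetical callables, not an
# implementation of the real cross-examination mechanism.

from typing import Callable

def run_round(tree_a, tree_b,
              weakest_node: Callable,       # debater's policy for picking a node to attack
              judge: Callable) -> str:
    """Each debater commits to a full argument tree up front, then each one
    challenges a single node in the opponent's tree (or flags a branch that
    never bottoms out). The judge only evaluates the challenged nodes,
    never the whole trees."""
    challenge_on_b = weakest_node(tree_b)    # A attacks the weakest point of B's tree
    challenge_on_a = weakest_node(tree_a)    # B attacks the weakest point of A's tree
    a_survives = judge(challenge_on_a)       # does A's challenged node hold up?
    b_survives = judge(challenge_on_b)
    if a_survives and not b_survives:
        return "A wins"
    if b_survives and not a_survives:
        return "B wins"
    return "undecided; recurse on the challenged nodes or call it a tie"
```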
I think the collusion concern basically over-anthropomorphizes the training process. In the prisoner’s dilemma, say, if you train myopically, then “all incentives point toward defection” translates concretely into actual defection: each update only chases the agent’s own immediate reward, so the learned policies defect even though mutual cooperation would score higher for both.
Granted, there are training regimes in which this doesn’t happen, but those would have to be avoided.
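As a toy illustration of that point (purely illustrative, not anything from an actual debate training setup): two agents trained myopically on one-shot prisoner’s dilemma payoffs both drift toward defection, even though mutual cooperation has higher joint reward.

```python
# Two agents trained myopically on one-shot prisoner's dilemma payoffs.
# Each REINFORCE-style update only chases the agent's own immediate reward,
# so both end up defecting, i.e. no learned collusion.

import math, random

PAYOFF = {  # (my action, their action) -> my reward; 0 = cooperate, 1 = defect
    (0, 0): 3, (0, 1): 0,
    (1, 0): 5, (1, 1): 1,
}

def sample(logit: float) -> int:
    """Sample defect (1) with probability sigmoid(logit)."""
    p_defect = 1.0 / (1.0 + math.exp(-logit))
    return 1 if random.random() < p_defect else 0

logit_a = logit_b = 0.0                      # both start at 50/50
lr = 0.05

for _ in range(5000):
    a, b = sample(logit_a), sample(logit_b)
    r_a, r_b = PAYOFF[(a, b)], PAYOFF[(b, a)]
    p_a = 1.0 / (1.0 + math.exp(-logit_a))
    p_b = 1.0 / (1.0 + math.exp(-logit_b))
    # Myopic policy-gradient step: each agent scales the log-probability
    # gradient of its own action by its own immediate reward only.
    logit_a += lr * r_a * (a - p_a)
    logit_b += lr * r_b * (b - p_b)

print("P(defect) for A:", 1.0 / (1.0 + math.exp(-logit_a)))
print("P(defect) for B:", 1.0 / (1.0 + math.exp(-logit_b)))
# Both probabilities climb toward 1: with myopic updates, the incentive
# structure translates directly into defection rather than collusion.
```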
OTOH, the concern might be that an inner optimizer would develop which colludes. This would have to be dealt with by more general anti-inner-optimizer technology.
Yep, I should take a look!