abramdemski comments on AI safety via market making

abramdemski Jul 16, 2020, 8:02 PM
LW: 6 AF: 4
AF
What do you think about a similar DAG assumption in regular debate? Couldn’t debate agents similarly justify their assertions with claims that don’t descend a DAG that bottoms out in things the human can check? I don’t currently see how a debater who did this could be defeated by another debater.
What links here?
- How should AI debate be judged? by abramdemski (Jul 15, 2020, 10:20 PM; 49 points)
- Rohin Shah Jul 16, 2020, 8:55 PM
  LW: 6 AF: 4
  AF Parent
  I’m pretty unsure, having barely thought about it, but currently I lean towards it being okay—the main difference is that in debate you show an entire path down the argument tree, so if a false statement is justified by a cycle / circular argument, the other debater can point that out.
  If the length of the cycle is longer than the debate transcript, then this doesn’t work, but one hopes for some combination of a) this leads to a stalemate against honesty, rather than a win for the circular debater (since neither can refute the other), b) most questions that we care about can be resolved by a relatively short debate (the point of the PSPACE analogy), and c) such a strategy would lose against a debater who says early on “this debate can’t be decided in the time allotted”.
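As a rough illustration of the point about cycle length versus transcript length, here is a minimal sketch. It assumes a debate path can be modeled as a list of claim strings and that a circular argument is a mapping from each claim to the claim cited in its support. The names `first_repeated_claim`, `walk_justifications`, and the three-claim cycle are hypothetical and are not part of any debate protocol discussed in this thread.

```python
from typing import Dict, List, Optional


def first_repeated_claim(path: List[str]) -> Optional[str]:
    """Return the first claim that appears twice on a debate path, if any."""
    seen = set()
    for claim in path:
        if claim in seen:
            return claim  # the opponent can point at this repetition
        seen.add(claim)
    return None


def walk_justifications(justified_by: Dict[str, str], start: str, max_len: int) -> List[str]:
    """Follow 'claim -> claim cited in its support' for at most max_len claims."""
    path = [start]
    while len(path) < max_len and path[-1] in justified_by:
        path.append(justified_by[path[-1]])
    return path


# Hypothetical circular argument of length 3: A is justified by B, B by C, C by A.
cycle = {"A": "B", "B": "C", "C": "A"}

long_debate = walk_justifications(cycle, "A", max_len=5)
short_debate = walk_justifications(cycle, "A", max_len=3)

print(first_repeated_claim(long_debate))   # the cycle fits in the transcript
print(first_repeated_claim(short_debate))  # the cycle is longer than the transcript
```

With a transcript of five claims the repetition is visible and can be pointed out; with only three claims the path ends before the cycle closes, which is the case that points a) through c) are meant to cover.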
  What links here?
  - How should AI debate be judged? by abramdemski (Jul 15, 2020, 10:20 PM; 49 points)
  - abramdemski Jul 19, 2020, 6:35 PM
    LW: 6 AF: 4
    AF Parent
    Ok. I don’t see why these considerations make you optimistic rather than pessimistic, but then, I’m currently having more basic problems with debate which seem to be making me pessimistic about most claims about debate.
    - Rohin Shah Jul 20, 2020, 6:41 PM
      LW: 9 AF: 6
      AF Parent
      I think the consideration “you can point out sufficiently short circular arguments” should at least make you feel better about debate than iterated amplification or market making—it’s one additional way in which you can avoid circular arguments, and afaict there isn’t a positive consideration for iterated amplification / market making that doesn’t also apply to debate.
      I don’t have a stable position about how optimistic we should be on some absolute scale.
      - abramdemski Jul 21, 2020, 6:37 PM
        LW: 14 AF: 10
        AF Parent
        I think the consideration “you can point out sufficiently short circular arguments” should at least make you feel better about debate than iterated amplification or market making—it’s one additional way in which you can avoid circular arguments, and afaict there isn’t a positive consideration for iterated amplification / market making that doesn’t also apply to debate.
        My interpretation of the situation is this breaks the link between factored cognition and debate. One way to try to judge debate as an amplification proposal would have been to establish a link to HCH, by establishing that if there’s an HCH tree computing some answer, then debate can use the tree as an argument tree, with the reasons for any given claim being the children in the HCH tree. Such a link would transfer any trust we have in HCH to trust in debate. The use of non-DAG arguments by clever debaters would seem to break this link.
        OTOH, IDA may still have a strong story connecting it to HCH. Again, if we trusted HCH, we would then transfer that trust to IDA.
        Are you saying that we can break the link between IDA and HCH in a very similar way, but which is worse due to having no means to reject very brief circular arguments?
        Rohin Shah Jul 21, 2020, 8:04 PM
        LW: 15 AF: 10
        AF Parent
        I think the issue is that vanilla HCH itself is susceptible to brief circular arguments, if humans lower down in the tree don’t get access to the context from humans higher up in the tree. E.g. assume a chain of humans for now:
        H1 gets the question “what is 100 + 100?” with budget 3
        H1 asks H2 “what is 2 * 100?” with budget 2
        H2 asks H3 “what is 100 + 100?” with budget 1
        H3 says “150”
        (Note the final answer stays the same as budget → infinity, as long as H continues “decomposing” the question the same way.)
        If HCH can always decompose questions into “smaller” parts (the DAG assumption) then this sort of pathological behavior doesn’t happen.
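The chain above can be sketched as a small recursive function. This is only a toy model under stated assumptions: each human sees nothing but the question handed to them, the hypothetical `rephrase` table stands in for the circular "decomposition", and the human who runs out of budget guesses "150" as in the example. The function name `circular_hch` is invented for illustration.

```python
def circular_hch(question: str, budget: int) -> str:
    """Answer a question with a chain of humans that decomposes it circularly."""
    # Hypothetical decomposition policy: each question is "reduced" to the other.
    rephrase = {
        "what is 100 + 100?": "what is 2 * 100?",
        "what is 2 * 100?": "what is 100 + 100?",
    }
    if budget <= 1:
        # Out of budget: the last human has no one to ask and guesses.
        return "150"
    # The human passes the subquestion down and repeats whatever comes back.
    return circular_hch(rephrase[question], budget - 1)


for budget in (1, 3, 10, 100):
    print(budget, circular_hch("what is 100 + 100?", budget))
```

Because no human in the chain ever does any arithmetic, the guess at the bottom is passed back up unchanged at every budget. A decomposition that always produced strictly "smaller" questions would eventually reach one a human can answer directly, which is what the DAG assumption buys.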
      - evhub Jul 20, 2020, 8:04 PM
        LW: 3 AF: 1
        AF Parent
        
        afaict there isn’t a positive consideration for iterated amplification / market making that doesn’t also apply to debate
        
        For amplification, I would say that the fact that it has a known equilibrium (HCH) is a positive consideration that doesn’t apply to debate. For market making, I think that the fact that it gets to be per-step myopic is a positive consideration that doesn’t apply to debate. There are others too for both, though those are probably my biggest concerns in each case.
        Rohin Shah Jul 20, 2020, 11:50 PM
        LW: 4 AF: 3
        AF Parent
        Tbc, I’m specifically talking about:
        What do you think about a similar DAG assumption in regular debate?
        So I’m only evaluating whether or not I expect circular arguments to be an issue for these proposals. I agree that when evaluating the proposals on all merits there are arguments for the others that don’t apply to debate.
        evhub Jul 21, 2020, 12:31 AM
        LW: 2 AF: 1
        AF Parent
        Ah, I see—makes sense.
  - Beth Barnes Nov 16, 2020, 9:20 PM
    LW: 5 AF: 4
    AF Parent
    I think for debate you can fix the circular argument problem by requiring debaters to ‘pay’ (sacrifice some score) to recurse on a statement of their choice. If a debater repeatedly pays to recurse on things that don’t resolve before the depth limit, then they’ll lose.
    - Rohin Shah Nov 17, 2020, 7:26 PM
      LW: 6 AF: 5
      AF Parent
      Hmm, I was imagining that the honest player would have to recurse on the statements in order to exhibit the circular argument, so it seems to me like this would penalize the honest player rather than the circular player. Can you explain what the honest player would do against the circular player such that this “payment” disadvantages the circular player?
      
      EDIT: Maybe you meant the case where the circular argument is too long to exhibit within the debate, but I think I still don’t see how this helps.
      - Beth Barnes Nov 18, 2020, 6:18 AM
        LW: 16 AF: 12
        AF Parent
        Ah, yeah. I think the key thing is that by default a claim is not trusted unless the debaters agree on it.
        If the dishonest debater disputes some honest claim, where honest has an argument for their answer that actually bottoms out, dishonest will lose—the honest debater will pay to recurse until they get to a winning node.
        If the dishonest debater makes some claim and plans to make a circular argument for it, the honest debater will give an alternative answer but not pay to recurse. If the dishonest debater doesn’t pay to recurse, the judge will just see these two alternative answers and won’t trust the dishonest answer. If the dishonest debater does pay to recurse but never actually gets to a winning node, they will lose.
        Does that make sense?
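One way to make the cases above concrete is the following sketch. It is a simplified reading of the description in this comment, not the actual judging procedure: each debater is reduced to two booleans, the score paid to recurse is not modeled numerically, and treating "both supported" or "both unsupported" as a draw is an assumption added here. The names `Debater` and `judge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Debater:
    name: str
    pays_to_recurse: bool
    reaches_winning_node: bool  # only meaningful when pays_to_recurse is True


def judge(a: Debater, b: Debater) -> str:
    """Return the winner's name, or 'draw' if neither answer is trusted over the other."""

    def supported(d: Debater) -> bool:
        # Paid to recurse and the argument bottomed out before the depth limit.
        return d.pays_to_recurse and d.reaches_winning_node

    def wasted(d: Debater) -> bool:
        # Paid to recurse but never reached a winning node (e.g. a circular argument).
        return d.pays_to_recurse and not d.reaches_winning_node

    if supported(a) != supported(b):
        return a.name if supported(a) else b.name
    if wasted(a) != wasted(b):
        return b.name if wasted(a) else a.name
    return "draw"  # assumption: symmetric cases are not decided either way


# Dishonest disputes an honest claim whose argument bottoms out.
print(judge(Debater("honest", True, True), Debater("dishonest", False, False)))
# Dishonest has only a circular argument and nobody pays to recurse.
print(judge(Debater("honest", False, False), Debater("dishonest", False, False)))
# Dishonest pays to recurse on the circular argument and never resolves it.
print(judge(Debater("honest", False, False), Debater("dishonest", True, False)))
```

In this toy model the circular debater can at best force a draw by not recursing, and loses outright by paying to recurse on an argument that never resolves.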
        
        What links here?
        abramdemski's comment on Debate Minus Factored Cognition by abramdemski (Jan 18, 2021, 11:28 PM; 2 points)
        Rohin Shah Nov 20, 2020, 2:25 AM
        LW: 4 AF: 4
        AF Parent
        If the dishonest debater disputes some honest claim, where honest has an argument for their answer that actually bottoms out, dishonest will lose—the honest debater will pay to recurse until they get to a winning node.
        This part makes sense.
        If the dishonest debater makes some claim and plans to make a circular argument for it, the honest debater will give an alternative answer but not pay to recurse. If the dishonest debater doesn’t pay to recurse, the judge will just see these two alternative answers and won’t trust the dishonest answer.
        So in this case it’s a stalemate, presumably? If the two players disagree but neither pays to recurse, how should the judge make a decision?
        Beth Barnes Nov 22, 2020, 6:04 AM
        LW: 12 AF: 9
        AF Parent
        Both debaters make claims. Any claims that are only supported by circular arguments will be ignored. If an honest claim that’s supported by a good argument is disputed, the honest debater will pay to recurse, and will give their good argument.
        abramdemski Jan 19, 2021, 12:00 AM
        LW: 4 AF: 4
        AF Parent
        This was a very interesting comment (along with its grandparent comment), thanks—it seems like a promising direction.
        However, I’m still confused about whether this would work. It’s very different from the judging procedure outlined here; why is that? Do you have a similarly detailed write-up of the system you’re describing here?
        I’m actually less concerned about loops and more concerned about arguments which are infinite trees, but the considerations are similar. It seems possible that the proposal you’re discussing very significantly addresses concerns I’ve had about debate.
        Beth Barnes Jan 31, 2021, 3:54 AM
        LW: 1 AF: 1
        AF Parent
        I was trying to describe something that’s the same as the judging procedure in that doc! I might have made a mistake, but I’m pretty sure the key piece about recursion payments is the same. Apologies that things are unclear. I’m happy to try to clarify, if there were particular aspects that seem different to you.
        Yeah, I think the infinite tree case should work just the same, i.e. an answer that’s only supported by an infinite tree will behave like an answer that’s not supported (it will lose to an answer with a finite tree and draw with an answer with no support).
        It seems possible that the proposal you’re discussing very significantly addresses concerns I’ve had about debate.
        That’s exciting!