For extremely large Z, which are represented only implicitly as in Paul's post, we might not always check whether the model matches the ground truth by actually generating the ground truth; instead we might just ask the human to verify the answer given Z.
I’m not sure what “just ask the human to verify the answer given Z” looks like, for implicitly represented Z
There are lots of ways to allow H to interface with an implicitly represented Z, but the one Paul describes in “Learning the Prior” is to train some model Mz(⋅,z) which represents Z implicitly by responding to human queries about Z (see also “Approval-maximizing representations” which describes how a model like Mz could represent Z implicitly as a tree).
Once H can interface with Z, checking whether some answer is correct given Z is at least no more difficult than producing an answer given Z, since H can just produce their own answer and then check it against the model's using some distance metric (e.g. as judged by an autoregressive language model), but it could be much easier if there are ways for H to directly evaluate how likely H would be to produce that answer.
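[A minimal sketch of the distinction being drawn here, under loose assumptions: the query model Mz, the `human` object and its methods, and the `similarity` metric are all illustrative placeholders, not anything specified in the post.]

```python
# Hypothetical sketch: H interfaces with an implicitly represented Z through a
# trained query model Mz(., z), and can check an answer either by generating
# their own and comparing, or by directly evaluating the model's answer.
# All names here (mz_model, human, similarity) are illustrative placeholders.

def query_z(mz_model, question, z_repr):
    """Ask the query model Mz(., z) a question about the implicitly represented Z."""
    return mz_model(question, z_repr)

def generate_then_check(human, mz_model, z_repr, question, model_answer,
                        similarity, threshold=0.9):
    """Generation-based check: H produces their own answer via the Z interface,
    then compares it to the model's answer with some similarity/distance metric
    (e.g. one derived from an autoregressive language model)."""
    human_answer = human.answer(question, lambda q: query_z(mz_model, q, z_repr))
    return similarity(human_answer, model_answer) >= threshold

def direct_verification(human, mz_model, z_repr, question, model_answer):
    """Verification-based check: H directly evaluates how likely they would be
    to produce the model's answer given Z, without generating a full answer."""
    return human.evaluate_likelihood(question, model_answer,
                                     lambda q: query_z(mz_model, q, z_repr))
```

The point of the second function is just that it never has to produce a complete answer itself, which is where the potential efficiency gain over generation comes from.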
Right, but in the post the implicitly represented Z is used by an amplification or debate system, because it contains more information than a human can quickly read and use (so are you assuming it’s simple to verify the results of amplification/debate systems?)
Ah, sorry, no—I was assuming you were just using whatever procedure you used previously to allow the human to interface with Z in that situation as well. I’ll edit the post to be more clear there.
Okay, that makes more sense now. My understanding is that for a question X, an answer Y from the ML system, and an amplification system A, the verification in your quote consists of asking A to answer "Would A(Z) output answer Y to question X?", as opposed to asking A to answer X and then checking whether its answer equals Y. This can at most be as hard as running the original system, and could be much more efficient.
Yep; that's what I was imagining. It is also worth noting, though, that it can be less safe to do that, since you're letting A(Z) see Y, which could bias it in ways you don't want; I talk about that danger a bit in the context of approval-based amplification here and here.
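[To make the comparison in the exchange above concrete, here is a minimal sketch, assuming A can be called as a function of Z and a question; A, Z, and the exact phrasing of the verification query are placeholders, not anything from the post.]

```python
# Hypothetical sketch for question X, ML-system answer Y, and amplification
# system A with access to an implicitly represented Z.

def check_by_regeneration(A, Z, X, Y):
    """Run the amplification system on X itself and compare its answer to Y.
    This requires fully answering X from scratch."""
    return A(Z, X) == Y

def check_by_verification(A, Z, X, Y):
    """Ask A(Z) the verification question directly. In the worst case A falls
    back to answering X internally, so this is at most as hard as regeneration,
    and it may be much cheaper if Y is easier to check than to produce.
    Note that this route shows Y to A(Z), which is the bias concern above."""
    verification_query = f'Would A(Z) output answer "{Y}" to question "{X}"?'
    return A(Z, verification_query).strip().lower() == "yes"
```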