Not quite sure what you’re saying here. Is the claim that speed penalties would help shift the balance against mesa-optimizers? This kind of solution is worth looking into, but I’m not too optimistic about it atm. First, the mesa-optimizer probably won’t add much overhead compared to the considerable complexity of emulating a brain; in particular, it need not work by anything like our own ML algorithms. So, if it’s possible to rule out mesa-optimizers like this at all, it would require a rather extreme penalty (the toy calculation below illustrates why). Second, there are limits on how much you can shape the prior while still keeping learning feasible, and I suspect that such an extreme speed penalty would not cut it. Third, depending on the setup, an extreme speed penalty might harm generalization[1]. But we definitely need to understand this more rigorously.
The most appealing version is Christiano’s “minimal circuits”, but that only works for inputs of fixed size. It’s not so clear what the variable-input-size (“transformer”) version of that would be.
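To make the “extreme penalty” point concrete, here is a toy log-weight comparison in Python. The penalty form (runtime entering as lam * log2(steps)) and all the numbers are made up purely for illustration; nothing here is from the original discussion:

```python
import math

def log2_weight(description_bits, runtime_steps, lam=1.0):
    """log2 of a toy prior weight ~ 2^{-(|p| + lam * log2(runtime))}.

    With lam == 1, doubling a program's runtime costs it the same as one
    extra bit of description length; a very large lam corresponds to the
    "extreme penalty" regime discussed above.
    """
    return -(description_bits + lam * math.log2(runtime_steps))

# Hypothetical numbers: a mesa-optimizer saves 1000 bits of description
# while adding only 10% runtime overhead on top of an already-expensive
# brain emulation.
direct = log2_weight(description_bits=10_000, runtime_steps=1e9)
mesa = log2_weight(description_bits=9_000, runtime_steps=1.1e9)
print(mesa > direct)  # True at lam=1: the runtime overhead barely registers

# Flipping the comparison requires lam * log2(1.1) > 1000, i.e. lam on the
# order of 7000, which is the sense in which the penalty must be extreme.
```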
No, I wasn’t advocating adding a speed penalty; I was just pointing at a reason to think that a speed prior would give a more accurate answer to the question of “which is favored” than the bounded simplicity prior you’re assuming:
Suppose that your imitator works by something akin to Bayesian inference with some sort of bounded simplicity prior (I think it’s true of transformers)
But now I realise that I don’t understand why you think this is true of transformers. Could you explain? It seems to me that there are many very simple hypotheses which take a long time to calculate, and which transformers therefore can’t be representing.
The word “bounded” in “bounded simplicity prior” referred to bounded computational resources. A “bounded simplicity prior” is a prior which involves either a “hard” bound on computational resources (some hypotheses are excluded outright) or a “soft” bound (some hypotheses are down-weighted), or both, together with an inductive bias towards simplicity (specifically, it should probably behave as ~ 2^{-description complexity}). For a concrete example, see the prior I described here (without any claim to originality).
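To illustrate the definition above, here is a minimal Python sketch of such a prior over a finite hypothesis set. The representation of hypotheses as (description length, compute steps) pairs and the exponential form of the soft down-weighting are illustrative assumptions, not anything specified in the comment:

```python
def bounded_simplicity_prior(hypotheses, step_budget, soft=False, beta=1.0):
    """Toy bounded simplicity prior over a finite hypothesis set.

    Each hypothesis is a (description_length_bits, compute_steps) pair.
    The base weight is ~ 2^{-description complexity}; the compute bound
    is either "hard" (over-budget hypotheses are excluded) or "soft"
    (over-budget hypotheses are down-weighted; the exponential form of
    the down-weighting is an illustrative choice).
    """
    weights = []
    for length, steps in hypotheses:
        w = 2.0 ** (-length)  # simplicity bias: ~ 2^{-description complexity}
        if steps > step_budget:
            w = w * 2.0 ** (-beta * (steps - step_budget)) if soft else 0.0
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights] if total else weights

# Example: the simpler (2-bit) hypothesis exceeds the step budget, so the
# hard bound excludes it while the soft bound merely down-weights it.
print(bounded_simplicity_prior([(3, 10), (2, 30)], step_budget=20))
print(bounded_simplicity_prior([(3, 10), (2, 30)], step_budget=20, soft=True, beta=0.5))
```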
Ah, I see. That makes sense now!