Not quite sure what you’re saying here. Is the claim that speed penalties would help shift the balance against mesa-optimizers? This kind of solution is worth looking into, but I’m not too optimistic about it atm. First, the mesa-optimizer probably won’t add much overhead compared to the considerable complexity of emulating a brain; in particular, it need not work by anything like our own ML algorithms. So, if it’s possible to rule out mesa-optimizers like this at all, it would require a rather extreme penalty (the toy calculation below illustrates why). Second, there are limits on how much you can shape the prior while still keeping learning feasible, and I suspect that such an extreme speed penalty would not cut it. Third, depending on the setup, an extreme speed penalty might harm generalization[1]. But we definitely need to understand this more rigorously.
The most appealing version is Christiano’s “minimal circuits”, but that only works for inputs of fixed size. It’s not so clear what the variable-input-size (“transformer”) version of that would be.
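To make the “extreme penalty” point concrete, here is a toy log-weight comparison in Python. The penalty form (runtime entering as lam * log2(steps)) and all the numbers are made up purely for illustration; nothing here is from the original discussion:

```python
import math

def log2_weight(description_bits, runtime_steps, lam=1.0):
    """log2 of a toy prior weight ~ 2^{-(|p| + lam * log2(runtime))}.

    With lam == 1, doubling a program's runtime costs it the same as one
    extra bit of description length; a very large lam corresponds to the
    "extreme penalty" regime discussed above.
    """
    return -(description_bits + lam * math.log2(runtime_steps))

# Hypothetical numbers: a mesa-optimizer saves 1000 bits of description
# while adding only 10% runtime overhead on top of an already-expensive
# brain emulation.
direct = log2_weight(description_bits=10_000, runtime_steps=1e9)
mesa = log2_weight(description_bits=9_000, runtime_steps=1.1e9)
print(mesa > direct)  # True at lam=1: the runtime overhead barely registers

# Flipping the comparison requires lam * log2(1.1) > 1000, i.e. lam on the
# order of 7000, which is the sense in which the penalty must be extreme.
```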
No, I wasn’t advocating adding a speed penalty; I was just pointing at a reason to think that a speed prior would give a more accurate answer to the question of “which is favored” than the bounded simplicity prior you’re assuming:
Suppose that your imitator works by something akin to Bayesian inference with some sort of bounded simplicity prior (I think it’s true of transformers)
But now I realise that I don’t understand why you think this is true of transformers. Could you explain? It seems to me that there are many very simple hypotheses which take a long time to calculate, and which transformers therefore can’t be representing.
The word “bounded” in “bounded simplicity prior” referred to bounded computational resources. A “bounded simplicity prior” is a prior which involves either a “hard” bound on computational resources (some hypotheses are excluded outright) or a “soft” bound (some hypotheses are down-weighted), or both, together with an inductive bias towards simplicity (specifically, it should probably behave as ~ 2^{-description complexity}). For a concrete example, see the prior I described here (without any claim to originality).
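To illustrate the definition above, here is a minimal Python sketch of such a prior over a finite hypothesis set. The representation of hypotheses as (description length, compute steps) pairs and the exponential form of the soft down-weighting are illustrative assumptions, not anything specified in the comment:

```python
def bounded_simplicity_prior(hypotheses, step_budget, soft=False, beta=1.0):
    """Toy bounded simplicity prior over a finite hypothesis set.

    Each hypothesis is a (description_length_bits, compute_steps) pair.
    The base weight is ~ 2^{-description complexity}; the compute bound
    is either "hard" (over-budget hypotheses are excluded) or "soft"
    (over-budget hypotheses are down-weighted; the exponential form of
    the down-weighting is an illustrative choice).
    """
    weights = []
    for length, steps in hypotheses:
        w = 2.0 ** (-length)  # simplicity bias: ~ 2^{-description complexity}
        if steps > step_budget:
            w = w * 2.0 ** (-beta * (steps - step_budget)) if soft else 0.0
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights] if total else weights

# Example: the simpler (2-bit) hypothesis exceeds the step budget, so the
# hard bound excludes it while the soft bound merely down-weights it.
print(bounded_simplicity_prior([(3, 10), (2, 30)], step_budget=20))
print(bounded_simplicity_prior([(3, 10), (2, 30)], step_budget=20, soft=True, beta=0.5))
```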
Ah, I see. That makes sense now!