One thing to note, which might be a technical quibble, is that I don’t endorse the entropy version of this prior (which is the one that wants 50/50 activations). I started off with it because it’s simpler, but I think it breaks for exactly the reasons you say, which is why I prefer the version that wants to see “Over the last N evaluations, each gate evaluated to T at least q times and to F at least q times, where q << N.” This is very specifically so that there isn’t a drive to unnaturally force the percentages towards 50% when the true input distribution is different from that.
Setting that aside: I think what this highlights is that the translation from “a prior over circuits” to “a regularizer for NNs” is pretty nontrivial, and things that are reasonably behaved in one space can be very bad in the other. If I’m sampling boolean circuits from a one-gate trace prior, I just immediately find the solution of ‘they’re all dogs, so put a constant wire in’. Whereas with neural networks we can’t jump straight to that solution and may end up doing more contrived things along the way.
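For concreteness, the trace condition described above can be written down directly as a boolean check over per-gate activation histories. This is just an illustrative sketch (the function name and data layout are made up, not from any actual implementation):

```python
def satisfies_trace_prior(traces, q):
    """Check the trace condition: over the last N evaluations,
    every gate evaluated to True at least q times AND to False
    at least q times (with q << N).

    traces: list of per-gate activation histories, each a list of
            N booleans (the gate's value on the last N evaluations).
    q:      minimum required count for each truth value.
    """
    return all(
        sum(t) >= q and (len(t) - sum(t)) >= q  # enough Trues and Falses
        for t in traces
    )
```

Note that a constant wire (all True or all False) fails the check for any q >= 1, while a gate that is True 90% of the time passes as long as q is small relative to N, so nothing pushes the frequencies toward 50%.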
which is why I prefer the version that wants to see “Over the last N evaluations, each gate evaluated to T at least q times and to F at least q times, where q << N.”
Yeah, I skipped over that because I don’t see how one would implement that. That doesn’t sound very differentiable? Were you thinking of perhaps some sort of evolutionary approach with that as part of a fitness function? Even if you have some differentiable trick for that, it’s easier to explain my objections concretely with 50%. But I don’t have anything further to say about that at the moment.
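For reference, the differentiable 50% version being contrasted here is easy to write down, which is part of why it’s the convenient one to argue about. A toy sketch (not anyone’s actual regularizer; the clipping epsilon is just to keep the log finite):

```python
import math

def entropy_penalty(p, eps=1e-8):
    """Toy version of the entropy regularizer for one gate.

    p: the gate's mean activation frequency in [0, 1].
    Returns the negative binary entropy of p, so minimizing this
    penalty drives p toward 0.5 -- exactly the 'force everything
    to 50%' behavior being objected to above.
    """
    p = min(max(p, eps), 1 - eps)  # avoid log(0)
    h = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return -h
```

Since this is a smooth function of the activation frequency, it drops straight into an SGD loss, whereas the q-out-of-N counting condition is a hard threshold on integer counts and has no obvious gradient.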
Setting that aside: I think what this highlights is that the translation from “a prior over circuits” to “a regularizer for NNs” is pretty nontrivial, and things that are reasonably behaved in one space can be very bad in the other
Absolutely. You are messing around with weird machines and layers of interpreters, and simple security properties or simple translations go right out the window as soon as you have anything adversarial or optimization-related involved.
Were you thinking of perhaps some sort of evolutionary approach with that as part of a fitness function?
That would work, yeah. I was thinking of an approach based on making ad-hoc updates to the weights (beyond SGD), but an evolutionary approach would be much cleaner!
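A rough sketch of how the trace condition could slot into an evolutionary fitness function (all names and the penalty weight here are invented for illustration):

```python
def trace_fitness(task_score, traces, q, penalty=10.0):
    """Hypothetical fitness for an evolutionary search over circuits.

    task_score: the circuit's score on the actual task (higher = better).
    traces:     per-gate activation histories (lists of booleans).
    q:          minimum T-count and F-count required per gate.
    penalty:    cost per gate that violates the trace condition.

    No differentiability needed: selection just ranks candidates
    by this scalar.
    """
    violations = sum(
        1 for t in traces
        if sum(t) < q or (len(t) - sum(t)) < q  # too few Trues or Falses
    )
    return task_score - penalty * violations
```

A selection loop would then keep the highest-fitness circuits each generation, so the constant-wire shortcut gets crowded out without any gradient signal.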
Ok, I see. Thanks for explaining!