I don’t agree with this at all. I wrote a thing here about how NNs can be elegant and derived from first principles.
Nice post.
Anyway, according to some recent works (ref, ref), it seems possible to directly learn digital circuits from examples using some variant of backpropagation. In principle, if you add a circuit size penalty (which may well be the tricky part), this becomes time-bounded maximum a posteriori Solomonoff induction.
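To make the “learn circuits by backprop” idea concrete, here’s a toy sketch of the general trick (my own illustration, not the method from the cited works): relax the discrete choice of which gate to use into a softmax mixture over a few candidate gates, so the whole thing becomes differentiable and can be trained by gradient descent.

```python
# Toy sketch (my own, not the cited papers' method): learn which 2-input
# logic gate fits the data by relaxing the discrete gate choice into a
# softmax mixture and training the mixture weights with gradient descent.
import numpy as np

# Truth-table inputs and XOR targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0., 1., 1., 0.])  # XOR

# Soft (probabilistic) versions of a few candidate gates.
def gates(a, b):
    return np.stack([
        a * b,              # AND
        a + b - a * b,      # OR
        a + b - 2 * a * b,  # XOR
        1 - a * b,          # NAND
    ])

logits = np.zeros(4)  # one learnable logit per candidate gate
lr = 1.0

for step in range(200):
    p = np.exp(logits) / np.exp(logits).sum()   # softmax over gates
    g = gates(X[:, 0], X[:, 1])                 # shape (4 gates, 4 examples)
    y_hat = p @ g                               # soft mixture output
    err = y_hat - y
    dL_dp = 2 * (g @ err) / len(y)              # gradient of MSE w.r.t. p
    # Softmax Jacobian: dp_k/dlogit_j = p_k * (delta_kj - p_j)
    dL_dlogits = p * (dL_dp - p @ dL_dp)
    logits -= lr * dL_dlogits

p = np.exp(logits) / np.exp(logits).sum()
print("gate probabilities:", np.round(p, 3))  # concentrates on the XOR gate
```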
Yes, binary neural networks are super interesting because they can be made much more compact in hardware than floating-point ops. However, there isn’t much (theoretical) advantage otherwise. Anything a circuit can do, an NN can do, and vice versa.
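For the “anything a circuit can do, an NN can do” direction, the standard argument is that a single threshold neuron can compute NAND, which is a universal gate, so a network of such neurons can emulate any Boolean circuit. A toy illustration (my own example, with hand-picked weights):

```python
# A single threshold neuron computing NAND: weights (-2, -2), bias +3,
# step activation. Since NAND is universal, wiring such neurons together
# can reproduce any Boolean circuit.
def nand_neuron(a, b):
    return int(-2 * a - 2 * b + 3 > 0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand_neuron(a, b))  # prints the NAND truth table
```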
A circuit size penalty is already a very common technique. It’s called weight decay, where the synapses are encouraged to be as close to zero as possible. A synapse of 0 is the same as it not being there, which means the neural net’s parameters require less information to specify.
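A quick sketch of the weight decay point (illustrative, made-up numbers): adding an L2 penalty lambda * sum(w**2) to the loss means each gradient step also shrinks every weight toward zero, i.e. toward “not being there”.

```python
# Illustrative only: one gradient step with weight decay (L2 penalty).
import numpy as np

w = np.array([0.8, -1.2, 0.05])          # hypothetical weights
grad_loss = np.array([0.1, -0.3, 0.0])   # hypothetical gradient of the data loss
lr, lam = 0.1, 0.01

# Total loss = data loss + lam * sum(w**2), so its gradient adds 2*lam*w.
w = w - lr * (grad_loss + 2 * lam * w)
print(w)  # every weight is nudged toward 0 on top of the data-driven update
```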