I think by “neural nets” I mean “circuits that get optimized through GD-like techniques, where the vast majority of the degrees of freedom for the optimization process come from big matrix multiplications”.
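Concretely, a minimal sketch of that picture (purely illustrative; the shapes, names, and hand-rolled backprop here are my own, not anyone’s actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Nearly all trainable degrees of freedom live in two big matrices;
# the bias is the only non-matrix parameter.
W1 = 0.1 * rng.normal(size=(64, 64))
W2 = 0.1 * rng.normal(size=(64, 1))
b = np.zeros(1)

def forward(x):
    h = np.maximum(x @ W1, 0.0)  # matmul + ReLU
    return h @ W2 + b            # matmul

def gd_step(x, y, lr=1e-2):
    """One gradient-descent step on squared error (hand-rolled backprop)."""
    global W1, W2, b
    h = np.maximum(x @ W1, 0.0)
    pred = h @ W2 + b
    err = pred - y               # gradient of 0.5*||pred - y||^2 w.r.t. pred
    gW2 = h.T @ err
    gb = err.sum(axis=0)
    gh = err @ W2.T
    gh[h <= 0.0] = 0.0           # ReLU gradient mask
    gW1 = x.T @ gh
    # All the optimization pressure lands on the matrix entries.
    W1 -= lr * gW1
    W2 -= lr * gW2
    b -= lr * gb

# Usage: x is a (batch, 64) input, y a (batch, 1) target.
x = rng.normal(size=(8, 64))
y = rng.normal(size=(8, 1))
gd_step(x, y)
```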
Yeah, I definitely don’t expect that to be the typical representation; I expect neither an optimization-based training process nor lots of big matrix multiplications.
Interesting and exciting. Can you reveal more about how you expect it to work?
Based on the forms in Maxent and Abstraction, I expect (possibly nested) sums of local functions to be the main feature-representation. Figuring out which local functions to sum might be done, in practice, by backing equivalent sums out of a trained neural net, but the net wouldn’t be part of the representation.
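To gesture concretely at what that form looks like (the symbols here are mine, introduced for illustration, not lifted from the post): a maximum-entropy distribution subject to expectation constraints on local functions takes the form

$$P[X] \;\propto\; \exp\Big(\sum_i \lambda_i \, f_i(X_{S_i})\Big)$$

where each $f_i$ depends only on a small subset $S_i$ of the variables, and the “nested” version would allow each $f_i$ to itself be such a sum of more-local functions. The feature-representation is then just the collection of pairs $(f_i, \lambda_i)$.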