Upvoted—I had considered writing a similar post but never got around to it, because every attempt at conveying this idea turned into something far too long.
ANNs are just a universal model family (perhaps the simplest canonical model for analog/algebraic circuits) for which the inference/learning algorithm is completely orthogonal. You can use anything from genetic algorithms to gradient descent to full Bayesian inference to learn the weights (the program) from data. Of course, for performance reasons, one should never go fully Bayesian.
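To make the "learner is orthogonal to the model family" point concrete, here is a minimal toy sketch (my own illustrative code, not from the post): the same tiny fixed-architecture MLP on XOR, with the weights learned either by gradient descent or by a crude genetic algorithm. All function names and hyperparameters here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def init_params():
    # 2-4-1 MLP, all weights flattened into a single parameter vector
    return rng.normal(0, 1, size=2 * 4 + 4 + 4 * 1 + 1)

def forward(theta, X):
    W1 = theta[:8].reshape(2, 4); b1 = theta[8:12]
    W2 = theta[12:16].reshape(4, 1); b2 = theta[16:]
    h = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))

def loss(theta):
    return float(np.mean((forward(theta, X) - y) ** 2))

# Learner 1: gradient descent (finite-difference gradients, to keep it dependency-free)
def train_gd(theta, lr=0.5, steps=2000, eps=1e-4):
    for _ in range(steps):
        grad = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                         for e in np.eye(theta.size)])
        theta = theta - lr * grad
    return theta

# Learner 2: a crude genetic algorithm (keep the elite, mutate to refill the population)
def train_ga(pop_size=50, gens=300, sigma=0.3):
    pop = [init_params() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=loss)
        elite = pop[: pop_size // 5]
        children = [elite[rng.integers(len(elite))] + rng.normal(0, sigma, elite[0].shape)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=loss)

print("GD loss:", loss(train_gd(init_params())))
print("GA loss:", loss(train_ga()))
```

Either learner (or a Bayesian posterior over theta) targets exactly the same circuit; only the search procedure changes.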
I think you missed the most important performance principle behind deep nets—deep factoring. The real world has structure in the form of complex symmetries that can be exploited by deep, factored circuits which reuse subcomputations. AIXI is beyond stupid because it searches a huge space of ~2^N models where each model performs ~2^N ops in complete isolation. In a deep factored model you focus the search on model subspaces where almost all of the subcomputations can be shared between submodels, resulting in an exponential speedup. This is the true power of deep models, and every large successful ANN exploits it.
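A minimal sketch of what sharing subcomputations buys you, under assumptions I'm inventing for illustration (layer sizes and head count are arbitrary): many related submodels reading off one shared trunk, versus each submodel redoing the whole pipeline in isolation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, n_heads = 256, 512, 1000
x = rng.normal(size=d_in)

# Isolated submodels: each of the n_heads models owns its own full pipeline.
isolated_ops = n_heads * (d_in * d_hidden + d_hidden * 1)

# Factored: one shared trunk computes the features once; each head is a cheap readout.
W_trunk = rng.normal(size=(d_in, d_hidden))
W_heads = rng.normal(size=(d_hidden, n_heads))
h = np.tanh(x @ W_trunk)      # computed once, reused by every head
outputs = h @ W_heads         # all heads read the same shared features

factored_ops = d_in * d_hidden + d_hidden * n_heads
print(f"isolated submodels: {isolated_ops:,} multiply-adds")
print(f"shared trunk:       {factored_ops:,} multiply-adds")
```

With these toy numbers the shared-trunk version does a few hundred times less work, and the gap grows with depth and with the number of submodels that can reuse the same intermediate features.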