Of course, but what’s interesting is how it is an approximation (and how close it is). I don’t think decision trees are a very good approximation, for example. You can compile any computer program to a neural network (and people have actually made such compilers), but it would take an exponentially large decision tree to do the same thing. And many ML algorithms, like linear regression or naive Bayes, don’t even have the universal approximator property.
I also think that some AI researchers have written off neural networks because they seem too mathematically inelegant: just some heuristics thrown together without any solid theory or principle.
The purpose of my writing is to show that they are elegant. Even further, that if you tried to come up with the ideal approximation of SI from first principles, you would just end up with NNs.
Indeed. Although SGD is probably not the optimal approximation of Bayesian inference (for example, it doesn’t track uncertainty at all), that is an active area of current research.
I only barely mentioned it in my post, but there are ways of approximating Bayesian inference, like MCMC. And in fact there are methods that can take advantage of stochastic gradient information, which should make them roughly as efficient as SGD.
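For concreteness, here is a minimal sketch of one such method, stochastic gradient Langevin dynamics (SGLD). The only change from an SGD step is rescaling the minibatch gradient to the full dataset and injecting Gaussian noise, which turns the update into an approximate posterior sampler. The toy Gaussian model, prior, step size, and data below are hypothetical choices just to show the update rule, not anything from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                                   # dataset size
data = rng.normal(loc=2.0, scale=1.0, size=N)

def grad_log_prior(theta):
    # N(0, 10^2) prior on theta
    return -theta / 100.0

def grad_log_lik(theta, batch):
    # Gaussian likelihood with unit variance: d/dtheta log p(x | theta)
    return np.sum(batch - theta)

theta = 0.0
eps = 1e-3                                 # step size (would be decayed in practice)
batch_size = 32
samples = []
for t in range(5000):
    batch = rng.choice(data, size=batch_size, replace=False)
    # Same minibatch gradient an SGD step would use, rescaled to the full dataset...
    grad = grad_log_prior(theta) + (N / batch_size) * grad_log_lik(theta, batch)
    # ...plus injected Gaussian noise, which makes the trajectory an MCMC chain.
    theta += 0.5 * eps * grad + rng.normal(scale=np.sqrt(eps))
    samples.append(theta)

print("posterior mean ~", np.mean(samples[1000:]))
```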
There is also a recent paper by DeepMind, "Weight Uncertainty in Neural Networks."
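Roughly, the idea there is to learn a mean and a standard deviation for every weight and sample the weights with the reparameterization trick, trading a single point estimate for a distribution. The sketch below is a minimal illustration of that idea (not the paper's code); the PyTorch setup, layer sizes, and toy regression data are all made up for the example.

```python
import torch
import torch.nn.functional as F

in_dim, out_dim = 4, 1
mu = torch.zeros(in_dim, out_dim, requires_grad=True)           # weight means
rho = torch.full((in_dim, out_dim), -3.0, requires_grad=True)   # pre-softplus stddevs

x = torch.randn(64, in_dim)
y = x.sum(dim=1, keepdim=True)           # toy regression target

opt = torch.optim.Adam([mu, rho], lr=1e-2)
for step in range(500):
    sigma = F.softplus(rho)
    w = mu + sigma * torch.randn_like(sigma)   # sample weights: w = mu + sigma * eps
    pred = x @ w
    nll = F.mse_loss(pred, y)
    # Closed-form KL between N(mu, sigma^2) and a standard-normal prior
    kl = (sigma.pow(2) + mu.pow(2) - 1 - 2 * sigma.log()).sum() / 2
    loss = nll + kl / x.shape[0]
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned weight means:", mu.detach().squeeze())
print("learned weight stddevs:", F.softplus(rho).detach().squeeze())
```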