You are comparing a step and a ladder (I had to seize on it!).
If you look at Table 2 in your last reference, you will see that they carefully show results improving as steps are added. Ladder is just another step (an optimisation one). There is a reason why researchers use PI-MNIST: it reduces the size of the ladder to make comparisons clearer.
What I am trying to bring here is a new first step.
I could have tried a 784-25-10 BP/SGD network (784*25 = 19600 parameters) to compare with this system of 196 neurons and 10 connections. I have managed to get 98% with that. How much would BP/SGD get for the same?
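For reference, the arithmetic behind the 19600 figure can be sketched as below. The helper name `dense_param_count` is hypothetical (it is not from any library). The 19600 figure counts only the first-layer weights; including the output layer and biases is an assumption about how one might count a full 784-25-10 network.

```python
def dense_param_count(layer_sizes, include_biases=True):
    """Count the parameters of a fully connected network.

    layer_sizes: e.g. [784, 25, 10] for a 784-25-10 network.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out          # weights between consecutive layers
        if include_biases:
            total += n_out             # one bias per unit (assumption)
    return total


# First-layer weights only, as quoted in the comment.
first_layer_weights = 784 * 25

# Whole 784-25-10 network, without and with biases (assumed counting).
all_weights = dense_param_count([784, 25, 10], include_biases=False)
with_biases = dense_param_count([784, 25, 10])

print(first_layer_weights, all_weights, with_biases)
```

Either way of counting gives a figure close to 19600, so the comparison is not sensitive to this choice.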
The current paradigm has been building up since 1986, and was itself based on the perceptron from 1958.
Here, I take the simplest form of the perceptron (single layer), adjoin only a very basic quantiliser to drive it, and already get near SOTA. I also point out that this quantiliser is just another form of neuron.
I am trying to show it might be an interesting step to take.