You are comparing step and ladder (I had to seize on it !).
If you look at Table 2 in your last reference, you will see that they, carefully, show results improving has steps are added. Ladder is just another step (an optimisation one). There is a reason why researchers use PI-MNIST: it is to reduce the size of the ladder to make comparisons clearer.
What I am trying to bring here is a new first step.
I could have tried a 784-25-10 BP/SGD network (784*25 = 19600 parameters) to compare with this system with 196 neurons and 10 connections. I have managed to get 98% with that. How much for the same with BP/SGD ?
The current paradigm has been building up since 1986, and was itself based on the perceptron from 1958.
Here, I take the simplest form of the perceptron (single layer), only adjoin a, very basic, quantiliser to drive it, and already get near SOTA. I also point out that this quantiliser is just another form of neuron.
I am trying to show it might be an interesting step to take.
PI MNIST is up to at least 99.43% with Ladder Networks https://arxiv.org/abs/1507.02672. I think I vaguely remember some ~99.5% published since (it’s been 6 years) but I haven’t done the lit tree crawling to find it currently. Another example of a higher performing result than Maxout is Virtual Adversarial Training at 99.36% https://arxiv.org/abs/1704.03976. The JMLR version of dropout https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf also has a 99.21% with dropout finetuning of a Deep Boltzmann Machine.
You are comparing step and ladder (I had to seize on it !).
If you look at Table 2 in your last reference, you will see that they, carefully, show results improving has steps are added. Ladder is just another step (an optimisation one). There is a reason why researchers use PI-MNIST: it is to reduce the size of the ladder to make comparisons clearer.
What I am trying to bring here is a new first step.
I could have tried a 784-25-10 BP/SGD network (784*25 = 19600 parameters) to compare with this system with 196 neurons and 10 connections. I have managed to get 98% with that. How much for the same with BP/SGD ?
The current paradigm has been building up since 1986, and was itself based on the perceptron from 1958.
Here, I take the simplest form of the perceptron (single layer), only adjoin a, very basic, quantiliser to drive it, and already get near SOTA. I also point out that this quantiliser is just another form of neuron.
I am trying to show it might be an interesting step to take.