The accuracy reaches state of the art (98.9%) on PI-MNIST with 750,000 low-precision (binarisable) connections (>98.6% with 80,000 connections), using one layer, no bias, no back-propagation and only additions. It is yet to be optimised. It works online, and is backed by existing mathematical theorems. It also relates to real cortical structure, to personal experience, and even to popular wisdom.
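For intuition, here is a minimal sketch, under my own assumptions about the connection layout and the binarisation (this is not the author's code), of what inference with one layer, no bias, and only additions can look like:

```python
import numpy as np

# Illustrative sketch only: a one-layer, bias-free classifier whose
# forward pass uses nothing but signed additions over binarised (+1/-1)
# connections. The layout and binarisation are assumptions.

rng = np.random.default_rng(0)

N_INPUTS = 784          # PI-MNIST pixels, order irrelevant
N_CLASSES = 10
N_PER_CLASS = 75_000    # ~750,000 connections in total, as in the 98.9% run

# Each class owns N_PER_CLASS connections; a connection is just a
# (pixel index, +1/-1 sign) pair, so evaluating it is one addition.
pixel_idx = rng.integers(0, N_INPUTS, size=(N_CLASSES, N_PER_CLASS))
sign = rng.choice(np.array([-1, 1], dtype=np.int32), size=(N_CLASSES, N_PER_CLASS))

def predict(binary_image):
    """binary_image: shape (784,), values 0/1 (e.g. thresholded pixels)."""
    # Per-class score = sum of the signs of connections whose pixel is on.
    scores = (sign * binary_image[pixel_idx]).sum(axis=1)  # integer additions only
    return int(scores.argmax())

# Random connections obviously give chance-level accuracy; the point is only
# that inference needs no multiplications, floats, or biases.
example = (rng.random(N_INPUTS) > 0.5).astype(np.int32)
print(predict(example))
```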
Isn’t SOTA for MNIST more like 99.9%? At least according to this website. I don’t think MNIST is a great benchmark because it’s pretty much solved.
It is PI-MNIST.
Permutation Invariant. To keep it simple, you cannot use convolutions. It is all explained in the text.
Real SOTA on that version is 99.04% (Maxout), but that is with 65+ million parameters. I do not have the hardware (or time).
I stopped at 98.9% with 750,000 connections (integers and additions), and this is close to what BP/SGD (Table 2) gets with 3 hidden layers of 1024 units each, for a total of roughly 3,000,000 parameters (floating-point, with multiplications), using max-norm and ReLU.
For a similar accuracy, this system uses several times fewer ‘parameters’, and its efficiency advantage (integer additions versus floating-point multiplications) is even greater.
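As a sanity check on that comparison, here is the arithmetic spelled out. The baseline architecture is the 3×1024-unit ReLU network from the dropout paper's Table 2; everything else is just counting:

```python
# Back-of-the-envelope parameter counts for the comparison above.

# BP/SGD baseline quoted from the dropout paper's Table 2:
# a 784-1024-1024-1024-10 fully connected ReLU network.
layers = [784, 1024, 1024, 1024, 10]
weights = sum(a * b for a, b in zip(layers[:-1], layers[1:]))
biases = sum(layers[1:])
print(f"MLP baseline: {weights + biases:,} float parameters")   # ~2.9 million

# The system described in the post:
print(f"This system:  {750_000:,} integer, binarisable connections, no biases")
```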
Remember, it is not supposed to work at all, and it is not optimised.
PI-MNIST is up to at least 99.43% with Ladder Networks (https://arxiv.org/abs/1507.02672). I vaguely remember something around 99.5% published since (it’s been 6 years), but I haven’t crawled the literature to find it. Another example of a result above Maxout is Virtual Adversarial Training at 99.36% (https://arxiv.org/abs/1704.03976). The JMLR version of dropout (https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf) also reports 99.21% with dropout finetuning of a Deep Boltzmann Machine.
You are comparing a step and a ladder (I had to seize on it!).
If you look at Table 2 in your last reference, you will see that they carefully show results improving as steps are added. Ladder is just another step (an optimisation one). There is a reason why researchers use PI-MNIST: it reduces the size of the ladder to make comparisons clearer.
What I am trying to bring here is a new first step.
I could have tried a 784-25-10 BP/SGD network (784*25 = 19,600 parameters) to compare with this system with 196 neurons and 10 connections. I have managed to get 98% with this system at that size. How much would BP/SGD get for the same?
The current paradigm has been building up since 1986, and was itself based on the perceptron from 1958.
Here, I take the simplest form of the perceptron (single layer), adjoin only a very basic quantiliser to drive it, and already get near SOTA. I also point out that this quantiliser is just another form of neuron.
I am trying to show it might be an interesting step to take.
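To make “the quantiliser is just another form of neuron” a little more concrete, here is a minimal sketch of one standard way to track a quantile online using only additions: a threshold is nudged up when the incoming sum exceeds it and down when it does not, and the ratio of the two step sizes fixes which quantile it settles on. Whether this matches the post's exact mechanism is my assumption; the sketch is only meant to show that such a unit needs nothing beyond integer additions, just like the neurons it drives.

```python
import random

# Sketch of an online quantile tracker ("quantiliser") built from
# additions only. An illustration of the general idea, not necessarily
# the post's exact update rule.

class Quantiliser:
    def __init__(self, up_step=9, down_step=1):
        # At equilibrium, the fraction of inputs landing above the
        # threshold is down_step / (up_step + down_step) -- here ~10%.
        self.threshold = 0
        self.up_step = up_step
        self.down_step = down_step

    def observe(self, value):
        """Return True if value is above the current quantile estimate,
        then update the estimate with a single addition."""
        above = value > self.threshold
        if above:
            self.threshold += self.up_step
        else:
            self.threshold -= self.down_step
        return above

# Feed it 100,000 integer 'neuron sums'; roughly 10% end up above the threshold.
q = Quantiliser()
hits = sum(q.observe(random.randint(0, 1000)) for _ in range(100_000))
print(hits)  # approximately 10,000
```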