I like this idea. It seems to me like a fair test. I will run the code overnight with default settings and see what happens.
Initial results indicate the code performs poorly on F-MNIST. It is possible this is a hyperparameter-tuning issue, but my default conclusion is that MNIST (created in 1998, before the invention of modern GPUs) is just too easy.
Btw, a multilayer perceptron (which is a permutation-invariant model) with 230,000 parameters and, AFAIK, no data augmentation, can achieve 88.33% accuracy on Fashion-MNIST.
I doubt that this would be the best an MLP can achieve on F-MNIST.
I will put it this way: SONNs and MLPs do the same thing, in a different way. Therefore they should achieve the same accuracy. If this SONN can get near 90%, so should MLPs.
It is likely that nobody has bothered to try ‘without convolutions’ because it is so old-fashioned.
Convolutions are for repeated locally aggregated correlations.
Update: the accuracy gets higher if you run it for long enough.
Update: 3 runs (2 random), 10 million steps. All three over 88.33% (average over the 9.5-10.5 million step range on the 3 runs: 88.43%). New SOTA? Please check and update.
Update 2: 89.85% at step 50 million with quantUpP = 3.2 and quantUpN = 39. It does perform very well. I will leave it at that. As said in my post, those are the two important parameters (no, it is not a universal super-intelligence in 600 lines of code). Be rational, and think about what the fact that this mechanism works so well means (I am talking to everybody here).
I looked at it the informed way.
It gets over 88% with very limited effort.
As I pointed out, the two datasets are similar in technical description, but they are ‘reversed’ in the data.
MNIST is black dots on a white background; F-MNIST is white dots on a black background. The histograms are very different.
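For anyone who wants to check this for themselves, here is a minimal sketch (in Python, not part of the posted code; loading the datasets through torchvision is my own choice) that compares the pixel-intensity histograms and shows the effect of reversing the shades:

```python
# Minimal sketch (my own, not from the posted code): compare the pixel-intensity
# histograms of MNIST and Fashion-MNIST, and look at the 'reversed shades' variant.
import numpy as np
from torchvision import datasets

mnist = datasets.MNIST(root="data", train=True, download=True).data.numpy()
fmnist = datasets.FashionMNIST(root="data", train=True, download=True).data.numpy()

def describe(name, images):
    hist, _ = np.histogram(images, bins=8, range=(0, 256))
    print(name, "near-zero pixel fraction:", round(float(np.mean(images < 32)), 3))
    print(name, "8-bin histogram:", np.round(hist / hist.sum(), 3))

describe("MNIST          ", mnist)
describe("Fashion-MNIST  ", fmnist)
describe("F-MNIST, 255-x ", 255 - fmnist)  # shades reversed
```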
I tried to make it work despite that, just with parameter changes, and it does.
Here are the changes to the code:
on line 555: quantUpP = 1.9 ;
on line 556: quantUpN = 24.7 ;
with rand(1000), as it is in the code, you already clear 86% at step 300,000, 87% at step 600,000, and 88% at 3 million.
I had made another small and irrelevant change in my full tests, so I am running the full tests again without it (the values/steps above are from that new series). It seems to be even better without it… maybe a new SOTA (update: touched 88.33% at step 4,800,000, and 88.5% at 6.8 million! MLPs perform poorly when applied to data even slightly more complicated than MNIST).
I do not understand what all the hype around MNIST is about. Once again, this is PI-MNIST, and that makes it very different (to put it simply: no geometry, so no convolution).
I would like anybody to give me a reference to some ‘other method that worked on MNIST but did not make it further’ that uses PI-MNIST and gets more than 98.4% on it.
And if anybody tries it on yet another dataset, could they please notify me so I can look at it before they make potentially damaging statements.
Here, with 2 convolutional layers and fewer than 100k parameters, the accuracy is ~92%: https://github.com/zalandoresearch/fashion-mnist
SOTA on Fashion-MNIST is >96%: https://paperswithcode.com/sota/image-classification-on-fashion-mnist
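(For scale, a ‘2 conv, <100k parameter’ model of the kind listed there looks roughly like the sketch below; this is an illustrative architecture, not the exact benchmark model, and the ~92% figure comes from the benchmark page, not from this code.)

```python
# Illustrative sketch of a small Fashion-MNIST CNN with two conv layers and well
# under 100k parameters; the exact benchmark model may differ.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 14x14
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

print(sum(p.numel() for p in SmallCNN().parameters()))  # roughly 20k parameters
```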
No convolution.
You are comparing apples and oranges.
I have shared the base because it has real scientific (and philosophical) value.
Geometry and the rest are separate, and of lesser scientific value; they are more technology.
Your result is virtually identical to the first-ranking unambiguously permutation-invariant method (MLP 256-128-100). HOG+SVM does even better, but it’s unclear to me whether that meets your criteria.
Could you be more precise about what kinds of algorithms you consider it fair to compare against, and why?
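(For concreteness, the ‘MLP 256-128-100’ baseline referred to above is just a plain fully connected network on the 784 flattened pixels, along the lines of the sketch below; the hidden sizes come from the benchmark name, while the activations and other training details are my assumptions.)

```python
# Sketch of the MLP 256-128-100 permutation-invariant baseline on flattened 28x28
# images. Hidden sizes follow the benchmark name; ReLU activations are an assumption.
import torch.nn as nn

mlp = nn.Sequential(
    nn.Flatten(),                       # pixel order carries no spatial meaning here
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 100), nn.ReLU(),
    nn.Linear(100, 10),                 # 10 clothing classes
)
```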
I am going after pure BP/SGD, so neural networks (no SVM), no convolution, …
No pre-processing either. That is changing the dataset.
It is just a POC, to make a point: you do not need mathematics for AGI. Our brain does not.
I will publish a follow-up post soon.
Also,
No regularisation. I wrote about that in the analysis.
Without max-norm (or maxout, ladder, VAT: all forms of regularisation), BP/SGD only achieves 98.75% (from the 2014 dropout paper).
Regularisation must come from outside the system (SO can be seen that way) or through local interactions (neighbors). Many papers clearly suggest that should improve the result.
That is yet to be done.
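(For readers who have not met the term: max-norm, as used in the dropout paper, simply caps the L2 norm of each unit's incoming weight vector after every gradient update. A minimal sketch, with the cap value c as a placeholder:)

```python
# Sketch of a max-norm constraint: after each SGD step, rescale any unit whose
# incoming weight vector exceeds the cap c. The value of c here is a placeholder.
import torch

def apply_max_norm(layer: torch.nn.Linear, c: float = 3.0) -> None:
    with torch.no_grad():
        norms = layer.weight.norm(dim=1, keepdim=True)          # one norm per output unit
        layer.weight.mul_(torch.clamp(c / (norms + 1e-12), max=1.0))
```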
What is BP in BP/SGD?
So, as I see it, there are three possible fairness criteria that define what we can compare your model with:
1. Virtually anything goes: convolutions, CNNs, pretraining on ImageNet, …
2. Permutation-invariant models are allowed, everything else is disallowed. For instance, MLPs are ok, CNNs are forbidden, tensor decompositions are forbidden, SVMs are ok as long as the transformations used are permutation-invariant. Pre-processing is allowed as long as it’s permutation-invariant (a sketch of what this means in practice follows below).
3. The restriction from criterion 2 applies. Also, the model must be biologically plausible, or, shall we say, similar to the brain. Or maybe similar to how a potential brain of another creature might be? Not sure. This means SGD, regularization that uses the norm of vectors, etc. are forbidden. Strengthening neuron connections based on something that happens locally is allowed.
Personally, I know basically nothing about the landscape of models satisfying criterion 3.
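(To make criterion 2 concrete, as flagged in the list above: ‘permutation-invariant’ means the method behaves the same if one fixed shuffling of the 784 pixel positions is applied to every image, train and test alike. A quick sketch of that check:)

```python
# Sketch: apply one fixed random permutation to the 784 pixel positions of every
# image. An MLP on flattened pixels learns equally well on the permuted data; a
# CNN does not, because the permutation destroys the 2D neighbourhood structure.
import torch

def permute_pixels(x: torch.Tensor, seed: int = 0) -> torch.Tensor:
    """x: images of shape (N, 1, 28, 28); returns them with a fixed pixel shuffle
    applied (this is the setting the 'PI' in PI-MNIST refers to)."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(28 * 28, generator=g)
    return x.flatten(1)[:, perm].reshape(x.shape)
```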
BP is Back-Propagation.
We are completely missing the point here.
I had to use a dataset for my explorations and MNIST was simple; I used PI-MNIST to show an ‘impressive’ result so that people would have to look at it. I expected the ‘PI’ to be understood, and it is not. Note that I could readily answer the ‘F-MNIST challenge’.
If I had just expressed an opinion on how to go about AI, the way I did in the roadmap, it would have just been, rightly, ignored. The point was to show that it is not ‘ridiculous’ and that the system fits with that roadmap.
I see that your last post is about complexity science. This is an example of it. The domain of application is nature. Nature is complex, and maths has difficulties with complexity. The field of chaos theory petered out in the 80s for that reason. If you want to know more about it, start with Turing's work on morphogenesis (read the conclusion), then Prigogine. In NNs, there is Kohonen.
Some things are theoretically correct, but practically useless. You know how to win the lotto, but nobody does it. Better something simple that works and can be reasoned about, even without a mathematical theory. AI is not quantum physics.
Maybe it could be said that intelligence is cutting through all the details to then reason using what is left, but the devil is in those details.
It would be surprising to me if the algorithm really performed this poorly on Fashion-MNIST. F-MNIST is harder, but (intentionally) very similar to MNIST.
CIFAR, maybe with a limited set of categories, would be a logical “hard” test, IF it can be made to work on F-MNIST.
On the other hand (without claiming that I understand the ins and outs of the algorithm), I could imagine that, out of the neuro-inspired playbook, it misses the winner-takes-all competition between neurons, which allows modelling of multi-modal distributions and possibly allows easier separation of not-linearly-separable data points.
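(To illustrate what I mean by winner-takes-all: for each input, only the most active unit, or the top-k, in a group keeps its activation and is allowed to learn. A minimal sketch of the inference side, with k and the grouping as illustrative choices rather than anything from the posted code:)

```python
# Sketch of k-winners-take-all over a layer's activations: keep the k largest
# activations per example and zero the rest. k = 1 is classic winner-takes-all.
import torch

def k_winners_take_all(activations: torch.Tensor, k: int = 1) -> torch.Tensor:
    """activations: shape (batch, units); returns them with all but the per-example
    top-k set to zero."""
    topk = activations.topk(k, dim=1)
    mask = torch.zeros_like(activations).scatter_(1, topk.indices, 1.0)
    return activations * mask
```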
See my comment on reversing the shades on F-MNIST. I will check it later, but I see it gets up to 48% in the ‘wrong’ order, and that is surprisingly good. I worked on CIFAR, but that is another story. As-is, it gives bad results and you have to add other ‘things’.
As you guessed, I belong to the neuro-inspired branch and most of my ‘giants’ belong there. I strongly expected, when I started my investigations, to use some of the works that I knew and appreciated along the lines you are mentioning, and I investigated some of them early on.
To my surprise, I did not need them to get to this result, so they are absent.
The two-neuronal-layer form of the neocortex is where they will be useful. This is only one layer.
Another (bad) reason is that they add to the GPU hell that has limited my investigations. It is an identified source of potential improvements.