Update: the accuracy gets higher if you run it for long enough.
Update: 3 runs (2 random), 10 million steps. All three over 88.33% (average over steps 9.5-10.5 million across the 3 runs: 88.43%). New SOTA? Please check and update.
Update 2: 89.85% at step 50 million with quantUpP = 3.2 and quantUpN = 39. It does perform very well. I will leave it at that. As said in my post, those are the two important parameters (no, it is not a universal super-intelligence in 600 lines of code). Be rational, and think about what it means that this mechanism works so well (I am talking to everybody here).
I looked at it the informed way.
It gets over 88% with very limited effort.
As I pointed out, the two datasets are similar in their technical description, but they are ‘reversed’ in the data.
MNIST is black dots on white background. F-MNIST is white dots on black background. The histograms are very different.
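To make the claim concrete, here is a minimal sketch (mine, not part of the original code) that compares the two pixel-intensity histograms; it assumes TensorFlow is installed, purely for its bundled dataset loaders.

```python
# Compare pixel-intensity histograms of MNIST and Fashion-MNIST.
import numpy as np
import tensorflow as tf

(x_mnist, _), _ = tf.keras.datasets.mnist.load_data()
(x_fashion, _), _ = tf.keras.datasets.fashion_mnist.load_data()

for name, x in [("MNIST", x_mnist), ("Fashion-MNIST", x_fashion)]:
    hist, _ = np.histogram(x, bins=8, range=(0, 256))
    print(name, np.round(hist / hist.sum(), 3))

# The two distributions differ markedly: digits leave almost all pixels
# at one extreme of the range, while clothing items fill much more of
# the frame with mid-range intensities.
```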
I tried to make it work despite that, just with parameter changes, and it does.
Here are the changes to the code:
on line 555: quantUpP = 1.9;
on line 556: quantUpN = 24.7;
With rand(1000), as it is in the code, you already clear 86% at step 300,000, 87% at step 600,000, and 88% at 3 million.
I had made another small, irrelevant change in my full tests, so I am running the full tests again without it (the values/steps above are from that new series). It seems to be even better without it… maybe a new SOTA (update: touched 88.33% at step 4,800,000! … and 88.5% at 6.8 million!). MLPs perform poorly when applied to data even slightly more complicated than MNIST.
I do not understand all the hype around MNIST. Once again, this is PI-MNIST, and that makes it very different (to put it simply: no geometry, so no convolution).
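For readers unfamiliar with the term, here is a minimal sketch (my illustration) of what the ‘PI’ means: one fixed random permutation is applied to the pixels of every image, train and test alike, which destroys all spatial structure.

```python
# Constructing PI-MNIST from MNIST: one fixed pixel permutation for all images.
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(28 * 28)          # fixed once for the whole dataset

def to_pi(images):
    """images: (n, 28, 28) array -> (n, 784) permuted pixel vectors."""
    flat = images.reshape(len(images), -1)
    return flat[:, perm]                 # same permutation for every image

# An MLP is unaffected (its first layer just sees reordered inputs), but a
# CNN's spatial filters have nothing local left to exploit: no geometry,
# so no convolution.
```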
I would like anybody to give me a reference to some ‘other method that worked on MNIST but did not make it further’ that uses PI-MNIST and gets more than 98.4% on it.
And if anybody tries it on yet another dataset, could they please notify me so I can look at it before they make potentially damaging statements.
Here, with 2 convolutional layers and fewer than 100k parameters, the accuracy is ~92%: https://github.com/zalandoresearch/fashion-mnist
SOTA on Fashion-MNIST is >96%: https://paperswithcode.com/sota/image-classification-on-fashion-mnist
No convolution.
You are comparing apples and oranges.
I have shared the base because it has real scientific (and philosophical) value.
Geometry and the rest are separate, and of lesser scientific value; they are more technology.
Your result is virtually identical to the first-ranking unambiguously permutation-invariant method (MLP 256-128-100). HOG+SVM does even better, but it’s unclear to me whether that meets your criteria.
Could you be more precise about what kinds of algorithms you consider it fair to compare against, and why?
I am going after pure BP/SGD, so neural networks (no SVMs), no convolutions, …
No pre-processing either. That is changing the dataset.
It is just a POC, to make a point: you do not need mathematics for AGI. Our brain does not.
I will publish a follow-up post soon.
Also,
No regularisation. I wrote about that in the analysis.
Without max-norm (or maxout, ladder, VAT: all forms of regularisation), BP/SGD only achieves 98.75% (from the 2014 dropout paper).
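For reference, a minimal sketch (mine, following the dropout paper’s description) of the max-norm constraint mentioned above: after each gradient step, each unit’s incoming weight vector is rescaled so its L2 norm never exceeds a constant c.

```python
# Max-norm constraint: project each unit's incoming weights back onto
# the ball of radius c after every SGD update.
import numpy as np

def max_norm(W, c=3.0):
    """W: (n_inputs, n_units); constrain each column's L2 norm to <= c."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    return W * np.minimum(1.0, c / np.maximum(norms, 1e-12))

# Schematic use inside a training loop:
#   W -= lr * grad_W
#   W = max_norm(W, c=3.0)
```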
Regularisation must come from outside the system (SO can be seen that way) or through local interactions (neighbours). Many papers clearly suggest that should improve the result.
That is yet to be done.
What is BP in BP/SGD?
So, as I see it, there are three possible fairness criteria that define what we can compare your model with:

1. Virtually anything goes: convolutions, CNNs, pretraining on ImageNet, …
2. Permutation-invariant models are allowed; everything else is disallowed. For instance, MLPs are ok, CNNs are forbidden, tensor decompositions are forbidden, and SVMs are ok as long as the transformations used are permutation-invariant. Pre-processing is allowed as long as it is permutation-invariant (see the sketch below).
3. The restriction from criterion 2 applies, and the model must also be biologically plausible, or, shall we say, similar to the brain. Or maybe similar to how a potential brain of another creature might be? Not sure. This rules out SGD; regularisation that uses vector norms, etc., is forbidden. Strengthening neuron connections based on something that happens locally is allowed.
Personally, I know basically nothing about the landscape of models satisfying criterion 3.
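A minimal sketch (my illustration, not from the thread) of what the permutation-invariance requirement in criterion 2 means in practice: a global intensity histogram is a permutation-invariant pre-processing step, while raw pixel order is not.

```python
# Hypothetical check (not from the original code): a feature map f is
# permutation-invariant if f(x) == f(x[perm]) for any pixel permutation.
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 256, size=28 * 28)   # one flattened 'image'
perm = rng.permutation(x.size)           # an arbitrary pixel permutation

def hist_features(v, bins=16):
    # Global intensity histogram: ignores where pixels are, only their values.
    h, _ = np.histogram(v, bins=bins, range=(0, 256))
    return h

assert np.array_equal(hist_features(x), hist_features(x[perm]))  # invariant
assert not np.array_equal(x, x[perm])    # raw pixel order is not invariant
```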
BP is Back-Propagation.
We are completely missing the plot here.
I had to use a dataset for my explorations, and MNIST was simple; I used PI-MNIST to show an ‘impressive’ result so that people would have to look at it. I expected the ‘PI’ to be understood, and it is not. Note that I could readily answer the ‘F-MNIST challenge’.
If I had just expressed an opinion on how to go about AI, the way I did in the roadmap, it would have been, rightly, ignored. The point was to show it is not ‘ridiculous’ and that the system fits with that roadmap.
I see that your last post is about complexity science. This is an example of it. The domain of application is nature. Nature is complex, and maths has difficulty with complexity. The field of chaos theory petered out in the 80s for that reason. If you want to know more about it, start with Turing’s work on morphogenesis (read the conclusion), then Prigogine. In NNs, there is Kohonen.
Some things are theoretically correct but practically useless. You know how to win the lotto, but nobody does it. Better something simple that works and can be reasoned about, even without a mathematical theory. AI is not quantum physics.
Maybe it could be said that intelligence is cutting through all the details to then reason with what is left, but the devil is in those details.