I like this idea. It seems to me like a fair test. I will run the code overnight with default settings and see what happens.
Initial results indicate the code performs poorly on F-MNIST. It is possible this is a hyperparameter-tuning issue, but my default conclusion is that MNIST (created in 1998, before the invention of modern GPUs) is just too easy.
Btw, a multilayer perceptron (which is a permutation-invariant model) with 230,000 parameters and, AFAIK, no data augmentation, can achieve 88.33% accuracy on Fashion-MNIST.
I doubt that this would be the best an MLP can achieve on F-MNIST.
I will put it this way: SONNs and MLPs do the same thing, in a different way. Therefore they should achieve the same accuracy. If this SONN can get near 90%, so should MLPs.
It is likely that nobody has bothered to try ‘without convolutions’ because it is so old-fashioned.
Convolutions are for repeated locally aggregated correlations.
Update: the accuracy gets higher if you run it for long enough.
Update: 3 runs (2 random), 10 million steps. All three over 88.33% (average over the 9.5-10.5 million step range on the 3 runs: 88.43%). New SOTA? Please check and update.
Update 2: 89.85% at step 50 million with quantUpP = 3.2 and quantUpN = 39. It does perform very well. I will leave it at that. As said in my post, those are the two important parameters (no, it is not a universal super-intelligence in 600 lines of code). Be rational, and think about what the fact that this mechanism works so well means (I am talking to everybody here).
I looked at it the informed way.
It gets over 88% with very limited effort.
As I pointed out, the two datasets are similar in technical description, but they are ‘reversed’ in the data.
MNIST is black dots on a white background; F-MNIST is white dots on a black background. The histograms are very different.
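For anyone who wants to check this for themselves, here is a minimal sketch (in Python, not part of the posted code; loading the datasets through torchvision is my own choice) that compares the pixel-intensity histograms and shows the effect of reversing the shades:

```python
# Minimal sketch (my own, not from the posted code): compare the pixel-intensity
# histograms of MNIST and Fashion-MNIST, and look at the 'reversed shades' variant.
import numpy as np
from torchvision import datasets

mnist = datasets.MNIST(root="data", train=True, download=True).data.numpy()
fmnist = datasets.FashionMNIST(root="data", train=True, download=True).data.numpy()

def describe(name, images):
    hist, _ = np.histogram(images, bins=8, range=(0, 256))
    print(name, "near-zero pixel fraction:", round(float(np.mean(images < 32)), 3))
    print(name, "8-bin histogram:", np.round(hist / hist.sum(), 3))

describe("MNIST          ", mnist)
describe("Fashion-MNIST  ", fmnist)
describe("F-MNIST, 255-x ", 255 - fmnist)  # shades reversed
```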
I tried to make it work despite that, just with parameter changes, and it does.
Here are the changes to the code:
on line 555: quantUpP = 1.9 ;
on line 556: quantUpN = 24.7 ;
with rand(1000), as it is in the code, you already clear 86% at step 300,000, 87% at step 600,000, and 88% at 3 million.
I had made another small and irrelevant change in my full tests, so I am running the full tests again without it (the values/steps above are from that new series). It seems to be even better without it… maybe a new SOTA (update: touched 88.33% at step 4,800,000, and 88.5% at 6.8 million! MLPs perform poorly when applied to data even slightly more complicated than MNIST).
I do not understand what all the hype around MNIST is about. Once again, this is PI-MNIST, and that makes it very different (to put it simply: no geometry, so no convolution).
I would like anybody to give me a reference to some ‘other method that worked on MNIST but did not make it further’ that uses PI-MNIST and gets more than 98.4% on it.
And if anybody tries it on yet another dataset, could they please notify me so I can look at it before they make potentially damaging statements.
Here, with 2 convolutional layers and fewer than 100k parameters, the accuracy is ~92%: https://github.com/zalandoresearch/fashion-mnist
SOTA on Fashion-MNIST is >96%: https://paperswithcode.com/sota/image-classification-on-fashion-mnist
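(For scale, a ‘2 conv, <100k parameter’ model of the kind listed there looks roughly like the sketch below; this is an illustrative architecture, not the exact benchmark model, and the ~92% figure comes from the benchmark page, not from this code.)

```python
# Illustrative sketch of a small Fashion-MNIST CNN with two conv layers and well
# under 100k parameters; the exact benchmark model may differ.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 14x14
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

print(sum(p.numel() for p in SmallCNN().parameters()))  # roughly 20k parameters
```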
No convolution.
You are comparing apples and oranges.
I have shared the base because it has real scientific (and philosophical) value.
Geometry and the rest are separate, and of lesser scientific value; they are more technology.
Your result is virtually identical to the first-ranking unambiguously permutation-invariant method (MLP 256-128-100). HOG+SVM does even better, but it’s unclear to me whether that meets your criteria.
Could you be more precise about what kinds of algorithms you consider it fair to compare against, and why?
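(For concreteness, the ‘MLP 256-128-100’ baseline referred to above is just a plain fully connected network on the 784 flattened pixels, along the lines of the sketch below; the hidden sizes come from the benchmark name, while the activations and other training details are my assumptions.)

```python
# Sketch of the MLP 256-128-100 permutation-invariant baseline on flattened 28x28
# images. Hidden sizes follow the benchmark name; ReLU activations are an assumption.
import torch.nn as nn

mlp = nn.Sequential(
    nn.Flatten(),                       # pixel order carries no spatial meaning here
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 100), nn.ReLU(),
    nn.Linear(100, 10),                 # 10 clothing classes
)
```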
I am going after pure BP/SGD, so neural networks (no SVM), no convolution, …
No pre-processing either. That is changing the dataset.
It is just a POC, to make a point: you do not need mathematics for AGI. Our brain does not.
I will publish a follow-up post soon.
Also,
No regularisation. I wrote about that in the analysis.
Without max-norm (or maxout, ladder, VAT: all forms of regularisation), BP/SGD only achieves 98.75% (from the 2014 dropout paper).
Regularisation must come from outside the system (SO can be seen that way) or through local interactions (neighbors). Many papers clearly suggest that should improve the result.
That is yet to be done.
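(For readers who have not met the term: max-norm, as used in the dropout paper, simply caps the L2 norm of each unit's incoming weight vector after every gradient update. A minimal sketch, with the cap value c as a placeholder:)

```python
# Sketch of a max-norm constraint: after each SGD step, rescale any unit whose
# incoming weight vector exceeds the cap c. The value of c here is a placeholder.
import torch

def apply_max_norm(layer: torch.nn.Linear, c: float = 3.0) -> None:
    with torch.no_grad():
        norms = layer.weight.norm(dim=1, keepdim=True)          # one norm per output unit
        layer.weight.mul_(torch.clamp(c / (norms + 1e-12), max=1.0))
```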
What is BP in BP/SGD?
So, as I see it, there are three possible fairness criteria that define what we can compare your model with:
1. Virtually anything goes: convolutions, CNNs, pretraining on ImageNet, …
2. Permutation-invariant models are allowed, everything else is disallowed. For instance, MLPs are ok, CNNs are forbidden, tensor decompositions are forbidden, SVMs are ok as long as the transformations used are permutation-invariant. Pre-processing is allowed as long as it’s permutation-invariant (a sketch of what this means in practice follows below).
3. The restriction from criterion 2 applies. Also, the model must be biologically plausible, or, shall we say, similar to the brain. Or maybe similar to how a potential brain of another creature might be? Not sure. This means SGD, regularization that uses the norm of vectors, etc. are forbidden. Strengthening neuron connections based on something that happens locally is allowed.
Personally, I know basically nothing about the landscape of models satisfying criterion 3.
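(To make criterion 2 concrete, as flagged in the list above: ‘permutation-invariant’ means the method behaves the same if one fixed shuffling of the 784 pixel positions is applied to every image, train and test alike. A quick sketch of that check:)

```python
# Sketch: apply one fixed random permutation to the 784 pixel positions of every
# image. An MLP on flattened pixels learns equally well on the permuted data; a
# CNN does not, because the permutation destroys the 2D neighbourhood structure.
import torch

def permute_pixels(x: torch.Tensor, seed: int = 0) -> torch.Tensor:
    """x: images of shape (N, 1, 28, 28); returns them with a fixed pixel shuffle
    applied (this is the setting the 'PI' in PI-MNIST refers to)."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(28 * 28, generator=g)
    return x.flatten(1)[:, perm].reshape(x.shape)
```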
BP is Back-Propagation.
We are completely missing the point here.
I had to use a dataset for my explorations and MNIST was simple; I used PI-MNIST to show an ‘impressive’ result so that people would have to look at it. I expected the ‘PI’ to be understood, and it is not. Note that I could readily answer the ‘F-MNIST challenge’.
If I had just expressed an opinion on how to go about AI, the way I did in the roadmap, it would have just been, rightly, ignored. The point was to show that it is not ‘ridiculous’ and that the system fits with that roadmap.
I see that your last post is about complexity science. This is an example of it. The domain of application is nature. Nature is complex, and maths has difficulties with complexity. The field of chaos theory petered out in the 80s for that reason. If you want to know more about it, start with Turing's work on morphogenesis (read the conclusion), then Prigogine. In NNs, there is Kohonen.
Some things are theoretically correct, but practically useless. You know how to win the lotto, but nobody does it. Better something simple that works and can be reasoned about, even without a mathematical theory. AI is not quantum physics.
Maybe it could be said that intelligence is cutting through all the details to then reason using what is left, but the devil is in those details.
It would be surprising to me if the algorithm really performed this poorly on Fashion-MNIST. F-MNIST is harder, but (intentionally) very similar to MNIST.
CIFAR, maybe with a limited set of categories, would be a logical “hard” test, IF it can be made to work on F-MNIST.
On the other hand (without claiming that I understand the ins and outs of the algorithm), I could imagine that, out of the neuro-inspired playbook, it misses the winner-takes-all competition between neurons, which allows modelling of multi-modal distributions and possibly allows easier separation of not-linearly-separable data points.
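(To illustrate what I mean by winner-takes-all: for each input, only the most active unit, or the top-k, in a group keeps its activation and is allowed to learn. A minimal sketch of the inference side, with k and the grouping as illustrative choices rather than anything from the posted code:)

```python
# Sketch of k-winners-take-all over a layer's activations: keep the k largest
# activations per example and zero the rest. k = 1 is classic winner-takes-all.
import torch

def k_winners_take_all(activations: torch.Tensor, k: int = 1) -> torch.Tensor:
    """activations: shape (batch, units); returns them with all but the per-example
    top-k set to zero."""
    topk = activations.topk(k, dim=1)
    mask = torch.zeros_like(activations).scatter_(1, topk.indices, 1.0)
    return activations * mask
```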
See my comment on reversing the shades on F-MNIST. I will check it later, but I see it gets up to 48% in the ‘wrong’ order, and that is surprisingly good. I worked on CIFAR, but that is another story. As-is, it gives bad results and you have to add other ‘things’.
As you guessed, I belong to the neuro-inspired branch and most of my ‘giants’ belong there. I strongly expected, when I started my investigations, to use some of the works that I knew and appreciated along the lines you are mentioning, and I investigated some of them early on.
To my surprise, I did not need them to get to this result, so they are absent.
The two-neuronal-layer form of the neocortex is where they will be useful. This is only one layer.
Another (bad) reason is that they add to the GPU hell that has limited my investigations. It is an identified source of potential improvements.