The ferret rewiring experiments, the tongue-based vision work, the visual regions learning to perform echolocation computations in the blind: together this evidence is decisive against the evolved modularity hypothesis as I’ve defined that hypothesis, at least for the cortex.
But none of these works as well as using the original task-specific regions, and anyway in all these experiments the original task-specific regions are still present and functional, so maybe the brain can partially use these regions by learning how to route the signals to them.
Sure. Once you have software loaded/learned into hardware, damage to the hardware is damage to the software. This doesn’t differentiate the two hypotheses.
But then why doesn’t universal learning just co-opt some other brain region to perform the task of the damaged one? And in cases of congenital malformation, where the usual task-specific region is missing or dysfunctional, why isn’t the task allocated to some other region?
And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset, from a different random initialization each time, the hidden nodes will specialize in a different way on each run: ANNs have permutation symmetry between nodes in the same layer at the very least, and as long as nodes operate in the linear region of the activation function there is also redundancy between layers. This means that many sets of weights specify the same or a similar function, and the training process chooses one of them essentially at random, depending on the initialization (and minibatch sampling, dropout, etc.).
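For concreteness, here is a minimal numpy sketch of that permutation symmetry (the layer sizes are arbitrary illustrative choices):

```python
# Sketch: permutation symmetry in a one-hidden-layer net. Relabeling
# the 32 hidden units (and the outgoing weights to match) gives a
# different set of weights that computes exactly the same function.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 10)), rng.normal(size=(32, 1))
W2 = rng.normal(size=(3, 32))

def net(x):
    return W2 @ np.tanh(W1 @ x + b1)

def net_permuted(x, perm):
    return W2[:, perm] @ np.tanh(W1[perm] @ x + b1[perm])

x = rng.normal(size=(10, 1))
perm = rng.permutation(32)
assert np.allclose(net(x), net_permuted(x, perm))
# ... and there are 32! (about 2.6e35) such equivalent weight settings.
```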
If, as you claim, the basal ganglia and the cortex make up a sort of CPU-memory system, then there should be substantial permutation symmetry. After all, in a computer you can swap blocks or pages of memory around, and as long as pointers (or page tables) are updated the behavior does not change, up to some performance issues due to cache misses. If the brain worked that way, we should expect cortical regions to be allocated to different tasks in a more or less random pattern varying between individuals. Instead we observe substantial consistency, even in left-right specialization patterns, which is remarkable since at the macroscopic level the brain is substantially laterally symmetric.
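The computer half of that analogy is easy to make concrete; a toy sketch, with a plain Python dict standing in for the page table:

```python
# Sketch: the von Neumann half of the analogy. Move a 'page' to a
# different physical frame, update the page table, and every access
# behaves exactly as before (the dict-based table is illustrative).
frames = {0: b"code", 1: b"data", 2: None}   # physical frames
page_table = {"text": 0, "heap": 1}          # virtual -> physical

def read(page):
    return frames[page_table[page]]

before = read("heap")
frames[2], frames[1] = frames[1], None       # relocate the page
page_table["heap"] = 2                       # fix up the mapping
assert read("heap") == before                # behavior unchanged
```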
This has also been tested via decortication experiments, and the results confirm the general ULH: rabbits rely much less on their cortex for motor behavior, larger primates rely on it almost exclusively, and cats and dogs are somewhere in between. This evidence shows that the cortex is general-purpose and acquires complex circuitry through learning. Recent machine learning systems provide further evidence in the form of existence proofs: this is how it could work.
Decortication experiments only show that certain species rely on the cortex more than others; they don’t show that the cortex is general-purpose and acquires complex circuitry through learning.
Horses, for instance, are large animals with a long lifespan and a large brain (an encephalization quotient similar to that of cats and dogs), and yet a newborn horse is able to walk, run, and follow its mother within a few hours of birth.
As I mentioned in the article, backprop is not really biologically plausible. Targetprop is, and there are good reasons to suspect the brain is using something like targetprop, as that theory is the latest result in a long line of work attempting to understand how the brain could be doing long-range learning.
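For concreteness, here is a minimal numpy sketch of the targetprop idea, in the difference target propagation variant from the ML literature (Lee et al., 2015): instead of backpropagating gradients, each layer chases a local target, and a learned approximate inverse carries targets downward. The layer sizes, learning rates, and toy task are all made-up illustrative choices, and real proposals add refinements such as noise-injected training of the inverse.

```python
# Minimal sketch of difference target propagation (after Lee et al. 2015).
# Everything here (sizes, rates, the toy task) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 4, 16, 1
W1 = rng.normal(0.0, 0.5, (n_hid, n_in))    # bottom layer: h = tanh(W1 x)
W2 = rng.normal(0.0, 0.5, (n_out, n_hid))   # top layer:    y = W2 h
V  = rng.normal(0.0, 0.5, (n_hid, n_out))   # learned inverse: h ~ tanh(V y)

lr, lr_inv, step = 0.05, 0.05, 0.5

def g_inv(y):
    return np.tanh(V @ y)

X = rng.normal(size=(n_in, 200))
T = np.sin(X.sum(axis=0, keepdims=True))    # toy regression target

for epoch in range(500):
    for i in range(X.shape[1]):
        x, t = X[:, i:i+1], T[:, i:i+1]
        h = np.tanh(W1 @ x)
        y = W2 @ h

        # Output target: nudge y toward lower squared error.
        y_tgt = y - step * (y - t)
        # Hidden target via the learned inverse, with a difference
        # correction that cancels the inverse's systematic error:
        h_tgt = h + g_inv(y_tgt) - g_inv(y)

        # Purely local updates: each layer chases its own target.
        W2 -= lr * (y - y_tgt) @ h.T
        W1 -= lr * ((h - h_tgt) * (1 - h**2)) @ x.T
        # Train the inverse to reconstruct h from y.
        r = g_inv(y)
        V -= lr_inv * ((r - h) * (1 - r**2)) @ y.T

# No gradient ever crossed more than one layer.
pred = W2 @ np.tanh(W1 @ X)
print("final MSE:", float(np.mean((pred - T) ** 2)))
```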
Targetprop is still highly speculative. It has not been shown to work well in artificial neural networks, and the evidence for biological plausibility is handwavy.
Biological plausibility was one of the heavily discussed aspects of ReLUs.
Ok.
Of course convnets still work without weight sharing; it just may require more data and/or better training and regularization.
In principle yes, but trivially so, as they are universal approximators. In practice, weight sharing enables these systems to easily learn translational invariance.
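A small numpy sketch of what the sharing buys (signal length, kernel size, and shift amount are arbitrary): because every position applies the same kernel, convolution is translation-equivariant, and pooling over the resulting map then gives approximate invariance. A locally connected layer without sharing would have to relearn the same feature at every position.

```python
# Sketch: weight sharing makes convolution translation-equivariant.
# A shifted input yields a correspondingly shifted feature map, so
# invariance is easy to obtain downstream (e.g. by pooling).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)           # 1-D input "image"
k = rng.normal(size=5)             # one shared kernel

y = np.convolve(x, k, mode="full")
y_shifted = np.convolve(np.roll(x, 7), k, mode="full")

# Away from the borders, the feature map shifts exactly with the input.
assert np.allclose(y[10:80], y_shifted[17:87])
```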
in all these experiments the original task-specific regions are still present and functional, so maybe the brain can partially use these regions by learning how to route the signals to them.
No: these studies involve direct measurements (electrodes for the ferret rewiring, fMRI for echolocation). They know the rewired auditory cortex is doing vision, etc.
But then why doesn’t universal learning just co-opt some other brain region to perform the task of the damaged one?
It can, and this does happen all the time. Humans can recover from serious brain damage (stroke, injury, etc.). It takes time to retrain and reroute the circuitry, similar to relearning everything that was lost all over again.
And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset
Current ANNs assume a fixed module layout, so they aren’t really comparable in module-task assignment.
Much of the specialization pattern could just be geography: V1 becomes visual because it is closest to the visual input, A1 becomes auditory because it is closest to the auditory input, and so on.
This should be the default hypothesis, but there also could be some element of prior loading, perhaps from pattern generators in the brainstem. (I have read a theory that there is a pattern generator for faces that pretrains the visual cortex a little bit in the womb, so that it starts with a vague primitive face detector).
After all, in a computer you can swap blocks or pages of memory around, and as long as pointers (or page tables) are updated the behavior does not change, up to some performance issues due to cache misses. If the brain worked that way, we should expect cortical regions to be allocated to different tasks in a more or less random pattern varying between individuals.
I said the BG is kind of like the CPU and the cortex is kind of like a big FPGA, but that is an analogy. There are huge differences between slow bio-circuitry and fast von Neumann machines.
Firstly, the brain doesn’t really have a concept of ‘swapping memory’. The closest thing to that is retraining, where the hippocampus can train info into the cortex. It’s a slow, complex process that is nothing like swapping memory.
Finally, the brain is much more optimized at the wiring/latency level. Functionality goes in certain places because that is where it is best for that functionality; it isn’t permutation-symmetric in the slightest. Every location has latency/wiring tradeoffs. In a von Neumann memory we just abstract that all away. Not in the brain. There is an actual optimal location for every concept/function, etc.
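Taken literally, that claim can be phrased as a toy placement problem; a hypothetical sketch, with made-up coordinates and traffic weights: given the fixed sites a unit must talk to, weighted by connection frequency, squared wire length has a unique minimizing location (the traffic-weighted centroid), so heavily interconnected hubs end up near the center, loosely echoing the thalamus/BG placement discussed later.

```python
# Sketch: the wiring-optimization claim as a toy placement problem.
# The region coordinates and traffic weights are made-up numbers.
import numpy as np

regions = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])  # fixed I/O sites
traffic = np.array([3.0, 1.0, 2.0])          # connection frequency to each

def wiring_cost(p):                          # total (squared) wire length
    return float(np.sum(traffic * np.sum((regions - p) ** 2, axis=1)))

# For squared wire length the optimum is the traffic-weighted centroid:
best = (traffic @ regions) / traffic.sum()
print("optimal location:", best, "cost:", wiring_cost(best))
# Shift the traffic pattern and the optimal location moves with it.
```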
a newborn horse is able to walk, run, and follow its mother within a few hours of birth.
That is fast for mammals; I know firsthand that it can take days for deer. Nonetheless, as we discussed, the brainstem in particular provides a library of innate complex motor circuitry, which various mammals can rely on to varying degrees, depending on how important complex early motor behavior is.
Targetprop is still highly speculative. It has not been shown to work well in artificial neural networks, and the evidence for biological plausibility is handwavy.
I agree that there is still more work to be done understanding the brain’s learning machinery. Targetprop is useful/exciting in ML, but it isn’t the full picture yet.
Humans get tired after continuously playing for a few hours, but in terms of overall playtime they learn faster.
Not at all. The Atari agent becomes semi-superhuman by day 3 of its life. When humans start playing Atari, they already have trained vision and motor systems, and Atari is designed for those systems. Even then your statement is wrong, in that I don’t think any children achieve playtester levels of skill in even a few days.
Finally, the brain is much more optimized at the wiring/latency level. Functionality goes in certain places because that is where it is best for that functionality; it isn’t permutation-symmetric in the slightest. Every location has latency/wiring tradeoffs. In a von Neumann memory we just abstract that all away. Not in the brain. There is an actual optimal location for every concept/function, etc.
Well, the eyes are at the front of the head, but the optic nerves connect to the brain at the back, and they also cross at the optic chiasm. Axons also cross contralaterally in the spinal cord, and if I recall correctly there are various nerves that don’t take the shortest path either. This seems to me to be evidence that the nervous system is not strongly optimized for latency.
This is a total misconception, and it is a good example of the naive engineer fallacy (jumping to the conclusion that a system is poorly designed when you don’t understand how the system actually works and why).
Remember, the distributed software modules (including V1) have components in multiple physical modules (cortex, cerebellum, thalamus, BG). Not every DSM has components in all subsystems, but V1 definitely has a thalamic relay component (the LGN).
The thalamus/BG is in the center of the brain, which makes sense from wiring minimization once you understand the DSM system. Low-frequency/compressed versions of the cortical map computations can interact at higher speeds inside the small, compact volume of the BG/thalamus. The BG/thalamus basically contains a microcosm model of the cortex within itself.
The thalamic relay comes first in sequential processing order, so moving cortical V1 closer to the eyes wouldn’t help in the slightest. (Draw this out if it doesn’t make sense.)