Thanks, I was waiting for at least one somewhat critical reply :)
Specifically, I think you fail to address the evidence for evolved modularity:
The brain uses spatially specialized regions for different cognitive tasks.
This specialization pattern is mostly consistent across different humans and even across different species.
The ferret rewiring experiments, the tongue based vision stuff, the visual regions learning to perform echolocation computations in the blind, this evidence together is decisive against the evolved modularity hypothesis as I’ve defined that hypothesis, at least for the cortex. The EMH posits that the specific cortical regions rely on complex innate circuitry specialized for specific tasks. The evidence disproves that hypothesis.
Damage to or malformation of some brain regions can cause specific forms of disability (e.g. face blindness). Sometimes the disability can be overcome but often not completely.
Sure. Once you have software loaded/learned into hardware, damage to the hardware is damage to the software. This doesn’t differentiate the two hypotheses.
In various mammals, infants are capable of complex behavior straight out of the womb. Human infants exhibit only very simple behaviors and require many years to reach full cognitive maturity; therefore the human brain relies more on learning than the brains of other mammals, but the basic architecture is the same, thus this is a difference of degree, not kind.
Yes—and I described what is known about that basic architecture. The extent to which a particular brain relies on learning vs innate behaviour depends on various tradeoffs such as organism lifetime and brain size. Small-brained and short-lived animals have much less to gain from learning (less time to acquire data, less hardware power), so they rely more on innate circuitry, much of which is encoded in the oldbrain and the brainstem. This is all very much evidence for the ULH. The generic learning structures—the cortex and cerebellum—generally grow in size with larger organisms and longer lifespans.
This has also been tested via decortication experiments and confirms the general ULH—rabbits rely much less on their cortex for motor behavior, larger primates rely on it almost exclusively, cats and dogs are somewhere in between, etc.
This evidence shows that the cortex is general purpose, and acquires complex circuitry through learning. Recent machine learning systems provide further evidence in the form of—this is how it could work.
For all the speculation, there is still no clear evidence that the brain uses anything similar to backpropagation.
As I mentioned in the article, backprop is not really biologically plausible. Targetprop is, and there are good reasons to suspect the brain is using something like targetprop—as that theory is the latest result in a long line of work attempting to understand how the brain could be doing long range learning. Investigating and testing the targetprop theory and really confirming it could take a while—even decades. On the other hand, if targetprop or some variant is proven to work in a brain-like AGI, that is something of a working theory that could then help accelerate neuroscience confirmation.
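To make this concrete, here is a minimal numpy sketch in the spirit of (difference) target propagation, not the exact algorithm from any particular paper: each layer gets a locally supplied target, and targets flow backwards through a learned approximate inverse instead of backpropagated gradients. The toy task, layer sizes, and step sizes are arbitrary choices for illustration.

```python
# Minimal sketch of the *flavor* of target propagation (not the exact published
# algorithm): each layer learns from a local target rather than a gradient
# propagated through the whole network.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn y = sum(x) from random inputs.
X = rng.normal(size=(200, 4))
T = X.sum(axis=1, keepdims=True)

W1 = rng.normal(scale=0.5, size=(4, 8))   # forward layer 1: x -> h
W2 = rng.normal(scale=0.5, size=(8, 1))   # forward layer 2: h -> y
V  = rng.normal(scale=0.5, size=(1, 8))   # learned approximate inverse of layer 2: y -> h

lr, inv_lr, step = 0.02, 0.02, 0.5

for epoch in range(200):
    # Forward pass.
    H = np.tanh(X @ W1)          # hidden activations, shape (200, 8)
    Y = H @ W2                   # outputs, shape (200, 1)

    # Train the inverse to reconstruct H from Y (a purely local reconstruction loss).
    H_rec = np.tanh(Y @ V)
    V -= inv_lr * Y.T @ ((H_rec - H) * (1 - H_rec**2)) / len(X)

    # Output target: nudge the output toward the label.
    Y_tgt = Y - step * (Y - T)

    # Hidden target via the inverse, with the "difference" correction so that
    # errors in the inverse partly cancel: H_tgt = H + g(Y_tgt) - g(Y).
    H_tgt = H + np.tanh(Y_tgt @ V) - np.tanh(Y @ V)

    # Each layer now solves its own local regression toward its target;
    # no gradient is propagated through more than one layer.
    W2 -= lr * H.T @ (Y - Y_tgt) / len(X)
    W1 -= lr * X.T @ ((H - H_tgt) * (1 - H**2)) / len(X)

print("final MSE:", float(np.mean((np.tanh(X @ W1) @ W2 - T) ** 2)))
```

The relevant property is that every weight update uses only locally available activations and targets, which is the kind of credit assignment that is easier to imagine cortex implementing than exact backprop.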
There seems to be a trend in AI where for any technique that is currently hot there are people who say: “This is how the brain works. We don’t know all the details, but studies X, Y and Z clearly point in this direction.” After a few years and maybe an AI (mini)winter the brain seems to work in another way...
I did not say deep learning is “how the brain works”. I said instead that the brain is—roughly—a specific biological implementation of a ULH, which itself is a very general model that will also include any practical AGIs.
I said that DL helps indirectly confirm the ULH of the brain, specifically by showing how the complex task specific circuitry of the cortex could arise through a simple universal learning algorithm.
Computational modeling is key—if you can’t build something, you don’t understand it. To the extent that any AI model can functionally replicate specific brain circuits, it is useful to neuroscience. Period. Far more useful than psychological theorizing not grounded in circuit reality. So computational neuroscience and deep learning (which really is just the neuroscience inspired branch of machine learning) naturally have deep connections.
Some of the most successful deep learning approaches, such as modern convnets for computer vision, rely on quite un-biological features such as weight sharing and rectified linear units
Biological plausibility was one of the heavily discussed aspects of RELUs. From the abstract:

“While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of...”
Weight sharing is unbiological: true. It is also an important advantage that von Neumann (time-multiplexed) systems have over biological (non-multiplexed) ones. The neuromorphic hardware approaches largely cannot handle weight sharing. Of course convnets still work without weight sharing—it just may require more data and/or better training and regularization. It is interesting to speculate how the brain deals with that, as is comparing the details of convnet learning capability vs bio-vision. I don’t have time to get into that at the moment, but I did link to at least one article comparing convnets to bio vision in the OP.
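As a toy illustration of what weight sharing buys (and what dropping it would cost a non-multiplexed, brain-like system), here is a small numpy sketch contrasting a shared-kernel 1-D convolution with a locally connected version that learns an independent kernel at every position; the function names and sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_shared(x, kernel):
    """1-D 'valid' convolution with a single shared kernel (weight sharing)."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def conv1d_local(x, kernels):
    """Same receptive fields, but an independent kernel per output position
    (a 'locally connected' layer: what a system without weight sharing
    would have to learn separately at every location)."""
    k = kernels.shape[1]
    return np.array([x[i:i + k] @ kernels[i] for i in range(len(x) - k + 1)])

x = rng.normal(size=64)
k = 5
shared = rng.normal(size=k)                     # 5 parameters total
local = rng.normal(size=(len(x) - k + 1, k))    # 5 parameters per position = 300

print("shared params:", shared.size, "| local params:", local.size)

# With weight sharing, a feature learned at one position transfers to all
# positions for free (translation equivariance); the locally connected layer
# has to see data at every position to learn the same thing.
y_shared = conv1d_shared(x, shared)
y_local = conv1d_local(x, local)
print(y_shared.shape, y_local.shape)
```

The shared version gets translation equivariance and a tiny parameter count for free; the locally connected version has to relearn the same feature at every position, which is roughly the extra data/regularization burden mentioned above.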
“Deep learning” is a quite vague term anyway,
Sure—so just taboo it then. When I use the term “deep learning”, it means something like “the branch of machine learning which is more related to neuroscience” (while still focused on end results rather than emulation).
Perhaps most importantly, deep learning methods generally work in supervised learning settings and they have quite weak priors: they require a dataset as big as ImageNet to yield good image recognition performance
Comparing two learning systems trained on completely different datasets with very different objective functions is complicated.
In general though, CNNs are a good model of fast feedforward vision—the first 150 ms of the ventral stream. In that domain they are comparable to biovision, with the important caveat that biovision computes a larger and richer output parameter map than almost any CNN. Most CNNs (there are many different types) are more narrowly focused, but also probably learn faster because of advantages like weight sharing. The amount of data required to train a CNN up to superhuman performance on narrow tasks is comparable to or less than that required to train a human visual system up to high performance (but again, the cortex is doing something more like transfer learning, which is harder).
Past 150 ms or so, humans start making multiple saccades and also start to integrate information from a larger number of brain regions, including frontal and temporal cortical regions. At that point the two systems aren’t even comparable: humans are using more complex ‘mental programs’ over multiple saccades to make visual judgements.
Of course, eventually we will have AGI systems that also integrate those capabilities.
days of continuous simulated gameplay on the ATARI 2600 emulator to obtain good scores

That’s actually extremely impressive—superhuman learning speed.
Therefore I would say that deep learning methods, while certainly interesting from an engineering perspective, are probably not very relevant to the understanding of the brain, at least given the current state of the evidence.
In that case, I would say you may want to read up more on the field. If you haven’t yet, check out the original sparse coding paper (over 3000 citations), to get an idea of how crucial new computational models have been for advancing our understanding of cortex.
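For readers who want the gist without reading the paper, here is a minimal numpy sketch of the sparse coding idea: learn a dictionary of basis functions such that each input patch is reconstructed from only a few active coefficients. The inference loop below uses simple iterative soft-thresholding for brevity, which is not the original paper's procedure, and random data stands in for whitened natural image patches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake "image patch" data; in the original work these are whitened natural
# image patches, and the learned basis functions come out Gabor-like.
X = rng.normal(size=(500, 64))                 # 500 patches of 8x8 = 64 pixels
D = rng.normal(size=(64, 32))
D /= np.linalg.norm(D, axis=0)                 # 32 dictionary elements ("receptive fields")

lam, code_steps, lr = 0.1, 30, 0.05

def sparse_codes(X, D):
    """Infer sparse coefficients A minimizing ||X - A D^T||^2 + lam*|A|_1 (ISTA-style)."""
    A = np.zeros((len(X), D.shape[1]))
    L = np.linalg.norm(D.T @ D, 2)             # Lipschitz constant of the quadratic part
    for _ in range(code_steps):
        grad = (A @ D.T - X) @ D
        A = A - grad / L
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)   # soft threshold
    return A

for it in range(50):
    A = sparse_codes(X, D)
    # Dictionary update: gradient step on reconstruction error, then renormalize.
    D += lr * (X - A @ D.T).T @ A / len(X)
    D /= np.linalg.norm(D, axis=0) + 1e-8

A = sparse_codes(X, D)
print("mean reconstruction error:", float(np.mean((X - A @ D.T) ** 2)))
print("fraction of active coefficients:", float(np.mean(np.abs(A) > 1e-6)))
```

Trained on actual image patches, the learned dictionary elements come out localized and oriented, much like V1 simple-cell receptive fields, which is a large part of why the model was so influential for understanding cortex.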
The ferret rewiring experiments, the tongue based vision stuff, the visual regions learning to perform echolocation computations in the blind, this evidence together is decisive against the evolved modularity hypothesis as I’ve defined that hypothesis, at least for the cortex.
But none of these works as well as using the original task-specific regions, and anyway in all these experiments the original task-specific regions are still present and functional, therefore maybe the brain can partially use these regions by learning how to route the signals to them.
Sure. Once you have software loaded/learned into hardware, damage to the hardware is damage to the software. This doesn’t differentiate the two hypotheses.
But then why doesn’t universal learning just co-opt some other brain region to perform the task of the damaged one? In the cases where there is a congenital malformation that makes the usual task-specific region missing or dysfunctional, why isn’t the task allocated to some other region?
And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset from different random initializations, the hidden nodes will specialize in a different way each time: at least ANNs have permutation symmetry between nodes in the same layer, and as long as nodes operate in the linear region of the activation function, there is also redundancy between layers. This means that many sets of weights specify the same or similar function, and the training process chooses one of them randomly depending on the initialization (and minibatch sampling, dropout, etc.).
If, as you claim, the basal ganglia and the cortex in the brain make up a sort of cpu-memory system, then there should be substantial permutation symmetry. After all, in a computer you can swap blocks or pages of memory around and as long as pointers (or page tables) are updated the behavior does not change, up to some performance issues due to cache misses. If the brain worked that way we should expect cortical regions to be allocated to different tasks in a more or less random pattern varying between individuals. Instead we observe substantial consistency, even in the left-right specialization patterns, which is remarkable since at the macroscopic level the brain has substantial lateral symmetry.
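The permutation-symmetry point above is easy to demonstrate directly; a minimal numpy illustration (sizes and weights arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-layer net: y = W2 @ tanh(W1 @ x + b1)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2 = rng.normal(size=(3, 16))
x = rng.normal(size=8)

def forward(W1, b1, W2, x):
    return W2 @ np.tanh(W1 @ x + b1)

# Permute the hidden units: reorder rows of W1/b1 and the matching columns of W2.
perm = rng.permutation(16)
y_original = forward(W1, b1, W2, x)
y_permuted = forward(W1[perm], b1[perm], W2[:, perm], x)

# The two weight sets are different but compute exactly the same function,
# so which "role" a given hidden unit ends up with is an accident of
# initialization rather than anything intrinsic to its position.
print(np.allclose(y_original, y_permuted))   # True
```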
This has also been tested via decortication experiments and confirms the general ULH—rabbits rely much less on their cortex for motor behavior, larger primates rely on it almost exclusively, cats and dogs are somewhere in between, etc. This evidence shows that the cortex is general purpose, and acquires complex circuitry through learning. Recent machine learning systems provide further evidence in the form of—this is how it could work.
Decortication experiments only show that certain species rely on the cortex more than others; they don’t show that the cortex is general purpose and acquires complex circuitry through learning.
Horses, for instance, are large animals with a long lifespan and a large brain (encephalization coefficient similar to that of cats and dogs), and yet a newborn horse is able to walk, run, and follow its mother within a few hours of birth.
As I mentioned in the article, backprop is not really biologically plausible. Targetprop is, and there are good reasons to suspect the brain is using something like targetprop—as that theory is the latest result in a long line of work attempting to understand how the brain could be doing long range learning.
Targetprop is still highly speculative. It has not been shown to work well in artificial neural networks and the evidence of biological plausibility is handwavy.
Biological plausibility was one of the heavily discussed aspects of RELUs.
Ok.
Of course convnets still work without weight sharing—it just may require more data and/or better training and regularization
In principle yes, but trivially so as they are universal approximators. In practice, weight sharing enables these systems to easily learn translational invariance.
in all these experiments the original task-specific regions are still present and functional, therefore maybe the brain can partially use these regions by learning how to route the signals to them.
No—these studies involve direct measurements (electrodes for the ferret rewiring, MRI for echolocation). They know the rewired auditory cortex is doing vision, etc.
But then why doesn’t universal learning just co-opt some other brain region to perform the task of the damaged one?
It can, and this does happen all the time. Humans can recover from serious brain damage (stroke, injury, etc). It takes time to retrain and reroute circuitry—similar to relearning everything that was lost all over again.
And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset
Current ANNs assume a fixed module layout, so they aren’t really comparable in module-task assignment.
Much of the specialization pattern could just be geography—V1 becomes visual because it is closest to the visual input. A1 becomes auditory because it is closest to the auditory input. etc.
This should be the default hypothesis, but there also could be some element of prior loading, perhaps from pattern generators in the brainstem. (I have read a theory that there is a pattern generator for faces that pretrains the visual cortex a little bit in the womb, so that it starts with a vague primitive face detector).
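As a deliberately trivial cartoon of the geography argument (positions and labels invented for illustration): if each cortical site simply ends up serving whichever input stream it can reach with the least wiring, the same layout falls out of the geometry every time, with no task-specific innate circuitry required.

```python
import numpy as np

# Toy "specialization by geography": cortical sites on a 1-D sheet, with
# visual input arriving at position 0.0 and auditory input at position 1.0.
sites = np.linspace(0.0, 1.0, 11)          # positions of 11 cortical sites
inputs = {"visual": 0.0, "auditory": 1.0}  # where each input stream arrives

# Each site serves whichever input it can reach with the least wiring.
assignment = {
    float(s): min(inputs, key=lambda name: abs(s - inputs[name]))
    for s in sites
}
for pos, task in assignment.items():
    print(f"site at {pos:.1f} -> {task}")
```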
After all, in a computer you can swap blocks or pages of memory around and as long as pointers (or page tables) are updated the behavior does not change, up to some performance issues due to cache misses. If the brain worked that way we should expect cortical regions to be allocated to different tasks in a more or less random pattern varying between individuals.
I said the BG is kind-of-like the CPU, the cortex is kind-of-like a big FPGA, but that is an analogy. There are huge differences between slow bio-circuitry and fast von Neumann machines.
Firstly the brain doesn’t really have a concept of ‘swapping memory’. The closest thing to that is retraining, where the hippocampus can train info into the cortex. It’s a slow complex process that is nothing like swapping memory.
Finally the brain is much more optimized at the wiring/latency level. Functionality goes in certain places because that is where it is best for that functionality—it isn’t permutation symmetric in the slightest. Every location has latency/wiring tradeoffs. In a von Neumann memory we just abstract that all away. Not in the brain. There is an actual optimal location for every concept/function etc.
a newborn horse is able to walk, run, and follow its mother within a few hours of birth.
That is fast for mammals—I know first hand that it can take days for deer. Nonetheless, as we discussed, the brainstem provides a library of innate complex motor circuitry in particular, which various mammals can rely on to varying degrees, depending on how important complex early motor behavior is.
Targetprop is still highly speculative. It has not been shown to work well in artificial neural networks and the evidence of biological plausibility is handwavy.
I agree that there is still more work to be done understanding the brain’s learning machinery. Targetprop is useful/exciting in ML, but it isn’t the full picture yet.
Humans get tired after continuously playing for a few hours, but in terms of overall playtime they learn faster.
Not at all. The Atari agent becomes semi-superhuman by day 3 of its life. When humans start playing Atari, they already have trained vision and motor systems, and Atari is designed for these systems. Even then your statement is wrong—in that I don’t think any children achieve playtester levels of skill in even a few days.
Finally the brain is much more optimized at the wiring/latency level. Functionality goes in certain places because that is where it is best for that functionality—it isn’t permutation symmetric in the slightest. Every location has latency/wiring tradeoffs. In a von Neumann memory we just abstract that all away. Not in the brain. There is an actual optimal location for every concept/function etc.
Well, the eyes are at the front of the head, but the optic nerves connect to the brain at the back, and they also cross at the optic chiasm. Axons also cross contralaterally in the spinal cord and, if I recall correctly, there are various nerves that also don’t take the shortest path. This seems to me to be evidence that the nervous system is not strongly optimized for latency.
This is a total misconception, and it is a good example of the naive engineer fallacy (jumping to the conclusion that a system is poorly designed when you don’t understand how the system actually works and why).
Remember the distributed software modules—including V1—have components in multiple physical modules (cortex, cerebellum, thalamus, BG). Not every DSM has components in all subsystems, but V1 definitely has a thalamic relay component (the LGN).
The thalamus/BG is in the center of the brain, which makes sense from wiring minimization when you understand the DSM system. Low freq/compressed versions of the cortical map computations can interact at higher speeds inside the small compact volume of the BG/thalamus. The BG/thalamus basically contains a microcosm model of the cortex within itself.
The thalamic relay comes first in sequential processing order, so moving cortical V1 closer to the eyes wouldn’t help in the slightest. (Draw this out if it doesn’t make sense)
Regarding e.g. the ferret rewiring experiments, tongue-based vision, etc.: is it a plausible alternative hypothesis that there are more general subtypes of regions that aren’t fully specialized but are more interoperable than others?
For example, (Playing devil’s advocate here) I could phrase all of the mentioned experiments as “sensory input remapping” among “sensory input processing modules.” Similarly, much of the work in BCI interfaces for e.g. controlling cursors or prosthetics could be called “motor control remapping”. Have we ever observed cortex being rewired for drastically dissimilar purposes? For example, motor cortex receiving sensory input?
If we can’t do stuff like that, then my assumption would be that at the very least, a lot of the initial configuration is prenatal and follows kind of a “script” that might be determined by either some genome-encoded fractal rule of tissue formation, or similarities in the general conditions present during gestation. Either way, I’m not yet convinced there’s a strong argument that all brain function can be explained as working like a ULM (Even if a lot of it can)
Have we ever observed cortex being rewired for drastically dissimilar purposes? For example, motor cortex receiving sensory input?
I’m not sure—I have a vague memory of something along those lines but .. nothing specific.
From what I remember, motor, sensory, and association cortex do have some intrinsic differences at the microcircuit level. For example some motor cortex has larger pyramidal cells in the output layer. However, I believe most motor cortex is best described as sensorimotor—it depends heavily on sensory data from the body.
a lot of the initial configuration is prenatal and follows kind of a “script” that might be determined by either some genome-encoded fractal rule
Well yes—there is a general script for the overall architecture, and a lot of innate functionality as well, especially in specific regions like the brainstem’s pattern generators. As I said in the article—there is always room for innate functionality in the architectural prior and in specific circuits—the brain is certainly not a pure ULM.
Either way, I’m not yet convinced there’s a strong argument that all brain function can be explained as working like a ULM (Even if a lot of it can)
ULM refers to the overall architecture, with the general learning part specifically implemented by the distributed BG/cortex/cerebellum modules. But the BG and hippocampal system also rely heavily on learning internally, as does the amygdala and... probably almost all of it to varying degrees. The brainstem is specifically the place where we can point and say—this is mostly innate circuitry, but even it probably has some learning going on.
Regarding e.g. the ferret rewiring experiments, tongue-based vision, etc.: is it a plausible alternative hypothesis that there are more general subtypes of regions that aren’t fully specialized but are more interoperable than others?
It’s far more likely that different brain modules implement different learning rules, but all learn, than that they encode innate mental functionality which is not subject to learning at all.
I’m inclined to agree. Actually I’ve been convinced for a while that this is a matter of degrees rather than being fully one way or the other (Modules versus learning rules), and am convinced by this article that the brain is more of a ULM than I had previously thought.
Still, when I read that part the alternative hypothesis sprang to mind, so I was curious what the literature had to say about it (or the post author).
The ferret rewiring experiments, the tongue based vision stuff, the visual regions learning to perform echolocation computations in the blind, this evidence together is decisive against the evolved modularity hypothesis as I’ve defined that hypothesis, at least for the cortex. The EMH posits that the specific cortical regions rely on complex innate circuitry specialized for specific tasks. The evidence disproves that hypothesis.
It seems a little strange to treat this as a triumphant victory for the ULH. At the most, you’ve shown that the “fundamentalist” evolved modularity hypothesis is false. You didn’t really address how the ULH explains this same evidence.
And there are other mysteries in this model, such as the apparent universality of specific cognitive heuristics and biases, or of various behaviours like altruism, deception, and sexuality that seem obviously evolved. And, as V_V mentioned, the lateral asymmetry of the brain’s functionality vs the macroscopic symmetry.
Otherwise, the conclusion I would draw from this is that both theories are wrong, or that some halfway combination of them is true (say, “universal” plasticity plus a genetic set of strong priors somehow encoded in the structure).