Steven Byrnes comments on [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain

Steven Byrnes 4 Feb 2022 14:41 UTC
LW: 3 AF: 1
Thanks! I’m not sure we have much disagreement here. Some relevant issues are:
- Memory ≠ Unstructured memory (and likewise, locally-random ≠ globally-random): There’s certainly a neural architecture, with certain types of connections between certain macroscopic regions.
- “just” a memory system + learning algorithm (with a dismissive tone of voice on the “just”): Maybe you didn’t mean it that way, but for the record, I would suggest that to the extent that you feel wowed by something the neocortex does, you should copy that feeling and feel equally wowed by “what learning-from-scratch algorithms are capable of”. The things that ML people are doing right now are but a tiny corner of a vast universe of possible learning-from-scratch algorithms.
- Generative models can be learned from scratch—obviously, e.g. StyleGAN. I imagine you agree, but you mentioned “generative”, so I’m just throwing this in.
- Dynamics is not unrelated to neural architecture: For example, there’s a kinda silly sense in which GPT-3 involves a “wave of activity” that goes from layer 1 through layers 2, 3, 4, …, ending in layer 96. I’m not saying anything profound here—it’s just that GPT-3 happens to be feedforward, not recurrent. But anyway, if you think it’s important that the neocortex has a bias for waves that travel in certain directions, I’d claim that such a bias can likewise be built out of a (not-perfectly-recurrent) neural architecture in the brain.
- Lottery ticket hypothesis: I was recently reading The Brain from Inside Out by Buzsáki, and Chapter 13 had a discussion which to my ears sounded like the author was proposing that we should think of brain learning as being like the lottery ticket hypothesis. E.g. “Rich brain dynamics can emerge from the genetically programmed wiring circuits and biophysical properties of neurons during development even without any sensory input or experience.” Then “experience [is] a matching process” between data and this “preexisting dictionary of nonsense words combined with internally generated syntactical rules”, i.e., “preexisting neuronal patterns that coincide with the attended unexpected event are marked as important”. That all sounds to me like the lottery ticket hypothesis. And I have no argument there. What I’m not so sure about is who or what he was arguing against. Obviously the lottery ticket hypothesis is compatible with learning-from-scratch with locally-random initialization; indeed, that’s the context in which that term is universally used. Maybe Buzsáki is arguing against globally-random, unstructured networks, or something? (Does anyone believe that? I dunno.) Anyway, that’s Buzsáki, not you, but I bring it up because I felt like I was maybe getting similar vibes from your comment.
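
For readers who have not seen the lottery ticket procedure in its usual machine-learning setting, here is a toy sketch of it: train from a random initialization, keep only the largest-magnitude weights, rewind the survivors to their initial values, and train again. Everything here is an illustrative assumption (the tiny tanh network, the synthetic data, the 80% pruning rate, pruning only the first layer, and the names `train` and `ticket_mask`); it is not Buzsáki's model and not the setup of the original lottery ticket experiments.

```python
import numpy as np

def train(w1, w2, mask1, X, y, steps=1000, lr=0.02):
    """Gradient descent on half the mean squared error of a 1-hidden-layer
    tanh network. Entries of w1 where mask1 == 0 are held at zero."""
    w1, w2 = w1 * mask1, w2.copy()            # fresh copies; inputs stay untouched
    n = len(y)
    for _ in range(steps):
        h = np.tanh(X @ w1)                   # hidden activity, shape (n, hidden)
        err = h @ w2 - y                      # prediction error, shape (n,)
        grad_w2 = h.T @ err / n
        grad_pre = np.outer(err, w2) * (1.0 - h ** 2)
        grad_w1 = X.T @ grad_pre / n
        w1 -= lr * grad_w1 * mask1            # pruned weights receive no update
        w2 -= lr * grad_w2
    loss = float(np.mean((np.tanh(X @ w1) @ w2 - y) ** 2))
    return w1, w2, loss

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]           # synthetic target (assumption)

# Locally-random initialization: the "tickets" are drawn here.
w1_init = rng.normal(scale=0.5, size=(5, 32))
w2_init = rng.normal(scale=0.5, size=32)

# 1. Train the dense network from the random initialization.
dense_mask = np.ones_like(w1_init)
w1_trained, _, dense_loss = train(w1_init, w2_init, dense_mask, X, y)

# 2. Keep the 20% of first-layer weights with the largest trained magnitude.
threshold = np.quantile(np.abs(w1_trained), 0.8)
ticket_mask = (np.abs(w1_trained) >= threshold).astype(float)

# 3. Rewind the surviving weights to their initial values and retrain.
w1_ticket, _, ticket_loss = train(w1_init, w2_init, ticket_mask, X, y)

print("dense loss:", dense_loss, "ticket loss:", ticket_loss)
```

Whether the pruned subnetwork matches the dense one on a problem this small is not the point; the sketch only shows that the procedure starts from, and depends on, a random initialization.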
- Jon Garcia 5 Feb 2022 5:08 UTC
  6 points
  
  Memory ≠ Unstructured memory (and likewise, locally-random ≠ globally-random): [...]
  
  Agreed. I didn’t mean to imply that you thought otherwise.
  
  “just” a memory system + learning algorithm—with a dismissive tone of voice on the “just”: [...]
  
  I apologize for how that came across. I had no intention of being dismissive. When I respond to a post or comment, I typically try to frame what I say as much for a typical reader as for the original poster. In this case, I had a sense that a typical reader could get the wrong impression about how the neocortex does what it does if the only sorts of memory systems and learning algorithms that came to mind were things like a blank computer drive and stochastic gradient descent on a feed-forward neural network.
  
  You are absolutely right that the neocortex is equipped to learn from scratch, starting out generating garbage and gradually learning to make sense of the world/body/other-brain-regions/etc., which can legitimately be described as a memory system + learning algorithm. I just wanted anyone reading to appreciate that, at least in biological brains, there is no clean separation between learning algorithm and memory, and that the neocortex’s role as a hierarchical, dynamic, generative simulator is precisely what makes learning from scratch so efficient: it only has to correlate its intrinsic dynamics with the statistically similar dynamics of learned experience.
  
  I’m sure that there are vastly more ways of implementing learning-from-scratch, maybe some much better ways in fact, and I realize that the exact implementation is probably not relevant to the arguments you plan to make in this sequence. I just feel that a basic understanding of what a real learning-from-scratch system looks like could help drive intuitions of what is possible.
  
  Generative models can be learned from scratch [...]
  
  Indeed, but of course including their own particular structural priors.
  
  Dynamics is not unrelated to neural architecture: [...]
  
  Well, what is a recurrent neural network after all but an arbitrarily deep feed-forward neural network with shared weights across layers? My comment on cortical waves was just to point out a clever way that the brain learns to organize its cortical maps and primes them to expect causality to operate (mostly) locally in space and time. For example, orientation columns in V1 may be adjacent to each other because similarly oriented edges (from moving objects) were consistently presented to the same part of the visual field close in time, such that traveling waves of excitation would teach pyramidal cell A to learn orientation A at time A and then teach pyramidal cell B to learn orientation B at time B.
  
  Lottery ticket hypothesis: [...]
  
  “Lottery tickets” (i.e., subnetworks with random initializations that just so happen to give them the right inductive bias to generalize well from the training data for a particular task) probably occur in the brain as much as in current deep learning architectures. However, the issue in DL is that the rest of the network often fails to contribute much to test performance beyond what the lottery ticket subnetwork was able to achieve, as though there were a chasm in model space that the other subnetworks were unable to cross to reach a solution. Evolution seems to have found a way around this problem, at least by the time the neocortex came along, in the sense that the brain seems adept at giving every subnetwork a useful job.
  
  Again, I think the nature of the neocortex as a generative simulator is what makes this feasible with sparse training data. The structure and dynamics of the neocortex have enough in common statistically with the structure and dynamics of the natural world that it is easy for it to align with experience. In contrast, the structure and (lack of) dynamics of current DL systems make them more brittle when trying to make sense of the natural world.
  
  All that being said, I realize that my comments might be taking things down a rabbit hole. But I appreciate your feedback and look forward to seeing your perspective fleshed out more in the rest of this sequence.
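
The remark earlier in this reply, that a recurrent network is an arbitrarily deep feed-forward network with shared weights across layers, can be checked numerically. The sketch below is a minimal illustration under assumed details: a plain tanh recurrence, arbitrary small sizes, and hypothetical function names `rnn_forward` and `feedforward_forward`. Note that the unrolled network feeds the input for step t directly into layer t, which an ordinary feed-forward stack would not do.

```python
import numpy as np

def rnn_forward(W_rec, W_in, inputs, h0):
    """Run a simple tanh recurrent network over a sequence of inputs."""
    h = h0
    for x in inputs:
        h = np.tanh(W_rec @ h + W_in @ x)
    return h

def feedforward_forward(layers, inputs, h0):
    """Run a deep feed-forward stack in which layer t has its own
    (W_rec_t, W_in_t) pair and receives the t-th input."""
    h = h0
    for (W_rec_t, W_in_t), x in zip(layers, inputs):
        h = np.tanh(W_rec_t @ h + W_in_t @ x)
    return h

rng = np.random.default_rng(0)
T, n_hidden, n_in = 6, 4, 3                   # sequence length and sizes (assumptions)
W_rec = rng.normal(size=(n_hidden, n_hidden))
W_in = rng.normal(size=(n_hidden, n_in))
inputs = rng.normal(size=(T, n_in))
h0 = np.zeros(n_hidden)

# Unrolling: a T-layer feed-forward net whose layers all share the same weights.
shared_layers = [(W_rec, W_in)] * T

h_rnn = rnn_forward(W_rec, W_in, inputs, h0)
h_ff = feedforward_forward(shared_layers, inputs, h0)

print("outputs match:", np.allclose(h_rnn, h_ff))
```

Giving each layer its own weights instead of the shared pair turns the same code into an ordinary (non-recurrent) deep network, which is the sense in which the two architectures sit on one continuum.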