How does this “seed” find the correct high-level sensory features to plug into? How can it wire complex high-level behavioral programs (such as courtship behaviors) to low-level motor programs learned by unsupervised learning?
This particular idea is not well developed yet in my mind, and I haven’t really even searched the literature yet. So keep that in mind.
Leave courtship aside; let us focus on attraction—specifically, evolution needs to encode detectors which can reliably distinguish high-quality mates of the opposite sex from all kinds of other objects. The problem is that a good high-quality face recognizer is too complex to specify in the genome—it requires many billions of synapses, so it needs to be learned. However, the genome can encode an initial crappy face detector. It can also encode scent/pheromone detectors, and it can encode general ‘complexity’ and/or symmetry detectors that sit on top, so even if it doesn’t initially know what it is seeing, it can tell when something is about yea complex/symmetric/interesting. It can encode the equivalent of: if you see an interesting face-sized object which appears for many minutes at a time and moves at this speed, and you hear complex speech-like sounds, and smell human scents, it’s probably a human face.
Then the problem is reduced in scope. The cortical map will grow a good face/person model/detector on its own, and then after this model is ready, certain hormones in adolescence activate innate routines that learn where the face/person model patch is and help other modules plug into it. This whole process can also be improved by the use of the weak top-down prior described above.
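To make the weak-cue idea concrete, here is a toy Python sketch (all the signal names are hypothetical placeholders of my own, not anything from the literature). The point is just that a conjunction of individually crappy, genome-encodable cues can already be fairly selective, leaving the learned cortical model to do the refinement:

```python
# Toy sketch: several weak, cheap-to-encode cues, each nearly useless
# alone, become a serviceable proto face detector when most must fire.
def proto_face_score(blob):
    """blob: dict of weak cue measurements for one candidate object."""
    cues = [
        blob["roughly_face_sized"],         # coarse size prior
        blob["persists_for_minutes"],       # temporal persistence
        blob["moves_at_biological_speed"],  # motion prior
        blob["symmetry"] > 0.5,             # generic symmetry detector
        blob["speech_like_audio"],          # co-occurring sound cue
        blob["human_scent"],                # scent/pheromone channel
    ]
    return sum(cues) / len(cues)  # fraction of cues that fired

# Example: five of six cues fire, so the object scores as a likely face.
print(proto_face_score({
    "roughly_face_sized": True, "persists_for_minutes": True,
    "moves_at_biological_speed": True, "symmetry": 0.8,
    "speech_like_audio": True, "human_scent": False,
}))  # -> 0.833...
```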
That being said, some systems—such as Atari’s DRL agent—can be considered simple early versions of ULMs.
Not so fast.
Actually, on consideration I think you are right and I did get ahead of myself there. The Atari agent doesn’t really have a general memory subsystem. It has an experience replay system, but not general memory. DeepMind is working on general memory—they have the NTM paper and whatnot—but the Atari agent came before that.
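For reference, a minimal sketch of the kind of experience replay the Atari agent relies on (my own simplified illustration, not DeepMind’s code): a fixed-size buffer of past transitions, sampled uniformly at random for training. Useful for stabilizing learning, but nothing like a general read/write memory of the NTM sort:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions, DQN-style."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest entries fall off

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive frames, which stabilizes Q-learning updates.
        return random.sample(list(self.buffer), batch_size)
```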
I largely agree with your assessment of the Atari DRL agent.
Despite the name, no machine learning system, “deep” or otherwise, has been demonstrated to efficiently learn any provably deep function (in the sense of Boolean circuit depth complexity), such as the parity function, which any human of average intelligence could learn from a small number of examples.
I highly doubt that—but it all depends on what your sampling class for ‘human’ is. An average human drawn from the roughly 10 billion alive today? Or an average human drawn from the roughly 100 billion who have ever lived? (Most of whom would have no idea what a parity function is.)
When you imagine a human learning the parity function from a small number of examples, what you really imagine is a human who has already learned the parity function, and thus internally has ‘parity function’ as one of perhaps a thousand types of functions they have learned, such that if you give them some data, it is one of the obvious things they may try.
Training a machine on a parity data set from scratch and expecting it to learn the parity function is equivalent to expecting it to invent the parity function—and perhaps invent mathematics as well. It should be compared to raising an infant without any knowledge of mathematics or anything related, and then training them on the raw data.
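As a concrete version of that experiment, here is a minimal sketch (assuming scikit-learn and NumPy; the parameters are arbitrary) that trains a small network on parity-labeled bit strings. The interesting output is the gap between training accuracy and accuracy on held-out strings, i.e. whether anything resembling the parity function was learned rather than memorized:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

n_bits = 10
# Enumerate all 2^n bit strings; the label is the parity of each string.
X = np.array([[(i >> b) & 1 for b in range(n_bits)] for i in range(2 ** n_bits)])
y = X.sum(axis=1) % 2

# Hold out a random quarter of the strings to test generalization.
idx = np.random.default_rng(0).permutation(len(X))
train, test = idx[: 3 * len(X) // 4], idx[3 * len(X) // 4 :]

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
clf.fit(X[train], y[train])
print("train accuracy:", clf.score(X[train], y[train]))
print("test accuracy: ", clf.score(X[test], y[test]))
```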
However, the genome can encode an initial crappy face detector.
It’s not that crappy given that newborns can not only recognize faces with significant accuracy, but also recognize facial expressions.
The cortical map will grow a good face/person model/detector on its own, and then after this model is ready, certain hormones in adolescence activate innate routines that learn where the face/person model patch is and help other modules plug into it.
Having two separate face recognition modules, one genetically specified and another learned, seems redundant, and it’s still not obvious to me how a genetically specified sexual attraction program could find a way to plug into a completely learned system, which would necessarily have some degree of randomness.
It seems more likely that there is a single face recognition module which is genetically specified and then fine-tuned by learning.
I highly doubt that—but it all depends on what your sampling class for ‘human’ is. An average human drawn from the roughly 10 billion alive today? Or an average human drawn from the roughly 100 billion who have ever lived? (Most of whom would have no idea what a parity function is.)
Show a neolithic human a bunch of pebbles, some black and some white, laid out in a line. Ask them to add a black or white pebble to the line, and reward them if the number of black pebbles is even. Repeat multiple times.
Even without a concept of “even number”, wouldn’t this neolithic human be able to figure out an algorithm to compute the right answer? They just need to scan the line, flipping a mental switch for each black pebble they encounter, and then add a black pebble if and only if the switch is not in the initial position.
Maybe I’m overgeneralizing, but it seems unlikely to me that people able to invent complex hunting strategies, build weapons, tools, traps, clothing, and huts, and participate in tribe politics, and so on, wouldn’t be able to figure out something like that.
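The “mental switch” procedure is simple enough to pin down directly; a toy Python rendering (my own, purely to make the algorithm explicit):

```python
def next_pebble(line):
    """line: sequence of 'black'/'white' pebbles already laid out.
    Returns the pebble to add so the number of black pebbles is even."""
    switch = False
    for pebble in line:
        if pebble == "black":
            switch = not switch  # flip the switch once per black pebble
    # If the switch ended up flipped, the black count is odd, so one more
    # black pebble makes it even; otherwise a white pebble is safe.
    return "black" if switch else "white"

assert next_pebble(["black", "white", "black"]) == "white"
assert next_pebble(["black", "white", "white"]) == "black"
```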
It’s not that crappy given that newborns can not only recognize faces with significant accuracy, but also recognize facial expressions.
Do you have a link to that? ‘Newborn’ can mean many things—the visual system starts learning from the second the eyes open, and perhaps even before that through pattern generators projected onto the retina which help to ‘pretrain’ the viscortex.
I know that infants have initial face detectors from the second they open their eyes, but from what I remember reading—they are pretty crappy indeed, and initially can’t tell a human face apart from a simple cartoon with 3 blobs for eyes and mouth.
It seems more likely that there is a single face recognition module which is genetically specified and then fine-tuned by learning.
Except that it isn’t that simple, because—amongst other evidence—congenitally blind people still learn a model and recognizer for attractive people, and can discern someone’s relative beauty by scanning faces with their fingertips.
Even without a concept of “even number”, wouldn’t this neolithic human be able to figure out an algorithm to compute the right answer?
Not sure—we are getting into hypothetical scenarios here. Your visual version, with black and white pebbles laid out in a line, implicitly helps simplify the problem and may guide the priors in the right way. I am reasonably sure that this setup would also help any brain-like AGI.
Even without a concept of “even number”, wouldn’t this neolithic human be able to figure out an algorithm to compute the right answer? They just need to scan the line, flipping a mental switch for each black pebble they encounter, and then add a black pebble if and only if the switch is not in the initial position.
Well, given how hard it is for Haitians to understand numerical sorting...
If I understand correctly, in the post you linked Scott is saying that Haitians are functionally innumerate, which should explain the difficulties with numerical sorting.
My point is that the parity function should be learnable even without basic numeracy, although I admit that perhaps I’m overgeneralizing.
Anyway, modern machine learning systems can learn to perform basic arithmetic such as addition and subtraction, and I think even sorting (since they are used for preordering in statistical machine translation), hence the problem doesn’t seem to be a lack of arithmetic knowledge or skill.
Note that both addition and subtraction have constant circuit depth (they are in AC^0), while parity is provably not in AC^0 and requires logarithmic depth with bounded fan-in gates.
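To illustrate the depth gap, here is a small Python sketch (mine, not from the comment): reducing the input with a balanced tree of 2-input XOR gates computes parity in ceil(log2 n) layers, which is the logarithmic depth that bounded fan-in circuits need, whereas AC^0 functions get constant depth using unbounded fan-in gates:

```python
from math import ceil, log2

def xor_tree(bits):
    """Return (parity of bits, number of 2-input XOR layers used)."""
    depth = 0
    while len(bits) > 1:
        layer = [bits[i] ^ bits[i + 1] for i in range(0, len(bits) - 1, 2)]
        if len(bits) % 2:  # an odd bit out carries to the next layer
            layer.append(bits[-1])
        bits, depth = layer, depth + 1
    return bits[0], depth

parity, depth = xor_tree([1, 0, 1, 1, 0, 1, 0, 1])
assert parity == 1 and depth == ceil(log2(8))  # 3 XOR layers for 8 bits
```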