The HC then has several subcircuits which further compress the mental summary into something like a compact key, which is then sent into a hetero-auto-associative memory circuit to find suitable matches.
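To make that picture concrete, here is a minimal numpy sketch of the idea (an illustration only, not a claim about the actual hippocampal circuitry): a high-dimensional "mental summary" is compressed into a compact binary key by a fixed random projection standing in for the compressing subcircuits, and that key is then matched against previously stored keys to find the best-matching memory. The dimensions and the random-projection choice are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 512, 32                        # summary dimension, compact key dimension (arbitrary)
P = rng.standard_normal((K, D))       # fixed random projection, a stand-in for the compressing subcircuits

def compress(summary):
    """Compress a high-dimensional summary into a compact binary key."""
    return (P @ summary > 0).astype(np.int8)

# Store a handful of (key -> memory) pairs, a stand-in for the associative circuit.
memories = [rng.standard_normal(D) for _ in range(5)]
store = [(compress(m), m) for m in memories]

def recall(cue):
    """Return the stored memory whose key best matches the key of the cue."""
    cue_key = compress(cue)
    overlaps = [np.sum(cue_key == key) for key, _ in store]
    return store[int(np.argmax(overlaps))][1]

# A noisy version of a stored summary still retrieves the original memory.
noisy = memories[2] + 0.3 * rng.standard_normal(D)
print(np.array_equal(recall(noisy), memories[2]))   # True
```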
This use of a symbolic key-store system is one that I think gets downplayed a lot by machine learning/neural network perspectives on how the brain works. I also fear you might be downplaying it a bit as well. In the book Memory and the Computational Brain, Gallistel and King make a pretty convincing argument that a lot of what goes on in the brain must be various types of symbolic representation and processing. They show a few examples of types of processing that simply can’t be achieved with the kinds of neural networks used in most “deep learning” approaches.
Agree it is important; I emphasize it in this post and my previous one.
We can differentiate between learning at two levels: circuit level and program/symbolic level. Learning a direct circuit to solve the problem works well when you have lots of time/data, the problem is very important/frequent, and latency/speed is crucial. This is deep learning with standard feedforward and/or recurrent nets, similar to what the cortex & cerebellum do.
There is also program/symbolic level learning, which is important when you have small amounts of data and need to learn fast, and lower speed is acceptable. This involves learning a ‘program’ level solution, which can be much more compact than a circuit level solution for some more complex problems where the minimal circuit depth is too high. It also allows for much faster learning in some cases. The brain implements program/symbolic learning with networks using the PFC for short term memory and the hippocampus for medium/long term memory, coordinated with the cortex & cerebellum by the basal ganglia.
Program/symbolic learning is a hot new area of research in DL: neural Turing machines, memory networks, learning program execution, etc. The inductive program learning field has been working on similar stuff, but the new DL approaches have the huge scaling advantages of modern SGD + ANN systems.
I encourage you to read Gallistel’s work, and also the recent experimental work of Hesslow’s group (I can provide references if you want). It runs contrary to the viewpoint you’re expressing.
Learning a direct circuit to solve the problem works well when you have lots of time/data, the problem is very important/frequent, and latency/speed is crucial.
Gallistel shows that even when these criteria are met some problems are simply infeasible to learn with ‘standard’ perceptron nets.
This is deep learning with standard feedforward and/or recurrent nets, similar to what the cortex & cerebellum do.
Hesslow’s work shows that the cerebellum carries out processing that is far more sophisticated than a standard feedforward network or RNN.
The brain implements program/symbolic learning with networks using the PFC for short term memory and the hippocampus for medium/long term memory
There is no evidence for this, and based on the aforementioned work, there’s no reason to think there ever will be evidence for this.
The inductive program learning field has been working on similar stuff
AI and machine learning work is separate from what I’m saying. Indeed the machine learning people have been coming up with amazing advances and models that learn very well; but there is no evidence that any of this has a direct analogue in the brain. It probably has an indirect analogue, but likely no direct one.
I encourage you to read Gallistel’s work, and also the recent experimental work of Hesslow’s group (I can provide references if you want).
The 2009 book you linked? I browsed the chapter contents. If there is anything new/interesting in there for me, it would only be in a couple of chapters. If there is some specific experimental evidence of importance—please provide links. I’ve browsed Gallistel’s publication list. I don’t see anything popping out as relevant.
Gallistel shows that even when these criteria are met some problems are simply infeasible to learn with ‘standard’ perceptron nets.
Link? Sounds uninteresting and not new. There are limits to what any particular model can learn, and ‘standard’ perceptron nets aren’t especially interesting. There is a body of work in DL on learning theory, and I doubt that Gallistel has much to contribute there, being in cog sci.
Hesslow’s work shows that the cerebellum carries out processing that is far more sophisticated than a standard feedforward network or RNN.
I’m beyond skeptical—especially given that a standard RNN (or a sufficiently deep feedforward net) is a universal approximator. You are referring to Germund Hesslow? Which publication?
The brain implements program/symbolic learning with networks using the PFC for short term memory and the hippocampus for medium/long term memory
There is no evidence for this, and based on the aforementioned work, there’s no reason to think there ever will be evidence for this.
Which part? It should be obvious the brain can learn mental programs—and I gave many examples such as reading. The short term memory role of stripes in PFC is well established, although all cortical regions are recurrent and thus have some memory capacity—the PFC appears to be specialized. The BG’s role in controlling the cortex, especially the PFC loops, is pretty well established and I linked to the supporting research in my brain ULM post. The hippocampus’s role in longer term memory—also well established.
I’m really glad that you’re interested in this subject.
I recommend the 2009 book for the argument it presents that a symbolic key-value-store memory seems to be necessary for a lot of what the brains of humans and various other animals do. You say it has ‘nothing new’, so I assume then that you’re already familiar with this argument.
Link? Sounds uninteresting and not new.
I’m beyond skeptical—especially given that a standard RNN (or a sufficiently deep feedforward net) is a universal approximator.
You’re referring to the Cybenko theorem and related theorems, which only establish ‘universality’ for a very narrow definition of ‘universal’. In particular, a feedforward neural net lacks persistent memory. RNNs do not necessarily solve this problem! In many (not all, but the most common) RNN formulations, what exists is simply a form of ‘volatile’ memory that is easily overwritten when new training data arrives. In contrast, experiments involving https://en.wikipedia.org/wiki/Eyeblink_conditioning show that nervous systems store persistent memories. For example, if you train an individual to respond to a conditioning stimulus, later ‘un-train’ them, and then attempt to train them again, they will learn much faster than the first time: a persistent change to the neural network structure has occurred. There have been various ways of trying to get around this problem with RNNs, such as https://en.wikipedia.org/wiki/Long_short_term_memory, but they wind up being either incredibly large (the Cybenko theorem does not place a limit on the size of the net) and thus infeasible, or otherwise ineffective.
Why ineffective? Experiments show why. Hesslow’s recent experiment on cerebellar Purkinje cells: http://www.pnas.org/content/111/41/14930.short shows that this mechanism of learning spatiotemporal behavior and storing it persistently can be isolated to a single cell. This is very significant. It shows that not only the perceptron model, but even the Hodgkin-Huxley model is woefully inadequate for describing neural behavior.
The entire argument about the difference between the ‘standard’ neural network way of doing things and the way the brain seems to do things revolves around symbolic processing, as I said. In particular, any explanation of memory must account for its persistence and for the fact that symbolic information (numbers, etc.) can be stored and retrieved, all persistently. The property of retrieval in particular is often misunderstood. Retrieval means that, given some ‘key’ or ‘pointer’ to a memory, we can retrieve that memory. Network/associative explanations of memory often revolve around purely associative memories: memories where, given part of the memory, the system gives you back the rest. This is all well and good, but to form a general-purpose memory you need something somewhat different: the ability to recall a memory when all you have is a pointer to it (as is done in the main memory of a computer). This can be implemented in an associative memory, but it requires two additional mechanisms: a mechanism to associate a pointer with a memory, and a mechanism to integrate the memory and pointer together in an associative structure. We do not yet know what form such mechanisms take in the brain.
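To be explicit about what those two mechanisms would have to do, here is a toy numpy sketch of one way they could be realized in an associative substrate (an abstract illustration, not a claim about the brain’s actual mechanism): a fresh random sparse vector serves as the newly minted pointer, an outer-product (Hebbian-style) update binds pointer and memory together, and presenting the pointer alone then reconstructs the memory.

```python
import numpy as np

rng = np.random.default_rng(1)
D_PTR, D_MEM = 256, 1024              # pointer and memory pattern sizes (arbitrary)

# Mechanism 1: mint a fresh pointer for the newly formed memory
# (here simply a random sparse code).
pointer = np.zeros(D_PTR)
pointer[rng.choice(D_PTR, size=16, replace=False)] = 1.0

# The new memory itself: an arbitrary +/-1 pattern.
memory = np.sign(rng.standard_normal(D_MEM))

# Mechanism 2: integrate pointer and memory into the associative structure
# via a Hebbian-style outer-product weight update.
W = np.outer(memory, pointer)

# Retrieval: presenting the pointer alone reconstructs the memory.
recalled = np.sign(W @ pointer)
print(np.array_equal(recalled, memory))   # True
```

With sparse, roughly orthogonal pointers, many such pairs can be superposed in the same weight matrix before crosstalk becomes a problem. The open question is how the brain mints and routes the pointers, not whether the binding itself is possible in an associative network.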
Gallistel’s other ideas—like using RNA or DNA to store memories—seem dubious and ill-supported by evidence. But he’s generally right about the need for a compact symbolic memory system.
I recommend the 2009 book for the argument it presents that a symbolic key-value-store memory seems to be necessary for a lot of what the brains of humans and various other animals do. You say it has ‘nothing new’, so I assume then that you’re already familiar with this argument.
I actually said:
If there is anything new/interesting in there for me, it would only be in a couple of chapters.
The symbolic key-value-store memory is a key point of my OP and my earlier brain article. “Memory and the Computational Brain”, from what I can tell, seems to provide a good overview of the recent neuroscience that I covered in my ULM post. I’m not disparaging the book, just saying that it isn’t something I have time to read at the moment, and most of the material looks familiar.
There have been various ways of trying to get around this problem with RNNs, such as https://en.wikipedia.org/wiki/Long_short_term_memory, but they wind up being either incredibly large (the Cybenko theorem does not place a limit on the size of the net) and thus infeasible, or otherwise ineffective.
LSTM is already quite powerful, and new variants—such as the recent grid LSTM—continue to expand the range of what can feasibly be learned. In many ways their learning abilities are already beyond the brain (see the parity discussion in the other thread).
That being said, LSTM isn’t everything, and a general AGI will also need a memory-based symbolic system, which excels especially at rapid learning from few examples, as discussed. Neural Turing machines, memory networks, and related models are now expanding into that frontier. You seem to be making the point that standard RNNs can’t do effective symbolic learning, and I agree. That’s what the new memory-based systems are for.
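As a concrete illustration of what “memory-based” means here, below is a minimal numpy sketch of the content-based read used (in more elaborate, fully differentiable form) by memory networks and the Neural Turing Machine: a query key is compared against every memory row by cosine similarity, the similarities are sharpened into attention weights with a softmax, and the read-out is the attention-weighted blend of the rows. The sizes and the sharpness parameter beta are arbitrary, and this omits the write and location-addressing machinery of the real models.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 128, 40                        # number of memory slots, width of each slot (arbitrary)
memory = rng.standard_normal((N, M))  # the external memory matrix

def content_read(memory, key, beta=10.0):
    """Content-based read: cosine similarity -> softmax attention -> weighted blend of slots."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sims)
    w /= w.sum()                      # attention weights over the N slots
    return w @ memory, w              # read vector and the addressing weights

# A noisy version of slot 7's contents addresses slot 7 almost exclusively.
key = memory[7] + 0.1 * rng.standard_normal(M)
read, w = content_read(memory, key)
print(int(np.argmax(w)))              # 7
```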
Why ineffective? Experiments show why. Hesslow’s recent experiment on cerebellar Purkinje cells: http://www.pnas.org/content/111/41/14930.short shows that this mechanism of learning spatiotemporal behavior and storing it persistently can be isolated to a single cell. This is very significant.
Ok, I read enough of that paper to get the gist. I don’t think it’s that significant. Assuming their general conclusion is correct and they didn’t make any serious experimental mistakes, all they have shown is that the neuron itself can learn a simple timing response. The learned function only requires that the neuron model a single parameter, a t value. We have known for a while that many neurons feature membrane plasticity and other mechanisms that effectively function as learnable per-neuron parameters affecting the transfer function. This has even been incorporated into some ANNs and found to be somewhat useful. It isn’t world changing. The cell isn’t learning a complex spatiotemporal pattern, such as an entire song; it’s just learning one or a handful of variables.
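To illustrate how little the cell needs to represent on this account, here is a toy sketch (purely abstract, not a model of Purkinje-cell biophysics): a unit whose response is a bump peaked at a single learnable latency t, with t nudged toward the reinforced interval. One scalar parameter suffices to learn the timing.

```python
import numpy as np

TARGET = 0.20      # reinforced CS-US interval in seconds (arbitrary)
SIGMA = 0.05       # fixed width of the response bump
LR = 0.01          # learning rate

def response(time, latency):
    """The unit's output: a bump peaked at its learned latency."""
    return np.exp(-((time - latency) ** 2) / (2 * SIGMA ** 2))

# The single learnable per-unit parameter: the latency of the response.
t = 0.05
for _ in range(2000):
    t -= LR * 2.0 * (t - TARGET)      # gradient step on (t - TARGET)^2

print(round(t, 3))                    # ~0.2: the response is now timed to the trained interval
print(response(TARGET, t) > 0.99)     # True
```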
What do you actually think memories are? Memories are simply reconstructions of a prior state of the system. When you remember something, your brain literally returns, at least partially, to the neural state of activation it was in when you originally perceived the event you are remembering.
What do you think the “pointer” or “key” to a memory in the human brain is? Generally, it involves priming. Priming is simply presenting a stimulus that has been associated with the prior state.
The “persistent change” you’re looking for is exactly how artificial neural networks learn. They change the strength of the connections between the neurons.
Symbol processing is completely possible with an associative network system. The symbol is encoded as a particular pattern of neuronal activations. The visual letter “A” is actually a state in the visual cortex in which a certain combination of neurons fires in response to the pattern of brightness-contrast signals that rod and cone cells generate when we see an “A”. The sound “A” is similarly encoded, and our brain learns to associate the two together. Eventually, there is a higher-layer neuron, or pattern of neurons, that activates most strongly when we see or hear an “A”, and this “symbol” can then be combined or associated with other symbols to create words, or otherwise processed by the brain.
You don’t need some special mechanism. An associative memory can store any memory input pattern completely, assuming it has enough neurons in enough layers to reconstruct most of the possible states of input.
Key- or pointer-based memory retrieval can be completely duplicated by just associating the key or pointer with the memory state, such that priming the network with the key or pointer reconstructs the original state.
Key- or pointer-based memory retrieval can be completely duplicated by just associating the key or pointer with the memory state, such that priming the network with the key or pointer reconstructs the original state.
Yes, this is why I said you can implement general-purpose memory with associative memory. However, you need two additional mechanisms which the naive associative view doesn’t address: you need the ability to create a pointer for a newly generated memory and to associate it with that memory. The basic RNN-based associative memory formulation does not have this mechanism, and we have no idea what form it takes in the brain. You also need the ability to work directly on pointers and to store pointers themselves in memory locations which can then be pointed to, though this is more of a processing constraint.
You’re assuming that a Von Neumann Architecture is a more general-purpose memory than an associative memory system, when in fact, it’s the other way around.
To get your pointer-based memory, you just have to construct a pointer as a specific compression or encoding of the memory in the associative network. For instance, you could mentally associate the number 2015 with a series of memories that have occurred in the last six months. In the future, you could then retrieve all memories that have been “hashed” to that number just by being primed with the number.
Remember that even on a computer, a pointer is simply a numerical value that represents the “address” of the particular segment of data that we want to retrieve. In that sense, it is a symbol that connects to and represents some symbols, not unlike a variable or function.
We can model this easily in an associative memory without any additional mechanisms, simply by having a multi-layer model that can combine and abstract different features of the input space into what are essentially symbols or abstract representations.
Von Neumann architecture digital computers are nothing more than physical symbol-processing systems, which is to say just one of many possible implementations of Turing machines. According to Hava Siegelmann, a recurrent neural network with real precision weights would be, theoretically speaking, a Super Turing Machine.
If that isn’t enough, there are already models called Neural Turing Machines that combine recurrent neural networks with the Von Neumann memory model to create networks that can directly interface with pointer-based memory.
To get your pointer-based memory, you just have to construct a pointer as a specific compression or encoding of the memory in the associative network.
Again, that’s what I’m saying. How do you get from a memory to a pointer? We do not yet know how the brain does this. We have models that can do this, but very little experimental data. We of course know that it’s possible, we just don’t know the form this mechanism takes in the brain.
You’re assuming that a Von Neumann Architecture is a more general-purpose memory than an associative memory system, when in fact, it’s the other way around.
I’m assuming nothing of the sort. I’m not talking about which kind of memory is more general purpose (and, really, you have to take into account memory plus processing to be able to talk about generality in this sense). I’m talking about what the brain does. The usual ‘associative memory’ view says that all we have is an associative/content-addressable memory system. That’s fine, but it’s like saying the brain is made up of neurons. It lacks descriptive power. I want to know the specifics of how memory formation and recall happens, not hand-waving. Theoretical descriptions can help, but without experimental evidence they are of limited utility in understanding the brain.
That’s why the Hesslow experiment is so intriguing: It is actual experimental evidence that clearly illustrates what a single neuron is capable of learning and shows that even when it comes to such a drastically reduced and simplified system, our understanding is still very limited.
According to Hava Siegelmann, a recurrent neural network with real precision weights would be, theoretically speaking, a Super Turing Machine.
This is irrelevant, as real precision weights are physically impossible.