Today’s neural networks definitely have trouble with more ‘structured’ problems, but I don’t think that ‘neural nets can’t learn long time-series data’ is a good way of framing this. To go through your examples:
This shouldn’t have been a major issue, except that with each switch it discarded past observations. Had the car maintained this history it would have seen that some sort of large object was progressing across the street on a collision course, and had plenty of time to stop.
From a brief reading of the report, it sounds like this control logic is part of the system surrounding the neural network, not the network itself.
One network predicts the odds of winning and another network figures out which move to perform. This turns a time-series problem (what strategy to perform) into two separate stateless[1] problems.
I don’t see how you think this is ‘stateless’. AlphaStar’s architecture contains an LSTM (the ‘Core’) which is then fed into the value and move networks, similar to most time-series applications of neural networks.
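For concreteness, here is a minimal sketch of that shape of architecture (illustrative PyTorch with made-up layer names and sizes, not DeepMind’s actual code): a recurrent core whose hidden state persists across game steps feeds both a value head and a policy head.

```python
import torch
import torch.nn as nn

class TinyAlphaStarLikeAgent(nn.Module):
    """Illustrative only: a recurrent 'Core' shared by a value head and a
    policy ('move') head, so both heads see state carried across timesteps."""
    def __init__(self, obs_dim=64, core_dim=128, n_actions=10):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, core_dim)        # per-step observation embedding
        self.core = nn.LSTM(core_dim, core_dim)            # the stateful part
        self.value_head = nn.Linear(core_dim, 1)           # predicts the odds of winning
        self.policy_head = nn.Linear(core_dim, n_actions)  # scores possible moves

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (time, batch, obs_dim); hidden carries state between calls
        x = torch.relu(self.encoder(obs_seq))
        core_out, hidden = self.core(x, hidden)
        return self.value_head(core_out), self.policy_head(core_out), hidden
```

Because `hidden` is threaded from one call to the next, both heads are conditioned on the whole history the LSTM has seen, not just the current frame.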
Most conspicuously, human beings know how to build walls with buildings. This requires a sequence of steps that don’t generate a useful result until the last of them is completed. A wall is useless until the last building is put into place. AlphaStar (the red player in the image below) does not know how to build walls.
But the network does learn how to build its economy, which also doesn’t pay off for a very long time. I think the issue here is more about a lack of ‘reasoning’ skills than time-scales: the network can’t think conceptually, and so doesn’t know that a wall needs to completely block off an area to be useful. It just learns a set of associations.
ML can generate classical music just fine but can’t figure out the chorus/verse system used in rock & roll.
MuseNet was trained from scratch on MIDI data, but it’s still able to generate music with lots of structure on both short and long time scales. GPT2 does the same for text. I’m not sure if MuseNet is able to generate chorus/verse structures in particular, but again this seems more like an issue of lack of logic/concepts than time scales (that is, MuseNet can make pieces that ‘sound right’ but has no conceptual understanding of their structure).
I’ll note that AlphaStar, GPT2, and MuseNet all use the Transformer architecture, which seems quite effective for structured time-series data. I think this is because its attention mechanism lets it zoom in on the relevant parts of past experiences.
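As a rough illustration of what I mean by ‘zooming in’ (textbook scaled dot-product attention, not the actual code of any of those models): each query position takes a weighted view of the entire past, and the weights concentrate on whichever earlier positions look most relevant, however far back they are.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Standard attention: weight past positions by query-key similarity."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)           # similarity of each query to each past position
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over past positions
    return weights @ values                            # weighted mix of past information

rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 16))                      # 100 past positions
values = rng.normal(size=(100, 16))
query = keys[3:4] + 0.01 * rng.normal(size=(1, 16))    # closely resembles position 3

# The output is pulled almost entirely from position 3, no matter how far
# back it sits in the sequence -- there is no recurrent bottleneck in between.
out = scaled_dot_product_attention(query, keys, values)
```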
I also don’t see how connectome-specific harmonic waves are supposed to help. I think(?) your suggestion is to store slow-changing data in the largest eigenvectors of the Laplacian, but why would this be an improvement? It’s already the case (by the nature of the matrix) that the largest eigenvectors of e.g. an RNN’s transition matrix will tend to store data for longer time periods.
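To spell out that last point with a toy linear recurrence (my own illustration, not something from the article): if h_{t+1} = W h_t, the component of the state along each eigenvector of W shrinks by a factor of |λ| per step, so the modes whose eigenvalues have the largest magnitude are exactly the ones that hold information the longest.

```python
import numpy as np

rng = np.random.default_rng(1)
# A random transition matrix, rescaled so its spectral radius is just under 1.
W = rng.normal(size=(8, 8))
eigvals, eigvecs = np.linalg.eig(W)
W = W * (0.95 / np.abs(eigvals).max())
eigvals, eigvecs = np.linalg.eig(W)

h = rng.normal(size=8)            # initial state ("the data to remember")
for _ in range(50):               # run the recurrence h <- W h for 50 steps
    h = W @ h

# Decompose what's left of the state into the eigenbasis of W.
coords = np.linalg.solve(eigvecs, h.astype(complex))
for i in np.argsort(-np.abs(eigvals)):
    print(f"|lambda| = {np.abs(eigvals[i]):.2f}  remaining amplitude = {np.abs(coords[i]):.2e}")
# The surviving amplitude is concentrated in the modes with |lambda| closest to 1.
```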
Thank you for the correction. AlphaStar is not completely stateless (even ignoring fog-of-war-related issues).
I think the issue here is more about a lack of ‘reasoning’ skills than time-scales: the network can’t think conceptually...
This is exactly what I mean. The problem I’m trying to elucidate is that today’s ML techniques can’t create good conceptual bridges from short time-scale data to long time-scale data (and vice-versa). In other words, that they cannot generalize concepts from one time scale to another. If we want to take ML to the next level then we’ll have to build a system that can. We may disagree about how to best phrase this but I think we’re on the same page concerning the capabilities of today’s ML systems.
As for connectome-specific harmonic waves, yes, my suggestion is to store slow-changing data in the largest eigenvectors of the Laplacian. The problem with LSTMs (and similar RNN systems) is that there’s a combinatorial explosion[1] when you try to backpropagate their state cells. This is the computational cliff I mentioned in the article.
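To make that suggestion a bit more concrete, here is a crude sketch of the kind of mechanism I have in mind (an illustration with an arbitrary random ‘connectome’, not the CSHW model itself): raw node activity changes quickly, while the coefficients of a few harmonic modes of the graph Laplacian serve as the slowly-changing store.

```python
import numpy as np

# Illustration only: a small random "connectome" and its graph Laplacian.
rng = np.random.default_rng(2)
n = 20
A = rng.random((n, n)) < 0.2
A = np.triu(A, 1)
A = (A | A.T).astype(float)                  # symmetric adjacency, no self-loops
L = np.diag(A.sum(axis=1)) - A               # graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)         # harmonic modes of the network

def slow_summary(node_activity, n_modes=3):
    """Project fast-changing node activity onto a handful of Laplacian
    eigenvectors; the resulting coefficients are the slow-changing store."""
    # Which harmonic modes to keep is the real design question; this sketch
    # just takes the first few that eigh returns.
    modes = eigvecs[:, :n_modes]
    return modes.T @ node_activity

activity = rng.normal(size=n)                # instantaneous node activity
print(slow_summary(activity))                # a 3-number summary carried across time
```

How those coefficients get written and read over time is of course the substantive question; the sketch only shows where such a store would live.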
The human brain has no known mechanism for conventional backpropagation in the style of artificial neural networks, and I believe no such mechanism exists. I hypothesize that the human brain never runs into the aforementioned computational cliff precisely because it has no physical machinery that could hit it.
So if the human brain doesn’t use backpropagation, then what does it use? I think a combination of Laplacian eigenvectors and predictive modeling, something involving resonance[2] between state networks. If everything so far is true then this sidesteps the RNN computational cliff, and we can reach that conclusion without knowing exactly how the human brain works.
This is promising for two related reasons: one involving power and the other involving trainability.
Concerning power: I think resonance could provide a conceptual bridge from shorter time-scales to longer ones. This solves the problem of fractal organization in the time domain and provides a computational mechanism for forming logic/concepts and then integrating them with larger/smaller parts of the internal conceptual architecture.
Concerning trainability: You don’t have to backpropagate when training the human brain (because you can’t). If CSHW and predictive modeling are how the human brain performs gradient ascent, then this could completely sidestep the aforementioned computational cliff involved in training RNNs. Such a machine would require a hyperlinearly smaller quantity of training data to solve complex problems.
I think these two ideas work together; the human brain sidesteps the computational cliff because it uses concepts (eigenvectors) in place of raw low-level associations.
[1] I mean that the necessary quantity of training data explodes, not that it’s hard to calculate the backpropagated connection weights for a single training datum.
[2] Two state networks in resonance automatically exchange information, and vice-versa.