Yes, I’ve read your big universal learner post, and I’m not convinced.
Do you actually believe that evolved modularity is a better explanation of the brain than the ULM hypothesis? Do you have evidence for this belief, or is it simply what you want to be true? Do you understand why the computational neuroscience and machine learning folks are moving away from evolved modularity and towards universal learning? If you do have evidence, please provide it in a critique in the comments for that post, where I will respond.
First off, you’re seriously misrepresenting the success of deep learning as support for your thesis. Deep learning algorithms are extremely powerful, and probably have a role to play in building AGI, but they aren’t the end-all, be-all of AI research.
Make some specific predictions for the next 5 years about deep learning or ANNs. Let us see if we actually have significant differences of opinion. If so I expect to dominate you in any prediction market or bets concerning the near term future of AI.
Right off the bat, you absolutely can create an AGI that is a pure ANN. In fact the most successful early precursor AGI we have—the Atari DeepMind agent—is a pure ANN. Your claim that ANNs/Deep Learning is not the end of all AGI research is quickly becoming a minority position.
Humans can learn echolocation, but they can’t learn echolocation the way bats and dolphins can learn echolocation
Notably, the general learner hypothesis does not explain why non-surgically-modified brains are so standardized in structure and functional layout, something that you yourself bring up in your article.
I discussed this in the comments—it absolutely does explain neurotypical standardization. It’s a result of topographic/geometric wiring optimization. There is an exactly optimal location for every piece of functionality, and the brain tends to find those same optimal locations in each human. But if you significantly perturb the input sense or the brain geometry, you can get radically different results.
Consider the case of extreme hydrocephaly—where fluid fills the center of the brain, replacing most of the tissue and squeezing the remainder into a thin layer near the skull. And yet these patients can have above-average IQs. Optimal dynamic wiring can explain this—the brain is constantly doing global optimization across the wiring structure, adapting to even extreme deformations and damage. How does evolved modularity explain this?
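To make the wiring-optimization claim concrete, here is a toy sketch of the kind of process being described: place a handful of functional modules on a 2D sheet so that total weighted wire length is minimized, with a few inputs and outputs pinned to fixed locations. The module names, connection weights, and anchor points below are invented for illustration, not taken from any neuroscience data; the point is only that a single cost function can produce both a standardized layout and adaptation to a deformed sheet.

```python
# Toy sketch of wiring-length minimization (illustrative only; module names,
# weights, and anchor points are invented, not neuroscience data).
import itertools
import random

import numpy as np

MODULES = ["V1", "A1", "S1", "M1", "PFC"]
# Hypothetical amounts of wiring running between module pairs.
WEIGHTS = {("V1", "PFC"): 1.0, ("A1", "PFC"): 1.0, ("S1", "M1"): 2.0,
           ("S1", "PFC"): 1.0, ("M1", "PFC"): 1.0, ("V1", "A1"): 0.5}
# Fixed "ports" where particular inputs/outputs enter or leave the sheet.
ANCHORS = {"V1": (0.0, 0.0), "A1": (4.0, 0.0), "M1": (2.0, 4.0)}

def wiring_cost(layout):
    """Total weighted wire length for a {module: (x, y)} placement."""
    cost = 0.0
    for (a, b), w in WEIGHTS.items():
        cost += w * np.hypot(*np.subtract(layout[a], layout[b]))
    for m, site in ANCHORS.items():  # pull anchored modules toward their ports
        cost += 2.0 * np.hypot(*np.subtract(layout[m], site))
    return cost

def optimize(sites, iters=5000, seed=0):
    """Greedy random search: a stand-in for whatever process actually
    performs the optimization during development."""
    rng = random.Random(seed)
    layout = {m: rng.choice(sites) for m in MODULES}
    best = wiring_cost(layout)
    for _ in range(iters):
        m = rng.choice(MODULES)
        old = layout[m]
        layout[m] = rng.choice(sites)
        new = wiring_cost(layout)
        if new <= best:
            best = new
        else:
            layout[m] = old
    return layout

grid = [(x, y) for x, y in itertools.product(np.linspace(0, 4, 9), repeat=2)]
# A crudely "deformed" sheet: only a thin band of sites remains available.
band = [(x, y) for (x, y) in grid if y <= 1.0]

for name, sites in [("normal sheet", grid), ("deformed sheet", band)]:
    layouts = [optimize(sites, seed=s) for s in range(3)]
    same = all(l == layouts[0] for l in layouts)
    print(f"{name}: independent runs agree = {same}")
    print("  layout:", {m: tuple(map(float, p)) for m, p in layouts[0].items()})
```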
It also obviously has hard-coded specialized modules, to some degree, which is why (for example) all human cultures develop language and music; that isn’t something you’d expect if we were all starting from zero.
This is nonsense—language processing develops in general-purpose cortical modules; there is no specific language circuitry.
There is a small amount of innate circuit structure, mainly in the brainstem, which can generate innate algorithms, especially for walking behavior.
The question is which aspect dominates brain performance.
This is rather obvious—it depends on the ratio of pure learning structures (cortex, hippocampus, cerebellum) to innate circuit structures (brainstem, some midbrain, etc.). In humans, 95% or more of the circuitry is general-purpose learning.
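For what it’s worth, the arithmetic behind a figure like "95% or more" is easy to reproduce from commonly cited rough neuron counts; the grouping into "learning" versus "innate" structures below follows the sentence above and is an assumption of this sketch, not a measurement.

```python
# Back-of-the-envelope check using commonly cited rough neuron counts
# (Herculano-Houzel-style estimates, in billions); the split into "learning"
# vs "innate" structures follows the comment above and is an assumption.
counts_billions = {
    "cerebral cortex (incl. hippocampus)": 16.0,
    "cerebellum": 69.0,
    "rest (brainstem, midbrain, diencephalon, etc.)": 0.7,
}
learning = (counts_billions["cerebral cortex (incl. hippocampus)"]
            + counts_billions["cerebellum"])
total = sum(counts_billions.values())
print(f"learning structures: {learning / total:.1%} of neurons")  # roughly 99%
```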
What about Watson?
Not an AGI.
Finally, I don’t have the background to refute your argument on the efficiency of the brain (although I know clever people who do, and they disagree with you).
The correct thing to do here is update. Instead you are searching for ways in which you can ignore the evidence.
But, taking it as a given that you’re right, it sounds like you’re assuming all future AIs will draw the same amount of power as a real brain and fit in the same spatial footprint.
Obviously not—in theory, given a power budget, you can split it up into N AGIs or one big AGI. In practice, due to parallel scaling limitations, there is always some optimal N. Even on a single GPU today, you need N of about 100 or more to get good performance.
You can’t just invest all your energy into one big AGI and expect better performance—that is a mind-numbingly naive strategy.
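A crude way to see where an optimal N comes from on real hardware: model each training step as a fixed serial overhead plus a per-agent cost, and look at total throughput as a function of how many agent instances share that overhead. The constants below are made up purely to show the shape of the curve, not measured on any GPU.

```python
# Toy throughput model for the "optimal N" point; the constants are made up
# to show the shape of the curve, not measured on any real hardware.
FIXED_OVERHEAD_S = 1e-3   # per-step serial cost (kernel launches, sync, etc.)
PER_AGENT_S = 1e-5        # marginal cost of adding one more agent to the batch

def agent_steps_per_second(n_agents):
    """Total work per second when n_agents are stepped together as one batch."""
    step_time = FIXED_OVERHEAD_S + n_agents * PER_AGENT_S
    return n_agents / step_time

ceiling = 1.0 / PER_AGENT_S   # what the hardware could do with zero overhead
for n in (1, 10, 100, 1000):
    rate = agent_steps_per_second(n)
    print(f"N={n:5d}: {rate:9.0f} agent-steps/s ({rate / ceiling:.0%} of ceiling)")
# With these invented numbers, a single instance wastes nearly the whole budget
# on overhead, while N in the hundreds gets close to the ceiling, which is the
# flavor of the claim above.
```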
To sum up: yes, I’ve read your thing. No, it’s not as convincing as you seem to believe.
Update, or provide counter evidence, or stop wasting my time.
In fact the most successful early precursor AGI we have—the Atari DeepMind agent—is a pure ANN.
People have been using ANNs for reinforcement learning tasks since at least the TD-Gammon system, with varying success. The DeepMind Atari agent is bigger and the task is sexier, but calling it an early precursor AGI seems far-fetched.
Consider the case of extreme hydrocephaly—where fluid fills the center of the brain, replacing most of the tissue and squeezing the remainder into a thin layer near the skull. And yet these patients can have above-average IQs. Optimal dynamic wiring can explain this—the brain is constantly doing global optimization across the wiring structure, adapting to even extreme deformations and damage. How does evolved modularity explain this?
I suppose that the network topology of these brains is essentially normal, isn’t it? If that’s the case, then all the modules are still there; they are just squeezed against the skull wall.
This is nonsense—language processing develops in general-purpose cortical modules; there is no specific language circuitry.
If I understand correctly, damage to Broca’s area or Wernicke’s area tends to cause speech impairment. This may be more or less severe depending on the individual, which is consistent with the evolved modularity hypothesis: genetically different individuals may have small differences in the location and shape of the brain modules.
Under the universal learning machine hypothesis, instead, we would expect speech impairment following localized brain damage to quickly heal in most cases as other brain areas are recruited to the task. Note that there are large rewards for regaining linguistic ability, hence the brain would sacrifice other abilities if it could. This generally does not happen.
In fact, for most people with completely healthy brains it is difficult to learn a new language as well as a native speaker after the age of 10. This suggests that our language processing machinery is hard-wired to a significant extent.
The DeepMind Atari agent is bigger and the task is sexier, but calling it an early precursor AGI seems far-fetched.
Hardly. It can learn a wide variety of tasks—many at above-human level—in a variety of environments, all with only a few million neurons. It was on the cover of Nature for a reason.
Remember, a mouse brain has the same core architecture as a human brain. The main components are all there and basically the same—just smaller, and with different size allocations across modules.
I suppose that the network topology of these brains is essentially normal, isn’t it? If that’s the case, then all the modules are still there, they are just squeezed against the skull wall.
From what I’ve read, the topology is radically deformed, modules are lost, and timing between the remaining modules is totally changed—it’s massive brain damage. It’s so weird that they can still think at all that it has led some neuroscientists to seriously consider that cognition comes from something other than neurons and synapses.
Under the universal learning machine hypothesis, instead, we would expect speech impairment following localized brain damage to quickly heal in most cases as other brain areas are recruited to the task.
Not at all—relearning language would take at least as much time and computational power as learning it in the first place. Language is perhaps the most computationally challenging thing that humans learn—it takes roughly a decade to learn up to a highly fluent adult level. Children learn faster—they have far more free cortical capacity. All of this is consistent with the ULH, and I bet it can even roughly predict the time required for relearning language—although measuring the exact extent of damage to language centers is probably difficult.
This suggests that our language processing machinery is hard-wired to a significant extent.
Absolutely not—because you can look at the typical language modules under a microscope, and they are basically the same as the other cortical modules. Furthermore, there is no strong case for any mechanism that can encode any significant genetically predetermined task-specific wiring complexity into the cortex. It is just like an ANN—the wiring is random. The modules are all basically the same.
Right off the bat, you absolutely can create an AGI that is a pure ANN. In fact the most successful early precursor AGI we have—the Atari DeepMind agent—is a pure ANN. Your claim that ANNs/Deep Learning is not the end of all AGI research is quickly becoming a minority position.
The DeepMind agent has no memory, one of the problems that I noted in the first place with naive ANN systems. The DeepMind team’s solution to this is the neural Turing machine model, which is a hybrid system between a neural network and a database. It’s not a pure ANN. It isn’t even neuromorphic.
Improving its performance is going to involve giving it more structure and more specialized components, and not just throwing more neurons and training time at it.
For goodness’ sake: Geoffrey Hinton, the father of deep learning, believes that the future of machine vision is explicitly integrating the idea of three-dimensional coordinates and geometry into the structure of the network itself, and moving away from more naive and general-purpose convnets.
Source: https://github.com/WalnutiQ/WalnutiQ/issues/157
Your position is not as mainstream as you like to present it.
The real test here would be to take a brain and give it an entirely new sense
Done and done. Next!
If you’d read the full sentence that I wrote, you’d appreciate that remapping existing senses doesn’t actually address my disagreement. I want a new sense, to make absolutely sure that the subjects aren’t just re-using hard-coding from a different system. Snarky, but not a useful contribution to the conversation.
This is nonsense—language processing develops in general-purpose cortical modules; there is no specific language circuitry.
This is far from the mainstream linguistic perspective. Go argue with Noam Chomsky; he’s smarter than I am. Incidentally, you didn’t answer the question about birds and cats. Why can’t cats learn to do complex language tasks? Surely they also implement the universal learning algorithm just as parrots do.
What about Watson?
Not an AGI.
AGIs literally don’t exist, so that’s hardly a useful argument. Watson is the most powerful thing in its (fairly broad) class, and it’s not a neural network.
Finally, I don’t have the background to refute your argument on the efficiency of the brain (although I know clever people who do, and they disagree with you).
The correct thing to do here is update. Instead you are searching for ways in which you can ignore the evidence.
No, it really isn’t. I don’t update based on forum posts on topics I don’t understand, because I have no way to distinguish experts from crackpots.
The DeepMind team’s solution to this is the neural Turing machine model, which is a hybrid system between a neural network and a database. It’s not a pure ANN.
Yes, it is a pure ANN—according to my use of the term ANN (arguing over definitions is a waste of time). ANNs are fully general circuit models, which obviously can re-implement any module from any computer—memory, database, whatever. The defining characteristics of an ANN are a simulated network circuit structure based on analog/real-valued nodes, and some universal learning algorithm over the weights, such as SGD.
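As a minimal sketch of what that definition covers, here is content-based memory addressing of the kind used in neural Turing machine style models, written in plain numpy; this is my own illustration of the idea, not DeepMind’s code. Every operation is a differentiable function of real-valued quantities, so the "database read" is itself just more network.

```python
# Minimal numpy sketch of content-based memory addressing (an NTM-style read),
# written to illustrate the definition above; a toy example, not DeepMind's
# code. Every step is a differentiable function of real values.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_read(memory, key, sharpness=10.0):
    """Softly read from `memory` (rows are slots) by similarity to `key`."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = softmax(sharpness * sims)   # soft slot selection, trainable by SGD
    return weights @ memory               # blended recall of the matching slots

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 4))                 # 8 slots, 4-dim contents
query = memory[3] + 0.05 * rng.normal(size=4)    # noisy cue for slot 3
print(np.round(content_read(memory, query), 2))  # recall: close to slot 3
print(np.round(memory[3], 2))
```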
Your position is not as mainstream as you like to present it.
You don’t understand my position. I don’t believe DL as it exists today is somehow the grail of AI. And yes, I’m familiar with Hinton’s ‘capsule’ proposals. And yes, I agree there is still substantial room for improvement in ANN microarchitecture, especially for learning invariances, and especially for learning them unsupervised.
This is far from the mainstream linguistic perspective.
Any theory of anything the brain does that isn’t grounded in computational neuroscience data is probably wrong—mainstream or not.
No, it really isn’t. I don’t update based on forum posts on topics I don’t understand, because I have no way to distinguish experts from crackpots.
You don’t update on forum posts? Really? You seem pretty familiar with MIRI and LW positions. So are you saying that you arrived at those positions all on your own somehow? Then you just showed up here, thankfully finding other people who just happened to have arrived at all the same ideas?
Yes, it is a pure ANN—according to my use of the term ANN (arguing over definitions is a waste of time). ANNs are fully general circuit models, which obviously can re-implement any module from any computer—memory, database, whatever. The defining characteristics of an ANN are a simulated network circuit structure based on analog/real-valued nodes, and some universal learning algorithm over the weights, such as SGD.
You could say that any machine learning system is an ANN, under a sufficiently vague definition. That’s not particularly useful in a discussion, however.
Yes, it is a pure ANN—according to my use of the term ANN (arguing over definitions is a waste of time). ANNs are fully general circuit models, which obviously can re-implement any module from any computer—memory, database, whatever. The defining characteristics of an ANN are a simulated network circuit structure based on analog/real-valued nodes, and some universal learning algorithm over the weights, such as SGD.
I think you misunderstood me. The current DeepMind AI that they’ve shown the public is a pure ANN. However, it has serious limitations because it’s not easy to implement long-term memory in a naive ANN. So they’re working on a successor called the “neural Turing machine,” which marries an ANN to a database retrieval system—a specialized module.
You don’t understand my position. I don’t believe DL as it exists today is somehow the grail of AI. And yes, I’m familiar with Hinton’s ‘capsule’ proposals. And yes, I agree there is still substantial room for improvement in ANN microarchitecture, especially for learning invariances, and especially for learning them unsupervised.
The thing is, many of those improvements are dependent on the task at hand. It’s really, really hard for an off-the-shelf convnet to learn the rules of three-dimensional geometry, so we have to build them into the network. Our own visual processing shows signs of having the same structure embedded in it.
The same structure would not, for example, benefit an NLP system, so we’d give it a different specialized structure, tuned to the hierarchical nature of language. The future, past a certain point, isn’t making ‘neural networks’ better. It’s making ‘machine vision’ networks better, or ‘natural language’ networks better. To make a long story short, specialized modules are an obvious place to go when you run into a problem too complex to teach a naive convnet efficiently. That’s true both for human engineers over the next 5-10 years and for evolution over the last couple of billion.
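To put rough numbers on what building structure into a network buys you, compare the parameter counts of a fully connected layer and a small convolutional layer on the same input; the input size and channel counts below are arbitrary choices for the example, not drawn from any particular system.

```python
# Parameter-count comparison for one layer; the 64x64 RGB input and 32 output
# channels are arbitrary numbers chosen for the example.
H = W = 64
C_IN, C_OUT, K = 3, 32, 3   # input channels, output channels, 3x3 kernel

# Fully connected: every output unit sees every input pixel, nothing is shared.
fc_params = (H * W * C_IN) * (H * W * C_OUT)

# Convolution: locality and translation equivariance are built in, so only a
# small shared filter bank (plus biases) is learned.
conv_params = C_OUT * C_IN * K * K + C_OUT

print(f"fully connected: {fc_params:,} parameters")   # about 1.6 billion
print(f"3x3 convolution: {conv_params:,} parameters") # under a thousand
```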
You don’t update on forum posts? Really? You seem pretty familiar with MIRI and LW positions. So are you saying that you arrived at those positions all on your own somehow?
I have a CS and machine learning background, and am well-read on the subject outside LW. My math is extremely spotty, and my physics is non-existent. I update on things I read that I understand, or things from people I believe to be reputable. I don’t know you well enough to judge whether you usually say things that make sense, and I don’t have the physics to understand the argument you made or judge its validity. Therefore, I’m not inclined to update much on your conclusion.
EDIT: Oh, and you still haven’t responded to the cat thing. Which, seriously, seems like a pretty big hole in the universal learner hypothesis.
I update on things I read that I understand, or things from people I believe to be reputable.
So you are claiming that either you already understood AI/AGI completely when you arrived at LW, or you updated on LW/MIRI writings because they are ‘reputable’, even though their positions are disavowed or even ridiculed by many machine learning experts.
EDIT: Oh, and you still haven’t responded to the cat thing. Which, seriously, seems like a pretty big hole in the universal learner hypothesis.
I replied here, and as expected, it looks like the assertion that you thought disagreed with the ULH is factually mistaken. Better yet, the outcome of your cat vs. bird observation was correctly predicted by the ULH, so that’s yet more evidence in its favor.
Let me point out the blatant hubris:
Let us see if we actually have significant differences of opinion. If so I expect to dominate you in any prediction market or bets concerning the near term future of AI.
and rudeness
or stop wasting my time
No one has any obligation to manage your time. If you want to stop wasting your time, you stop wasting your time.
Hubris—perhaps, but it was a challenge. Making predictions/bets can help clarify differences in world models.
The full quote was this:
Update, or provide counter evidence, or stop wasting my time.
In the context that he had just claimed he wasn’t going to update.