In the brain, the same circuitry that is used to solve vision is used to solve most of the rest of cognition—vision is 10% of the cortex. Going from superhuman vision to superhuman Go suggests superhuman anything/everything is getting near.
The reason is that strong Go play requires deep, slow inference over huge amounts of data and time (which DL excels at, similar to what the cortex/cerebellum specialize in), combined with fast, low-data inference (the MCTS part here). There is still much room for improvement in generalizing beyond current MCTS techniques, and in integrating them into larger-scale ANNs, but that looks increasingly straightforward.
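To make that division of labor concrete, here is a minimal Python sketch of AlphaGo-style PUCT search, where a learned network supplies priors and value estimates (the slow, data-hungry part) and the tree search supplies the fast, low-data refinement. The `policy_value`, `legal_moves`, and `apply_move` callables are hypothetical stand-ins for a trained network and the game rules, not any particular system's API.

```python
import math

class Node:
    """One edge statistic bundle: prior P, visit count N, total value W."""
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # action -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """PUCT: exploit the learned value Q, explore in proportion to the prior P."""
    sqrt_total = math.sqrt(sum(c.visits for c in node.children.values()) + 1)
    def score(item):
        _, child = item
        return child.q() + c_puct * child.prior * sqrt_total / (1 + child.visits)
    return max(node.children.items(), key=score)

def expand(node, state, policy_value, legal_moves):
    """The 'deep slow inference' part: network priors plus a value estimate."""
    priors, value = policy_value(state)
    for a in legal_moves(state):
        node.children[a] = Node(prior=priors[a])
    return value

def run_mcts(root_state, policy_value, legal_moves, apply_move, n_sims=400):
    root = Node(prior=1.0)
    expand(root, root_state, policy_value, legal_moves)
    for _ in range(n_sims):
        node, state, path = root, root_state, []
        while node.children:                 # select down to a leaf
            action, node = select_child(node)
            state = apply_move(state, action)
            path.append(node)
        value = expand(node, state, policy_value, legal_moves)
        for n in path:                       # back up the value estimate
            n.visits += 1
            n.value_sum += value
    # act greedily with respect to visit counts
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

(Sign alternation for two-player play is omitted for brevity.)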
It’s tempting to assume that the “keystone, foundational aspect” of intelligence is learning essentially the same way that artificial neural networks learn.
Yes, but only because “ANN” is enormously broad (the space of tensor/linear-algebra programs), and so basically includes all possible routes to AGI (all possible approximations of Bayesian inference).
But humans can do things like “one-shot” learning, learning from weak supervision, learning in non-stationary environments, etc., which no current neural network can do, and not just as a matter of scale or architectural “details”.
Bayesian methods excel at one-shot learning, and are steadily being integrated into ANN techniques (providing the foundation needed to derive new learning and inference rules). Transfer and semi-supervised learning are also progressing rapidly, and the theory is all there. I don’t know as much about the non-stationary case, but I’d be pretty surprised if there weren’t progress there as well.
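As a toy illustration of the one-shot point: with a conjugate Gaussian prior over a class mean, a single labeled example already yields a usable posterior predictive. The two-class setup and all the numbers below are invented for the sketch.

```python
# Why a prior makes one-shot learning possible: one example per class
# is enough to define a posterior predictive and classify a new point.
import math

def posterior_predictive_logpdf(x, example, prior_mean=0.0,
                                prior_var=4.0, noise_var=1.0):
    """Conjugate Gaussian update from ONE observed example, then score x."""
    # Posterior over the class mean after a single observation
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + example / noise_var)
    # Predictive distribution: N(post_mean, post_var + noise_var)
    var = post_var + noise_var
    return -0.5 * (math.log(2 * math.pi * var) + (x - post_mean) ** 2 / var)

# One labeled example per (hypothetical) class
class_examples = {"A": -2.0, "B": 3.0}
x_new = 2.2
best = max(class_examples,
           key=lambda c: posterior_predictive_logpdf(x_new, class_examples[c]))
print(best)  # -> "B"
```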
Thus I think it’s fair to say that we still don’t know what the foundational aspects of intelligence are.
LOL. Generalized DL + MCTS is, rather obviously, a practical approximation of universal intelligence like AIXI. I doubt MCTS scales well enough to all domains, but the obvious next step is for DL to eat MCTS techniques (so that new, more complex heuristic search techniques can be learned automatically).
In the brain, the same circuitry that is used to solve vision is used to solve most of the rest of cognition
And in a laptop the same circuitry that is used to run a spreadsheet is used to play a video game.
Systems that are Turing-complete (in the limit of infinite resources) tend to decouple hardware from possibly many layers of software (a program running on a VM running on a VM running on a VM, and so on). Things that look similar at some levels may differ greatly at other levels, and thus things that look simple at some levels can have lots of hidden complexity at other levels.
Going from superhuman vision
Human-level (perhaps weakly superhuman) vision is achieved only in very specific tasks where large supervised datasets are available. This is not very surprising, since even traditional “hand-coded” computer vision could achieve superhuman performance in some narrow and clearly specified tasks.
Yes, but only because “ANN” is enormously broad (tensor/linear algebra program space), and basically includes all possible routes to AGI (all possible approximations of bayesian inference).
Again, ANNs are Turing-complete, so in principle they include literally everything; but so does brute-force search over C programs.
In practice, if you try to generate C programs by brute-force search you will get stuck pretty fast, while ANNs trained with gradient descent empirically work well on various kinds of practical problems. But they do not work on all the kinds of practical problems that humans are good at, and how to make them work on those problems, if it is even efficiently possible, is a whole open research field.
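A toy contrast of the two search regimes: gradient descent gets a local improvement direction at every step, while discrete enumeration gets feedback only after producing a complete candidate. The numbers below are illustrative only.

```python
# Fitting y = 3x by gradient descent: cheap local steps, fast convergence.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]

w = 0.0
lr = 0.05
for step in range(200):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
print(round(w, 3))  # ~3.0 after a few hundred cheap local steps

# By contrast, enumerating even 10-token programs over a 50-symbol
# alphabet means 50**10 ≈ 10**17 candidates before any feedback.
```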
Bayesian methods excel at one-shot learning
With lots of task-specific engineering.
Generalized DL + MCTS is, rather obviously, a practical approximation of universal intelligence like AIXI.
So are things like AIXI-tl, Hutter-search, Gödel machine, and so on. Yet I would not consider any of them as the “foundational aspect” of intelligence.
And in a laptop the same circuitry that is used to run a spreadsheet is used to play a video game.
Exactly, and this is a good analogy that illustrates my point. Discovering that the cortical circuitry is universal rather than task-specific (like an ASIC) was a key discovery.
Human-level (perhaps weakly superhuman) vision is achieved only in very specific tasks where large supervised datasets are available.
Note that I didn’t say we have solved vision to a superhuman level. But the quoted claim is simply not true: current SOTA nets can achieve human-level performance in at least some domains using modest amounts of unsupervised data combined with small amounts of supervised data.
Human vision builds on enormous amounts of unsupervised data, much more than ImageNet. Learning in the brain is complex and multi-objective, but perhaps best described as self-supervised (unsupervised meta-learning of sub-objective functions, which can then be used for supervised learning).
A five-year-old will have experienced perhaps 50 million seconds’ worth of video data. ImageNet consists of 1 million images, which is vaguely equivalent to 1 million seconds of video: a 30x amplification for small translations/rotations yields about 30 million effective frames, or roughly 1 million seconds at 30 frames per second.
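A back-of-envelope check of those numbers (the waking-vision hours per day and the 30 fps conversion are assumptions, chosen to match the rough figures above):

```python
# ~8 waking-vision hours/day for 5 years
child_seconds = 5 * 365 * 8 * 3600          # ≈ 52.6 million seconds
# ImageNet: 1M images, 30x augmentation, converted at ~30 frames/second
imagenet_seconds = (1_000_000 * 30) / 30    # = 1 million "seconds of video"
print(child_seconds / imagenet_seconds)     # ≈ 53x more raw data for the child
```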
The brain’s vision system is about 100x larger than current ‘large’ vision ANNs. But if DeepMind decided to spend the cash on that and made it a huge one-off research priority, do you really doubt that they could build a superhuman general vision system that learns with a similar dataset and training duration?
So are things like AIXI-tl, Hutter-search, Gödel machine, and so on. Yet I would not consider any of them as the “foundational aspect” of intelligence.
The foundation of intelligence is just inference, simply because universal inference is sufficient to solve any other problem. AIXI is already simple, but you can make it even simpler by replacing the planning component with inference over high-EV actions, or even just with inference over program space to learn approximate planning.
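For reference, the action rule being approximated here is, in Hutter’s formulation, an expectimax over futures in which every program q consistent with the interaction history (on a universal Turing machine U) is weighted by its simplicity:

```latex
a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \;\cdots\; \max_{a_m} \sum_{o_m r_m}
\big( r_t + \cdots + r_m \big)
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Here $\ell(q)$ is the length of program q, so shorter (simpler) world-models dominate the mixture.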
So it all boils down to efficient inference. The exciting new progress in DL, for me at least, is in understanding how successful empirical optimization techniques can be derived as approximate inference update schemes with various types of priors. This is what I referred to as the new and upcoming “Bayesian methods”: Bayesian-grounded DL.
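The canonical example of that kind of derivation: maximum a posteriori estimation with a Gaussian prior over the weights recovers plain L2 weight decay, i.e. a standard “empirical” regularizer falls out as an approximate-inference update with a particular prior:

```latex
\theta^{\ast} \;=\; \arg\max_{\theta}\; \log p(D \mid \theta) + \log p(\theta),
\qquad p(\theta) = \mathcal{N}(0, \sigma^2 I)
\;\;\Longrightarrow\;\;
\theta^{\ast} \;=\; \arg\min_{\theta}\; \mathcal{L}(\theta) + \tfrac{1}{2\sigma^2}\,\lVert \theta \rVert_2^2
```

where $\mathcal{L}(\theta)$ is the negative log-likelihood of the data $D$.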
Yes, but only because “ANN” is enormously broad (tensor/linear algebra program space), and basically includes all possible routes to AGI (all possible approximations of bayesian inference).
“Enormously broad” is just another way of saying “not very useful”. We don’t even know in which sense (if any) the “deep networks” that are used in practice may be said to approximate Bayesian inference; the best we can do, AIUI, is make up a hand-wavy story about how they must be some “hierarchical” variation of single-layer networks, i.e. generalized linear models.
Specifically, I meant approximate Bayesian inference over the tensor program space to learn the ANN, not that the ANN itself needs to implement Bayesian inference (although ANNs will naturally tend to learn that, as we see in all the evidence for various Bayesian operations in the brain).