“These types of IQ tests measure specific abilities which are correlated with general intelligence in humans, but these specific abilities are only a small subset of the systems/abilities required for general intelligence, and probably rely on a smallish subset of the brain’s circuitry.”
Sounds like you’re tinting your statement with a society-of-mindish perspective... would you say that’s fair? There are even stronger reasons to question the importance of such results. Rather than exercising some specific dedicated IQ-test circuitry in the brain, it could just as well be that, in humans, IQ scores reflect how well-tuned your cognitive machinery is by some general measure much like an athlete’s hundred-meter sprint time reflects his overall fitness as well as white muscle development in his legs. He would not achieve a low sprint time were it not for the proper functioning of his very complex, variegated biology. That one can build a simple robot that achieves faster sprint times says very little about that robot’s potential for gymnastics or martial arts. For a robot to post a good sprint time simply does not require solving anything like the full set of problems that evolution had to solve for human athletes to perform as well as they do.
Arithmetic ability would be another example of a metric which is nowhere near “agi-complete” since it’s solvable by a relatively straightforward procedure.
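To make the "relatively straightforward procedure" point concrete, here is a minimal sketch of schoolbook addition on decimal strings. The function name `add_decimal_strings` is purely illustrative and not taken from any of the work under discussion; the point is only that a short, fixed loop fully solves the task.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal digit strings.

    Illustrative only: the schoolbook right-to-left procedure with a carry.
    """
    digits = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    # Walk both numbers from the least significant digit, carrying as needed.
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        total = da + db + carry
        digits.append(str(total % 10))
        carry = total // 10
        i -= 1
        j -= 1
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits)) or "0"
```

Nothing in this loop needs to learn, represent, or understand anything, which is why high scores on such a metric say little about general intelligence.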
In that case the solution of special “sub-tasks” of intelligence such as IQ puzzles would seem to me pretty uninformative. I’m reminded of the difference between Harry Foundalis’ Phaeaco, which learns how to represent visual structures independently, and earlier work on Bongard problems, which ‘cheated’ by pre-encoding the images as logical objects.
I suppose my view on this is somewhat an inversion of what I see as the norm: others dismiss IQ tests as meaningless for human beings but find them significant metrics for machine intelligence.
I don’t particularly endorse a society-of-mindish perspective (at least I don’t think so—I’m only vaguely familiar with the term in relation to something Minsky wrote).
I mostly agree with your general points above.
Rather than exercising some specific dedicated IQ-test circuitry in the brain, it could just as well be that, in humans, IQ scores reflect how well-tuned your cognitive machinery is by some general measure much like an athlete’s hundred-meter sprint time reflects his overall fitness as well as white muscle development in his legs.
Yes, that seems pretty plausible. However, another related explanation is that IQ in humans relates to several key high-level tradeoffs in the space of niches in a tribe/economy. One such tradeoff is the neoteny tradeoff: how much to delay learning and development. In general you can achieve higher brain task performance (general intelligence) by delaying learning/development to get more training data (life experience), at the obvious expense of missing out on earlier mating opportunities. High-IQ humans of the type common on this site probably result from the combination of delayed development and high innate curiosity as a basic drive (traits which combine well). In this model, medium/average IQ correlates with a genetic strategy favoring earlier maturation to quickly attain social status and mating opportunities.
I suppose my view on this is somewhat an inversion of what I see as the norm: others dismiss IQ tests as meaningless for human beings but find them significant metrics for machine intelligence.
Really—do you mean norm for society in general or norm for LW? I agree that IQ tests are meaningful for humans but less so for AI/AGI.
However—I also do believe that this particular type of test measures something of value for AI, and this research does represent some amount of real progress (assuming the results are genuine and will be replicated). There are however probably better and more challenging types of QA tasks that more specifically test abilities important/hard for AGI that are easy for humans.
Really—do you mean norm for society in general or norm for LW?
The general norm for each position separately.
I also do believe that this particular type of test measures something of value for AI
Insofar as they showcase generally applicable methods, I would agree. Their use of deep learning seems encouraging, though I cannot tell from the abstract how domain-specific their methods are, and thus to what extent similar techniques could figure into an architecture for general intelligence. If the techniques used don’t robustly generalise, then you’d have to tailor the approach to whatever particular domain you’re working in. Thus the society of mind remark: Minsky’s thesis as I understand it is that the mind is a kludge of tailor-made components that perform nicely in their domain but are basically useless outside of it (which seems to me incompatible with the phenomenon of neuroplasticity). Anybody advocating novel tailoring of general algorithms to specific domains is then adhering to Minsky’s approach.
To take seriously the idea that some system represents a concrete step towards general intelligence, I’d have to see its performance on a battery of “agi-hard” metrics. I can’t give a precise definition of what such might be, but IQ subtests that drastically restrict the scope of NLP techniques needed seem obviously not to qualify.
A much more compelling performance would be the ability for a system to, say, read a textbook on topology and then pass an exam paper on the subject, with neither having been pre-formatted into a convenient representation.
Thus the society of mind remark: Minsky’s thesis as I understand it is that the mind is a kludge of tailor-made components that perform nicely in their domain but are basically useless outside of it (which seems to me incompatible with the phenomenon of neuroplasticity).
In a complex ANN or a brain, you start with a really simple hierarchical prior over the network and a general purpose optimizer. After training you may get a ‘kludge of tailor-made components’ that perform really well on the domain you trained on. The result may be specific, but the process is very general.
A much more compelling performance would be the ability for a system to, say, read a textbook on topology and then pass an exam paper on the subject,
Yes, but that probably requires a large number of precursor capabilities AI systems do not yet possess.
I generally agree that a proper “agi-hard” metric will include a large battery of tests to get coverage over a wide range of abilities. We actually already have a good deal of experience on how to train AGIs and how to come up with good test metrics—in the field of education.
However, you could view the various AI benchmarks in aggregate as an AGI test battery: each test measures only a narrow ability, but combine enough of those tests and you have something more general. The recent development of textual QA benchmarks is the next step in that progression. Game environment tests such as Atari provide another, orthogonal way to measure AGI progress.
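As a rough sketch of what "benchmarks in aggregate" could mean, the snippet below averages per-test scores after normalizing each against a human baseline. Everything here is an assumption made for illustration: the function name `battery_score`, the test names, the numbers, and in particular the choice to cap each test at human level so that excelling on one narrow test cannot make up for failing others.

```python
from statistics import mean
from typing import Mapping


def battery_score(scores: Mapping[str, float],
                  human_baseline: Mapping[str, float]) -> float:
    """Average human-normalized score over every test in the battery.

    Hypothetical aggregation rule: each test is capped at 1.0 (human level),
    and a test the system has not attempted counts as 0.
    """
    normalized = [
        min(scores.get(name, 0.0) / baseline, 1.0)
        for name, baseline in human_baseline.items()
    ]
    return mean(normalized)


# Made-up numbers: superhuman on one game, weak on QA, exam not attempted.
example = battery_score(
    {"qa": 0.5, "atari": 2.0},
    {"qa": 1.0, "atari": 1.0, "topology_exam": 1.0},
)
```

Under this particular rule the made-up system averages 0.5 despite doubling the human score on one test, which matches the intuition that coverage matters more than any single narrow result.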
Just to be clear: what I meant by “domain specific methods” in this context is auxiliary techniques that boost the performance of the general “component synthesis procedure” (such as an ANN) within a specific domain. It seems that if you want a truly general system, even one that works by producing hairy purpose-specific components, then such auxiliary techniques cannot be used (unless synthesized by the agent itself). You can push this requirement to absurdity in practice, so I’m only stressing that the system should be capable of tractably inventing its own auxiliary procedures in principle, even if it didn’t actually invent all the ones it uses. On the whole, however, I pretty much concur.