Really—do you mean norm for society in general or norm for LW?
The general norm for each position separately.
I also do believe that this particular type of test measures something of value for AI
Insofar as they showcase generally applicable methods, I would agree. Their use of deep learning seems encouraging, though I cannot tell from the abstract how domain-specific their methods are, and thus to what extent similar techniques could figure into an architecture for general intelligence. If the techniques used don’t robustly generalise, then you’d have to tailor the approach to whatever particular domain you’re working in. Thus the society of mind remark—Minsky’s thesis as I understand it is that the mind is a kludge of tailor-made components that perform nicely in their domain but are basically useless outside of it (which seems to me incompatible with the phenomenon of neuroplasticity). Anybody advocating novel, domain-specific tailoring of general algorithms is then adhering to Minsky’s approach.
To take seriously the idea that some system represents a concrete step towards general intelligence, I’d have to see its performance on a battery of “agi-hard” metrics. I can’t give a precise definition of what such a battery might contain, but IQ subtests that drastically restrict the scope of NLP techniques needed seem obviously not to qualify.
A much more compelling performance would be the ability for a system to, say, read a textbook on topology and then pass an exam paper on the subject, with neither the textbook nor the exam having been pre-formatted into a convenient representation.
Thus the society of mind remark—Minsky’s thesis as I understand it is that the mind is a kludge of tailor-made components that perform nicely in their domain but are basically useless outside of it (which seems to me incompatible with the phenomenon of neuroplasticity).
In a complex ANN or a brain, you start with a really simple hierarchical prior over the network and a general purpose optimizer. After training you may get a ‘kludge of tailor-made components’ that perform really well on the domain you trained on. The result may be specific, but the process is very general.
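To make that concrete, here is a minimal sketch (the toy tasks, the tiny network, and the use of PyTorch are all illustrative assumptions, not anything from the work under discussion): the same generic architecture and the same general-purpose optimizer are reused unchanged on two unrelated domains, and only the trained weights come out domain-specific.

```python
# Minimal illustrative sketch: one generic architecture, one general-purpose
# optimizer, two unrelated toy domains. Assumes PyTorch; all names are invented.
import torch
import torch.nn as nn

def make_generic_net(in_dim, out_dim):
    # A single, simple "prior" over architectures: a small fully connected net.
    return nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, out_dim))

def train(net, make_batch, steps=2000):
    # The same general-purpose optimizer and loss for every domain.
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        x, y = make_batch()
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()
    return net

# Domain A: regress sin(x) on [0, 2*pi].
def sine_batch(n=64):
    x = torch.rand(n, 1) * 6.28
    return x, torch.sin(x)

# Domain B: learn 2-bit XOR.
def xor_batch(n=64):
    x = torch.randint(0, 2, (n, 2)).float()
    return x, (x[:, 0] != x[:, 1]).float().unsqueeze(1)

net_a = train(make_generic_net(1, 1), sine_batch)  # ends up a sine specialist
net_b = train(make_generic_net(2, 1), xor_batch)   # ends up an XOR specialist
```

The point of the sketch is only that nothing in make_generic_net or train knows anything about sines or XOR; the specialisation lives entirely in the learned parameters.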
A much more compelling performance would be the ability for a system to, say, read a textbook on topology and then pass an exam paper on the subject,
Yes, but that probably requires a large number of precursor capabilities AI systems do not yet possess.
I generally agree that a proper “agi-hard” metric will include a large battery of tests to get coverage over a wide range of abilities. We actually already have a good deal of experience with how to train AGIs and how to come up with good test metrics—in the field of education.
However, you could view the various AI benchmarks in aggregate as an AGI test battery—each test measures only a narrow ability, but combine enough of those tests and you have something more general. The recent development of textual QA benchmarks is another step in that progression. Game-environment tests such as Atari provide yet another, orthogonal way to measure AGI progress.
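A rough sketch of what such an aggregate might look like (every benchmark name and number below is made up purely for illustration): normalise each narrow score against some reference level and combine them so that breadth, rather than one narrow peak, drives the composite.

```python
# Hypothetical battery: the scores and reference levels are invented for the sketch.
BENCHMARKS = {
    # name: (system_score, reference_score)
    "verbal_iq_subtest": (22.0, 25.0),
    "textual_qa":        (61.0, 82.0),
    "atari_median":      (0.9, 1.0),   # already normalised to a human baseline
}

def battery_score(results):
    """Geometric mean of per-benchmark ratios.

    A near-zero result on any single test drags the composite toward zero,
    so the aggregate rewards breadth rather than one narrow peak.
    """
    ratios = [system / reference for system, reference in results.values()]
    product = 1.0
    for r in ratios:
        product *= max(r, 1e-9)   # guard against an exact zero
    return product ** (1.0 / len(ratios))

print(round(battery_score(BENCHMARKS), 3))
```

A geometric mean is only one possible aggregation rule; the design point is that the composite should penalise missing abilities rather than average them away.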
Just to be clear: what I meant by “domain-specific methods” in this context is auxiliary techniques that boost the performance of the general “component synthesis procedure” (such as an ANN) within a specific domain. It seems that if you want a truly general system, even one that works by producing hairy, purpose-specific components, then such auxiliary techniques cannot be used (unless synthesized by the agent itself). In practice you can push this requirement to the point of absurdity, so I’m only stressing that the system should, in principle, be capable of tractably inventing its own auxiliary procedures, even if it didn’t actually invent all the ones it uses. On the whole, however, I pretty much concur.
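As a concrete, entirely hypothetical illustration of that distinction, the sketch below bolts a hand-crafted, domain-specific feature map onto a simple general-purpose learner, and contrasts it with handing a generic learner the raw input and requiring it to synthesise the equivalent feature internally; the XOR task, the helper names, and the use of scikit-learn are all assumptions made for the example.

```python
# Hypothetical example of an "auxiliary, domain-specific technique" versus
# letting the general learner synthesise the feature itself. Assumes scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 2)).astype(float)
y = (X[:, 0] != X[:, 1]).astype(int)            # XOR labels

# Auxiliary technique: a human adds the product feature that makes XOR
# linearly separable for a simple general-purpose classifier.
X_aux = np.hstack([X, X[:, :1] * X[:, 1:2]])
linear_with_aux = LogisticRegression().fit(X_aux, y)

# Fully general route: raw input only; the generic learner must invent an
# equivalent internal feature on its own.
generic = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)

print(linear_with_aux.score(X_aux, y), generic.score(X, y))  # typically both near 1.0
```

The question raised in the comment is then only where X_aux comes from: if a human has to hand it over, the system is domain-tailored in Minsky’s sense; if the system could derive it itself when needed, the generality claim survives.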