The sad/good thing is that this article represents progress. I recall that in Human Compatible, Stuart Russell said that there was a joint declaration from some ML researchers that AGI is completely impossible, and it’s clear from this article that Oren is at least thinking about it as a real possibility that isn’t hundreds of years away. Automatically forming learning problems sounds a lot like automatically discovering actions, which is something Stuart Russell also mentioned in a list of necessary breakthroughs to reach AGI, so maybe there’s some widespread agreement about what is still missing.
That aside, even by some of Oren’s own metrics, we’ve made quite substantial progress. He mentions the Winograd schemas as a good test of when we’re approaching human-like language understanding and common sense, but what he may not know is that GPT-2 actually bridged a significant fraction of the gap on Winograd schema performance between the best existing language models and humans: accuracy went from 63% to 71%, with humans at about 92% according to DeepMind. That is roughly 8 of the 29 percentage points separating the previous best from human performance, and a good object lesson in how the speed of progress can surprise you.
I find that the Winograd schemas are more useful as a guide to adversarial queries for stumping AIs than as an actual test. An AI reaching human-level accuracy on Winograd schemas would be much less impressive to me than an AI passing the traditional Turing test conducted by an expert who is aware of Winograd schemas and experienced in adversarial queries in general. The former is more susceptible to Goodhart’s law because of its stringent format and limited problem space.