In order for me to update on this, it would be great to have concrete examples of what does and does not constitute “nontrivial theoretical insights” according to you and Paul.
E.g. what was the insight from the 1980s? And what part of the AG(Z) architecture did you initially consider nontrivial?
A more precise version of my claim: if you gave smart grad students from 1990 access to all of the non-AI technology of 2017 (esp. software tools + hardware + data) and a big budget, it would not take them long to reach nearly state-of-the-art performance on supervised learning and RL. For example, I think it’s pretty plausible that 20 good grad students could do it in 3 years if they were motivated and reasonably well managed.
If they are allowed to query for 1 bit of advice per month (e.g. “should we explore approach X?”) then I think it’s more likely than not that they would succeed. The advice is obviously a huge advantage, but I don’t think that it can plausibly substitute for “nontrivial theoretical insight.”
There is lots of uncertainty about that operationalization, but the main question is just whether there are way too many small things to figure out and iterate on rather than whether there are big insights.
(Generative modeling involves a little bit more machinery. I don’t have a strong view on whether they would figure out GANs or VAEs, though I’d guess so. Autoregressive models aren’t terrible anyway.)
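(As a purely illustrative aside on how little machinery the autoregressive route needs: the whole idea is the chain-rule factorization p(x) = prod_t p(x_t | x_<t). Here is a minimal sketch, a Laplace-smoothed character bigram in plain Python/numpy; the toy data and code are my own illustration, not anything from the discussion above.)

```python
# Minimal autoregressive model: a Laplace-smoothed character bigram.
# Purely illustrative: the point is how little machinery the chain-rule
# factorization p(x) = prod_t p(x_t | x_<t) actually needs.
import numpy as np

text = "the quick brown fox jumps over the lazy dog"
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
V = len(chars)

# Count bigram transitions, with add-one (Laplace) smoothing.
counts = np.ones((V, V))
for a, b in zip(text, text[1:]):
    counts[idx[a], idx[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(s):
    """Score a string under the autoregressive factorization."""
    return sum(np.log(probs[idx[a], idx[b]]) for a, b in zip(s, s[1:]))

print(log_likelihood("the lazy fox"))
```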
They certainly wouldn’t come up with every trick or clever idea, but I expect they’d come up with the most important ones. With only 60 person-years they wouldn’t be able to put in very much domain-specific effort for any domain, so probably wouldn’t actually set SOTA, but I think they would likely get within a few years of it.
(I independently came up with the AGZ and GAN algorithms while writing safety posts, which I consider reasonable evidence that the ideas are natural and aren’t that hard. I expect there are a large number of cases of independent invention, with credit reasonably going to whoever actually gets it working.)
I don’t have as strong a view about whether this was also true in the 70s. By the late 80s, neural nets trained with backprop were a relatively prominent/popular hypothesis about how to build AGI, so you would have spent less time on alternatives. You have some simple algorithms, each of which might turn out not to be obvious (like Q-learning, which I think is roughly as tricky as the AGZ algorithm). You have the basic ideas for CNNs (though I haven’t looked into this extensively and don’t know how much of the idea was actually developed by 1990 vs. in 1998). I feel less comfortable betting on the grad students if you take all those things away. But realistically it’s more like a continuous increase in probability of success rather than some insight that happened in the 80s.
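(To make “simple” concrete: tabular Q-learning is essentially the single update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). Below is a minimal sketch on a made-up five-state chain MDP; the environment and hyperparameters are illustrative assumptions, not anything referenced in this thread.)

```python
# Tabular Q-learning on a made-up 5-state chain: move left/right, reward 1
# at the right end. Purely illustrative: the core algorithm is the single
# update line marked below.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.1, 0.9, 0.1     # illustrative hyperparameters
Q = np.zeros((n_states, n_actions))

for episode in range(500):
    s = 0
    for _ in range(50):
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # The Q-learning update:
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r == 1.0:
            break

print(np.round(Q, 2))  # right-moving actions end up with higher values
```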
If you tried to improve the grad students’ performance by shipping back some critical insights, what would they be?
Do you think that solving Starcraft (by self-play) will require some major insight or will it be just a matter of incremental improvement of existing methods?
I don’t think it will require any new insight. It might require using slightly different algorithms—better techniques for scaling, different architectures to handle incomplete information, maybe a different training strategy to handle the very long time horizons; if they don’t tie their hands it’s probably also worth adding on a bunch of domain-specific junk.
Thanks for taking the time to write that up.
I updated towards a “fox” rather than “hedgehog” view of what intelligence is: you need to get many small things right, rather than one big thing. I’ll reply later if I feel like I have a useful reply.