“Blessings of scale” observations aside, it seems like right now, environments are not the bottleneck to DL/DRL work. No one failed to solve Go because, gosh darn it, they just lacked a good Go simulator which correctly implemented the rules of the game; the limits to solving ALE-57 (like Montezuma’s Revenge), either in general or as a single multi-task agent, do not seem to be a lack of Atari games where what we really need is ALE-526*; Procgen performance is not weak because of insufficient variation in levels; OpenAI Universe failed not for lack of tasks, to say the least; and the challenge in creating or replicating GPT-3 is not in scraping the text (GPT-3 didn’t even run 1 epoch!). Datasets/environments sometimes unlock new performance, like ImageNet, but even when one saturates, there are typically more datasets which are not yet solved and cannot be solved simultaneously (JFT-300M, for example), and in the case of RL, of course, compute=data. If you asked DRL researchers, I don’t think many of them would name “we’ve solved all the existing environments to superhuman level and have unemployed ourselves!” as their biggest bottleneck.
Is it really the case that at some point we will be drowning in so many GPUs and petaflops that our main problem will become coming up with ever more difficult tasks to give them something useful to train on? Or is this specifically a claim about friendly AGI, where we lack any kind of environment which would seem to force alignment for maximum score?
* Apparently the existing ALE suite was chosen pretty haphazardly:
Our testing set was constructed by choosing semi-randomly from the 381 games listed on Wikipedia at the time of writing. Of these games, 123 games have their own Wikipedia page, have a single player mode, are not adult-themed or prototypes, and can be emulated in ALE. From this list, 50 games were chosen at random to form the test set.
I wonder how the history of DRL would’ve changed if they had happened to select from the other 73, or if Pitfall & Montezuma’s Revenge had been omitted? I don’t, however, think it would’ve been a good use of their time in 2013 to work on adding more ALE games rather than, say, debugging GPU libraries to make it easier to run NNs at all...
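(For what it’s worth, the procedure in that quote amounts to a filter followed by a uniform random draw, so the suite’s composition, Montezuma’s Revenge and Pitfall included, came down to the sample. A minimal sketch, not the ALE authors’ code, with placeholder names standing in for the 123 eligible titles:)

```python
import random

# Placeholder for the 123 games that passed the filters (own Wikipedia page,
# single-player, not adult-themed or a prototype, emulatable in ALE).
eligible_games = [f"eligible_title_{i:03d}" for i in range(123)]

rng = random.Random(2013)                 # any seed; a different draw yields a different benchmark
test_set = sorted(rng.sample(eligible_games, 50))       # the 50-game suite
held_out = [g for g in eligible_games if g not in test_set]  # the "other 73"

print(len(test_set), len(held_out))       # 50 73
```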
The fact that progress on existing environments (Go, ALE-57, etc) isn’t bottlenecked by environments doesn’t seem like particularly useful evidence. The question is whether we could be making much more progress towards AGI with environments that were more conducive to developing AGI. The fact that we’re running out of “headline” challenges along the lines of Go and Starcraft is one reason to think that having better environments would make a big difference—although to be clear, the main focus of my post is on the coming decades, and the claim that environments are currently a bottleneck does seem much weaker.
More concretely, is it possible to construct some dataset on which our current methods would get significantly closer to AGI than they are today? I think that’s plausible—e.g. perhaps we could take the linguistic corpus that GPT-3 was trained on, and carefully annotate what counts as good reasoning and what doesn’t. (In some ways this is what reward modelling is trying to do—but that focuses more on alignment than capabilities.)
Or another way of putting it: suppose we gave the field of deep learning 10,000x current compute and algorithms that are 10 years ahead of today. Would people know what to apply them to, in order to get much closer to AGI? If not, this also suggests that environments will be a bottleneck unless someone focuses on them within the next decade.