When it comes to the ‘ideas’ vs. ‘compute’ spectrum:
It seems to me like one of the main differences (but probably not the core one?) is how predictable it seems whether a given approach will work. Suppose Alice thinks that it’s hard to come up with something that works, but things that look like they’ll work do with pretty high probability, and suppose Bob thinks it’s easy to see lots of things that might work, but things that might work rarely do. I think Alice is more likely to think we’re ideas-limited (since if we had a textbook from the future, we could just code it up and train it real quick) and Bob is more likely to think we’re compute-limited (since our actual progress is going to look much more like ruling out all of the bad ideas that stand between us and the good ideas, and the more computational experiments we can run, the faster that process can happen).
I tend to be quite close to the ‘ideas’ end of the spectrum, though the issue is pretty nuanced and mixed.
I think one of the things that’s interesting to me is not how much training time can be optimized, but ‘model size’: what seems important is not whether our RL algorithm can solve a double pendulum lightning-quick, but whether we can put the same basic RL architecture into an octopus’s body and have it quickly figure out how to control the tentacles. If the ‘exponential effort for linear returns’ story is true, then even if we’re currently not making the most of our hardware, a 100x gain in hardware utilization only buys about 2 more steps in the return space (roughly, two extra orders of magnitude of effort when each additional step costs another order of magnitude). I think the primary thing that inclines me towards the ‘ideas will drive progress’ view is that if one method takes exponential effort for linear returns and another takes, say, polynomial effort for linear returns, the second method should blow past the exponential one pretty quickly. (Even something that just reduces the base of the exponent would be a big deal for complicated tasks.)
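To make that comparison concrete, here’s a toy calculation. The specific cost curves (effort = 10^R for the exponential method, effort = R^3 for the polynomial one) are assumptions I’m picking purely for illustration, not anything measured:

```python
import math

# Toy model of 'effort to returns' scaling (illustrative assumptions only):
#   exponential-effort method: effort = 10**R, so returns R = log10(effort)
#   polynomial-effort method:  effort = R**3,  so returns R = effort**(1/3)

def returns_exponential(effort):
    return math.log10(effort)

def returns_polynomial(effort):
    return effort ** (1 / 3)

for effort in (1e6, 1e8):  # a 100x jump in usable compute
    print(f"effort={effort:.0e}  "
          f"exp-method returns={returns_exponential(effort):.1f}  "
          f"poly-method returns={returns_polynomial(effort):.1f}")

# effort=1e+06  exp-method returns=6.0  poly-method returns=100.0
# effort=1e+08  exp-method returns=8.0  poly-method returns=464.2
```

Under these made-up curves, the same 100x compute boost adds only 2 to the exponential method’s returns but more than quadruples the polynomial method’s, which is the sense in which the polynomial method ‘blows past’ the exponential one.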
If you go down that route, then I think you start thinking a lot about the efficiency of other things (like how good human Go players are at turning games into knowledge) and what information theory suggests about strategies, and so on. And you also start thinking about how close we are—for a lot of these things, just turning up the resources plowed into existing techniques can work (like beating DotA) and so it’s not clear we need to search for “phase change” strategies first. (Even if you’re interested in, say, something like curing cancer, it’s not clear whether continuing improvements to current NN-based molecular dynamics predictors, causal network discovery tools, and other diagnostic and therapeutic aids will get to the finish line first as opposed to figuring out how to build robot scientists and then putting them to work on curing cancer.)