My big takeaway from the AlphaTensor paper is that DeepMind have extraordinary taste: they’re able to identify problems that their approach to AI can tackle well, today; and they can figure out how to regiment those problems in such a way that the AI can actually tackle them:
Their approach is a variant of the deep-reinforcement-learning-guided Monte Carlo tree search that they have applied so successfully to playing Chess and Go. What they have done, very effectively, is to design a game whose objective is to find the most efficient algorithm for multiplying matrices of a given size (concretely, a low-rank decomposition of the matrix multiplication tensor).
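To make the objective concrete: Strassen’s classic scheme for 2×2 matrices, which uses seven scalar multiplications rather than the naive eight, is exactly the kind of algorithm the game searches for. Here’s a minimal NumPy sketch (my own illustration, not code from the paper) verifying it:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 scalar multiplications (Strassen, 1969)
    instead of the naive 8 -- an instance of the low-rank decompositions of the
    matrix multiplication tensor that AlphaTensor's game is built to discover."""
    m1 = (A[0, 0] + A[1, 1]) * (B[0, 0] + B[1, 1])
    m2 = (A[1, 0] + A[1, 1]) * B[0, 0]
    m3 = A[0, 0] * (B[0, 1] - B[1, 1])
    m4 = A[1, 1] * (B[1, 0] - B[0, 0])
    m5 = (A[0, 0] + A[0, 1]) * B[1, 1]
    m6 = (A[1, 0] - A[0, 0]) * (B[0, 0] + B[0, 1])
    m7 = (A[0, 1] - A[1, 1]) * (B[1, 0] + B[1, 1])
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

# Check against ordinary matrix multiplication
A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)
```

AlphaTensor’s move was to recast the search for decompositions like this one as a single-player game that an AlphaZero-style agent can play.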
On the presupposition that we don’t get AGI, much more AI research will look a lot like this. Find a good question, figure out how to regiment the question into the constraints imposed by the model’s architecture, and apply the model to answer the question.
But those skills don’t look like the sort of thing that benefits from scaling compute:
This ability – being able to reorganise a question in the form of a model-appropriate game – doesn’t look nearly as susceptible to Moore’s Law-style exponential speed-ups. Researchers’ insights and abilities – in other words, researcher productivity – don’t scale exponentially. It takes time and energy and the serendipity of a well-functioning research lab to cultivate them. Driving the cost of compute down to effectively zero doesn’t help if we’re not using these models to attack the right problems in the right way.
So the bottleneck won’t be compute! The bottleneck will be the sort of excellent taste that DeepMind keep displaying.
Everything benefits from scaling compute, because with more compute you can indiscriminately try more of everything, apply more meta-learning, etc.
Sure, but coming up with what to try, which hyperparameters to adjust, which heuristics to apply, and so on is not something for which we have a meaningful programme. You can’t brute-force taste!
(Even if we eventually CAN, we’ve got a long way to go before we can disregard the intentions of the researchers entirely. You can’t Goodhart taste either!)
“Being able to reorganise a question in the form of a model-appropriate game” seems like something we have already built a set of reasonable heuristics around: categorising different types of problems and their appropriate translations into ML-able tasks. There are well-established ML approaches to, e.g., image captioning, time-series prediction, audio segmentation, and so on. Is the bottleneck you’re concerned with the lack of breadth and granularity of these problem sets, OP? And can we mark progress (to some extent) by the number of problem sets we have robust ML translations for?
I think this is an important problem. Going from progress on ML benchmarks to progress on real-world tasks is a genuinely difficult challenge. For example, years after human-level performance on ImageNet, we still have lots of trouble with real-world applications of computer vision like self-driving cars and medical diagnostics. That’s because ImageNet isn’t a directly valuable real-world task; rather, it was built to be amenable to supervised learning models that output a single class label for each input.
While scale will improve performance within established paradigms, translating real-world problems into ML paradigms remains squarely a job for human research taste.