"Being able to reorganise a question in the form of a model-appropriate game" seems like something we have already built a set of reasonable heuristics around: categorising different types of problems and their appropriate translations into ML-able tasks. There are well-established ML approaches to, e.g., image captioning, time-series prediction, audio segmentation, and so on. Is the bottleneck you're concerned with the lack of breadth and granularity of these problem sets, OP? If so, can we mark progress (to some extent) by the number of problem sets for which we have robust ML translations?
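To make "a robust ML translation" slightly more concrete, here is a minimal sketch (my own framing, not something from the thread): each problem category gets pinned down as a task with fixed input/output types and an agreed scoring metric, and one crude way to mark progress is simply to count the categories that have such a framing. The metric names are just the ones commonly used for each task.

```python
# Minimal sketch: a "robust ML translation" as a named task with fixed
# input/output types and an agreed metric. Illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTranslation:
    problem: str   # informal problem category
    inputs: str    # what the model consumes
    outputs: str   # what the model must produce
    metric: str    # how progress is scored


TRANSLATIONS = [
    TaskTranslation("image captioning", "RGB image", "token sequence", "CIDEr / BLEU"),
    TaskTranslation("time-series prediction", "history window", "future values", "MAE / RMSE"),
    TaskTranslation("audio segmentation", "waveform", "labelled time spans", "frame-level F1"),
]

# One crude way to "mark progress": count categories with an accepted framing.
print(f"{len(TRANSLATIONS)} problem categories with an established ML framing")
```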
I think this is an important problem. Going from progress on ML benchmarks to progress on real-world tasks is a very difficult challenge. For example, years after human-level performance on ImageNet, we still have lots of trouble with real-world applications of computer vision such as self-driving cars and medical diagnostics. That's because ImageNet isn't a directly valuable real-world task; rather, it is built to be amenable to supervised learning models that output a single class label for each input (see the sketch below).
While scale will improve performance within established paradigms, putting real-world problems into ML paradigms remains squarely a problem for human research taste.
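To make the "single class label per input" point concrete, here is a minimal sketch of the entire interface an ImageNet-style benchmark asks a model to satisfy. It assumes torch and torchvision >= 0.13 are installed, and the specific model is chosen only for illustration.

```python
# Minimal sketch of the ImageNet benchmark "contract": image in, one of 1000
# class labels out. Assumes torch and torchvision >= 0.13; downloads weights
# on first run.
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT            # pretrained ImageNet-1k weights
model = resnet50(weights=weights).eval()

# In a real pipeline you would load an image and apply weights.transforms();
# a random tensor is enough to show the shape of the interface.
image_batch = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    logits = model(image_batch)               # shape: (1, 1000)
    label_idx = logits.argmax(dim=1).item()   # the single class label per input

print(weights.meta["categories"][label_idx])
```

Everything a diagnostic or driving system needs beyond that single `label_idx` (localisation, calibrated uncertainty, the option to defer to a human) sits outside the benchmark's interface, which is the gap being pointed at above.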