I think you’ve misunderstood the lesson, and mis-generalized from your experience of manually improving results.
First, I don’t believe you could write a generally useful program to improve translations. Maybe for Korean song lyrics specifically, but investing lots of human time and knowledge in solving one narrow problem is exactly the class of mistake the bitter lesson warns against.
Second, the techniques that were useful before capable models are usually different from the techniques that are useful to ‘amplify’ models. For example, “Let’s think step by step” would be completely useless combined with previous question-answering techniques.
Third, the bitter lesson is not about deep learning; it’s about methods which leverage large amounts of compute. AlphaGo combining MCTS with learned heuristics is a perfect example of this:
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
I think your response shows I understood it pretty well. As my primary example, I used something you yourself admit runs against what the bitter lesson tries to teach. I also never claimed I could directly program something better.
I pointed out that I used the techniques people decided to let go of, and with them massively improved results over the current state of machine translation for my own uses. I then suggested we should do things like give language models dictionaries and part-of-speech information to use as a reference or starting point. We can still use these tools as an improvement over pure deep learning, simply by letting the machine consult them as a reference. It would have to be trained to do so, of course, but that seems relatively easy.
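To make the suggestion concrete, here is a minimal sketch of what "letting the model use a dictionary as a reference" could look like: reference material is attached to the prompt rather than hard-coded as translation rules. The glossary entries, tag labels, and prompt wording are all hypothetical illustrations, not any particular system's API.

```python
# Sketch: augment a translation prompt with a bilingual glossary and
# part-of-speech tags, so the model can consult them as a starting point.
# All entries and labels below are made-up examples.

def build_augmented_prompt(sentence, glossary, pos_tags):
    """Prepend glossary entries and POS tags as reference material."""
    gloss_lines = [f"- {src}: {tgt}" for src, tgt in glossary.items()]
    tag_lines = [f"- {word}: {tag}" for word, tag in pos_tags.items()]
    return (
        "Reference glossary (a starting point, not a constraint):\n"
        + "\n".join(gloss_lines)
        + "\n\nPart-of-speech tags:\n"
        + "\n".join(tag_lines)
        + "\n\nTranslate into English:\n"
        + sentence
    )

prompt = build_augmented_prompt(
    "나는 학교에 간다",
    glossary={"학교": "school", "가다": "to go"},
    pos_tags={"나는": "pronoun+topic", "학교에": "noun+locative", "간다": "verb"},
)
print(prompt)
```

The point of the sketch is that the old hand-built resources survive as *inputs* the learned system can read, rather than as rules that replace learning.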
The bitter lesson says ‘scale is everything,’ yet AlphaGo and its follow-ups reach those levels using massively less compute than an exhaustive approach would require! Their search is not exhaustive but heuristic, and needs comparatively very little compute. Heuristic searches are less general, not more. It should be noted that I only mentioned AlphaGo to show that even it wasn’t the victory of scale some people commonly seem to believe: it took advantage of the fact that we know the structure of the game to give it a leg up.
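The compute gap being argued about here is easy to quantify with a toy model: an exhaustive search visits every node of the game tree, while a heuristic (beam-style) search keeps only the best-looking few positions at each depth. The branching factor and beam width below are arbitrary illustrative numbers, not AlphaGo's actual parameters.

```python
# Toy comparison: nodes visited by exhaustive vs. heuristic (beam) search
# on a tree with a fixed branching factor. Numbers are illustrative only.

BRANCHING = 8
DEPTH = 5

def exhaustive_count(depth, branching):
    """Nodes visited by a full search of the tree: sum of b^d over levels."""
    return sum(branching ** d for d in range(depth + 1))

def beam_count(depth, branching, beam_width):
    """Nodes visited when a heuristic keeps only `beam_width` nodes per level."""
    total, frontier = 1, 1
    for _ in range(depth):
        expanded = frontier * branching       # children generated this level
        total += expanded
        frontier = min(expanded, beam_width)  # heuristic prunes the rest
    return total

full = exhaustive_count(DEPTH, BRANCHING)            # 37449 nodes
pruned = beam_count(DEPTH, BRANCHING, beam_width=3)  # 105 nodes
print(full, pruned)
```

Even at this tiny scale the heuristic search visits orders of magnitude fewer nodes, which is the sense in which knowing the game's structure substitutes for raw compute.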