Even AlphaGo wasn’t so much a triumph of deep learning over everything as a demonstration that combining insights like search algorithms (Monte Carlo tree search) with trained heuristics works much better than either alone.
AlphaGo isn’t an example of the bitter lesson. The bitter lesson is that AlphaZero, which was trained using pure self-play, was able to defeat AlphaGo, with all of its careful optimization and foreknowledge of heuristics.
The entire point of what I wrote is that marrying human insights, tools, etc. with scale increases leads to higher performance and shouldn’t be discarded, not that you can’t do better with a crazy amount of resources than with a small amount of resources plus human insight.
Much later, with much more advancement, things improve. Two years after AlphaGo’s famous victories, they used scale to surpass it, without changing any of the insights. Generating games against itself is not a change to the fundamental approach in a well-defined game like Go. Well-defined games like Go are very well suited to the brute-force approach. It isn’t in the post, but this is more akin to using programs to generate math data for a network you want to learn math. We could train a network on an effectively infinite amount of symbolic math because we have easy generative programs, limited only by the cost we were willing to pay. We could also just give those programs to an AI and train it to use them. This is identical to what they did for AlphaZero. AlphaZero still uses the approach humans decided upon, not reinventing things on its own.
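To make the math-data analogy concrete, here is a toy sketch (the generator and its names are invented for illustration, not taken from any actual pipeline) of how a cheap program can emit unlimited (expression, value) training pairs:

```python
import random

def gen_example(rng, depth=2):
    """Recursively build a random arithmetic expression and its value.

    Every call yields a fresh training pair, so the dataset size is
    limited only by how much compute we want to spend."""
    if depth == 0:
        n = rng.randint(0, 9)
        return str(n), n
    op = rng.choice(['+', '-', '*'])
    left_str, left_val = gen_example(rng, depth - 1)
    right_str, right_val = gen_example(rng, depth - 1)
    val = {'+': left_val + right_val,
           '-': left_val - right_val,
           '*': left_val * right_val}[op]
    return f"({left_str} {op} {right_str})", val

rng = random.Random(0)
for expr, val in (gen_example(rng) for _ in range(3)):
    print(expr, '=', val)
```

The point is that the generator itself encodes the human-chosen approach; the network just consumes its output.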
Massive scale increases surpassing the achievements of earlier things is not something I argue against in the above post. Not using human data is hardly the same thing as not using human domain insights.
It wasn’t until two years after AlphaZero that they managed to make a version, MuZero, that actually learned how the game works on its own. Given the rate of scale increases in the field during that time, it’s hardly interesting that it eventually happened, but the scaling required an immense increase in money in the field in addition to algorithmic improvements.
I’m not sure how any of what you said actually disproves the Bitter Lesson. Maybe AlphaZero isn’t the best example of the Bitter Lesson, and MuZero is a better example. So what? Scale caught up eventually, though we may bicker about the exact timing.
Massive scale increases surpassing the achievements of earlier things is not something I argue against in the above post. Not using human data is hardly the same thing as not using human domain insights.
AlphaZero didn’t use any human domain insights. It used a tree search algorithm that’s generic across a number of different games. The entire reason that AlphaZero was so impressive when it was released was that it used an algorithm that did not encode any domain specific insights, but was still able to exceed state-of-the-art AI performance across multiple domains (in this case, chess and Go).
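To illustrate the generality claim: even a bare-bones Monte Carlo move chooser needs only the rules of the game (legal moves, transitions, outcomes) as input. This is not AlphaZero’s actual MCTS, just a flat random-rollout toy with an invented game interface:

```python
import random

def mc_best_move(state, legal_moves, play, winner, n_rollouts=200, rng=None):
    """Pick the move with the best win rate over random rollouts.

    Only legal_moves/play/winner are game-specific; the search itself
    encodes no knowledge about any particular game."""
    rng = rng or random.Random(0)

    def rollout(s):
        while winner(s) is None:
            s = play(s, rng.choice(legal_moves(s)))
        return winner(s)

    best_move, best_rate = None, -1.0
    for m in legal_moves(state):
        rate = sum(rollout(play(state, m)) == "me"
                   for _ in range(n_rollouts)) / n_rollouts
        if rate > best_rate:
            best_move, best_rate = m, rate
    return best_move

# Toy game: players alternately take 1 or 2 stones; taking the last wins.
def legal(s):   return [k for k in (1, 2) if k <= s[0]]
def play(s, k): return (s[0] - k, "opp" if s[1] == "me" else "me")
def winner(s):  return ("opp" if s[1] == "me" else "me") if s[0] == 0 else None

# From 4 stones, taking 1 leaves the opponent in the losing position (3).
print(mc_best_move((4, "me"), legal, play, winner))
```

Swap in a different rules triple and the same search plays a different game unchanged.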
I was pretty explicit that scale improves things and eventually surpasses any particular level that you get to earlier with the help of domain knowledge...my point is that you can keep helping it, and it will still be better than it would be with just scale. MuZero is just evidence that scale eventually gets you to the place you already were, because they were trying very hard to get there and it eventually worked.
AlphaZero did use domain insights, just like AlphaGo. It wasn’t self-directed. It was told the rules. It was given a direct way to play games, and told to. It was told how to search. Domain insights in the real world are often simply being told which general strategies will work best. Domain insights aren’t just things like ‘a knight is worth this many points’ in chess, or whatever the equivalent heuristic is in Go (which I haven’t played). Humans tweaked and altered things until they got the results they wanted from training. If they understood that they were doing so, and accepted it, they could get better results sooner, and much more cheaply.
Also, state of the art isn’t the best that can be done.
Domain insights in the real world are often simply being told which general strategies will work best.
No, that’s not what domain insights are. Domain insights are just that, insights which are limited to a specific domain. So something like, “Trying to control the center of the board,” is a domain insight for chess. Another example of chess-specific domain insights is the large set of pre-computed opening and endgame books that engines like Stockfish are equipped with. These are specific to the domain of chess, and are not applicable to other games, such as Go.
An AI that can use more general algorithms, such as tree search, to effectively come up with new domain insights is more general than an AI that has been trained with domain specific insights. AlphaZero is such an AI. The rules of chess are not insights. They are constraints. Insights, in this context, are ideas about which moves one can make within the constraints imposed by the rules in order to reach the objective most efficiently. They are heuristics that allow you to evaluate positions and strategies without having to calculate all the way out to the final move (a task that may be computationally infeasible).
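The distinction can be made concrete with a toy sketch (the piece values and data structures here are invented for illustration): the move list below is a constraint, while the material evaluation is a human heuristic, i.e. an insight:

```python
# A *constraint*: which moves are legal. This encodes no opinion about
# which moves are good -- only what the rules permit.
def legal_moves(position):
    return position["moves"]  # supplied by the game definition

# An *insight*: a hand-crafted heuristic that scores a position without
# searching to the end of the game. The classic piece values are a human
# convention, not a rule of chess.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_heuristic(position):
    own = sum(PIECE_VALUES[p] for p in position["own_pieces"])
    opp = sum(PIECE_VALUES[p] for p in position["opp_pieces"])
    return own - opp

pos = {"moves": ["e4", "d4"],
       "own_pieces": ["Q", "R", "P"],   # 9 + 5 + 1 = 15
       "opp_pieces": ["R", "P", "P"]}   # 5 + 1 + 1 = 7
print(material_heuristic(pos))  # 8
```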
AlphaZero did not have any such insights. No one gave AlphaZero any heuristics about how to evaluate board positions. No one told it any tips or tricks about strategies that would make it more likely to end up in a winning position. It figured out everything on its own and did so at a level that was better than similar AIs that had been seeded with those heuristics. That is the true essence of the Bitter Lesson: human insights often make things worse. They slow the AI down. The best way to progress is just to add more scale, add more compute, and let the neural net figure things out on its own within the constraints that it’s been given.
No. That’s a foolish interpretation of domain insight. We have a massive number of highly general strategies that nonetheless work better for some things than others. A domain insight is simply some kind of understanding involving the domain being put to use. Something as simple as choosing between a linked list and an array can rest on a minor domain insight. Whether to use a Monte Carlo search or a depth-limited search, and so on, are definitely insights. Most advances in AI to this point have in fact been based on domain insights, and only a small amount on scaling within an approach (though more so recently). Even the ‘bitter lesson’ is an attempted insight into the domain (one that is wrong, being a severe overreaction to previous failure).
Also, most domain insights are in fact an understanding of constraints. ‘This path will never have a reward’ is both an insight and a constraint. ‘Dying doesn’t allow me to get the reward later’ is both a constraint and a domain insight. So is ‘the lists I sort will never have numbers outside the range 143 to 987’ (which is useful for an O(n) sort like counting sort). We are, in fact, trying to automate the process of gaining domain insights by machine with this whole enterprise in AI, especially in whatever we have trained them for.
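That last constraint maps directly onto counting sort, which runs in O(n + k) precisely because the value range is known in advance; a minimal sketch:

```python
def counting_sort(xs, lo=143, hi=987):
    """O(n + k) sort, valid only under the domain constraint lo <= x <= hi."""
    counts = [0] * (hi - lo + 1)
    for x in xs:
        counts[x - lo] += 1          # tally each value's occurrences
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)  # emit values in ascending order
    return out

print(counting_sort([987, 143, 500, 500, 200]))
```

Without the range constraint, a comparison sort’s O(n log n) is the best we can guarantee; the insight about the inputs is exactly what buys the speedup.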
Even ‘should we scale via parameters or via data?’ is a domain insight. They recently found out they had gotten that wrong too (Chinchilla), because they focused too much on just scaling.
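That insight can even be written down. A rough sketch, assuming the common approximation C ≈ 6ND for training compute and the roughly 20-tokens-per-parameter ratio reported in the Chinchilla paper (the ratio is an empirical fit, not a law, and the function name is mine):

```python
def chinchilla_split(compute_flops, tokens_per_param=20.0):
    """Rough compute-optimal (params, tokens) split.

    Assumes C ~ 6*N*D and the empirical D/N ~ 20 ratio, so
    N = sqrt(C / (6 * ratio)) and D = ratio * N."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

# Chinchilla itself: ~70B params trained on ~1.4T tokens.
n, d = chinchilla_split(6 * 70e9 * 1.4e12)
print(f"{n:.3g} params, {d:.3g} tokens")
```

Earlier scaling prescriptions put far more of the budget into parameters; the correction came from better understanding of the domain, not from more compute.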
AlphaZero was given some minor domain insights (how to search and how to play the game), and years later it ended up slightly beating a much earlier approach, because they were trying to do that specifically. I specifically said that sort of thing happens. It’s just not as good as it could have been (probably).
And now we have the same algorithms that were used to conquer Go and chess being used to conquer matrix multiplication.
Are you still sure that AlphaZero is “domain specific”? And if so, what definition of “domain” covers board games, Atari video games, and matrix multiplication? At what point does the “domain” in question just become, “Thinking?”