Relatedly, the story does the gish-gallop thing where many of the links do not actually support the claim they are called on to support. For example, in “learning implicit tree search à la MuZero”, the link to MuZero does not support the claim that MuZero learns implicit tree search. (Originally the link directed to the MuZero paper, which definitely does not do implicit tree search, since it has explicit tree search hard-coded in; now the link goes to gwern’s page on MuZero, which is a collection of many papers, and it is unclear which one is about learning to do implicit tree search. Note that as far as I know, every Go program that can beat humans has tree search explicitly built in, so implicit tree search is not really a thing.)
I don’t agree with your read of the MuZero paper.
The training routine of MuZero (and AlphaZero etc) uses explicit tree search as a source of better policies than the one the model currently spits out, and the model is adapted to output these better policies.
The model is trying to predict the output of the explicit tree search. There’s room to argue over whether or not it “learns implicit tree search” (i.e. learns to actually “run a search” internally in some sense), but certainly the possibility is not precluded by the presence of the explicit search; the only reason the explicit search is there at all is to give the model a signal about what it should aspire to do without explicit search.
It’s also true that, when the trained models are run in practice, they are usually run with explicit search on top, and this improves their performance. This does not mean they haven’t learned implicit search—only that a single forward pass of the model cannot do as well as a search guided by many forward passes of the same model, which is not a surprising outcome for any model (even models which do some kind of search inside each forward pass).
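To make the training loop described above concrete, here is a toy sketch of "search as a source of better policies, with the model adapted to output them". It is not MuZero: there is no network, no tree, and no game. The names `search_policy`, `distill`, and `ACTION_VALUES` are hypothetical, and the "search" is just a reweighting of the current policy by made-up action values that stand in for what lookahead would reveal.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def search_policy(prior, action_values, sharpness=2.0):
    """Stand-in for explicit search: reweight the prior by looked-up values.

    A real system would run a tree search here; this toy only assumes the
    search returns a policy that favors higher-value actions.
    """
    weights = [p * math.exp(sharpness * v) for p, v in zip(prior, action_values)]
    total = sum(weights)
    return [w / total for w in weights]

def distill(logits, action_values, steps=200, lr=0.5):
    """Repeatedly move the raw policy toward the search-improved policy."""
    logits = list(logits)
    for _ in range(steps):
        prior = softmax(logits)
        target = search_policy(prior, action_values)
        # Gradient of cross-entropy(target, softmax(logits)) w.r.t. the
        # logits is (prior - target), so this is plain gradient descent.
        logits = [l - lr * (p - t) for l, p, t in zip(logits, prior, target)]
    return logits

# Hypothetical outcomes that only lookahead reveals; action 1 is best.
ACTION_VALUES = [0.1, 0.9, 0.3]

initial_logits = [0.0, 0.0, 0.0]  # untrained policy: uniform over three moves
trained_logits = distill(initial_logits, ACTION_VALUES)

raw_before = softmax(initial_logits)  # policy without search, before training
raw_after = softmax(trained_logits)   # policy without search, after training
print(raw_before, raw_after)
```

After training, the raw policy puts most of its mass on the action the search preferred, with no search at inference time. That only illustrates the distillation step; it says nothing either way about whether a real network trained like this ends up running something search-like internally.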
You’re at most making the claim that MuZero attempts to learn tree search. Does the MuZero paper provide any evidence that MuZero in fact does implicit tree search? I think not, which means it’s still misleading to link to that paper while claiming it shows neural nets can learn implicit tree search (I don’t particularly doubt they can learn it a bit, but I do contest the implication that MuZero does so to any substantial degree or that a non-negligible part of its strength comes from learning implicit tree search).
Edit: I should clarify what would change my mind here. If someone could show that MuZero (or any scaled-up variant of it) can beat humans at Go with the neural-net model alone (without the explicit tree search on top), I would change my mind. To my knowledge, no paper is currently claiming this, but let me know if I am wrong. Since my understanding is that the neural nets alone cannot beat humans, my interpretation is that the neural net part is providing something like roughly human-level “intuition” about what the right move should be, but without any actual search, so humans can still outperform this intuition machine by doing explicit search; but once you add on the tree search, the machines crush humans due to their speed.