Just to play devil’s advocate, here’s an alternative: When there are multiple plausible tokens, maybe Gemini does multiple branching roll-outs for all of them, and then picks the branch that seems best (somehow or other).
That would arguably be consistent with saying “some of the strengths of AlphaGo-type systems”, in the sense that AlphaGo also did multiple roll-outs at inference time (and training time for that matter). But it wouldn’t entail any extra RL.
If this is true (a big “if”!), my vague impression is that it’s an obvious idea that has been tried lots of times but has been found generally unhelpful for LLMs. Maybe they found a way to make it work? Or maybe not but they’re doing it anyway because it sounds cool? Or maybe this whole comment is way off. I’m very far from an expert on this stuff.
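To make the hypothesis a bit more concrete, here's a toy sketch of what "branch on every plausible token, roll each branch out, keep the one that seems best" could look like. Everything in it is hypothetical and not a claim about how Gemini works. `toy_model` is a lookup table standing in for an LLM's next-token distribution. `toy_score` stands in for whatever "seems best" means (a learned value model, a verifier, etc.). The 0.3 branching threshold is arbitrary.

```python
from typing import Callable, Dict, Tuple

EOS = "<eos>"
Seq = Tuple[str, ...]

def greedy_rollout(model: Callable[[Seq], Dict[str, float]], prefix: Seq, max_len: int) -> Seq:
    """Extend `prefix` by always taking the most likely next token until EOS or max_len."""
    seq = prefix
    while len(seq) < max_len and (not seq or seq[-1] != EOS):
        probs = model(seq)
        seq = seq + (max(probs, key=probs.get),)
    return seq

def branching_decode(model, score, prompt: Seq, threshold: float = 0.3, max_len: int = 10) -> Seq:
    """Decode greedily, but wherever several tokens clear `threshold`,
    roll each one out to completion and commit to the best-scoring branch."""
    seq = prompt
    while len(seq) < max_len and (not seq or seq[-1] != EOS):
        probs = model(seq)
        candidates = [tok for tok, p in probs.items() if p >= threshold]
        if len(candidates) <= 1:
            # Only one plausible token (or none above threshold): plain greedy step.
            seq = seq + (max(probs, key=probs.get),)
            continue
        # Multiple plausible tokens: score a full greedy rollout for each candidate.
        best = max(candidates, key=lambda tok: score(greedy_rollout(model, seq + (tok,), max_len)))
        seq = seq + (best,)
    return seq

# Hypothetical toy "model": after "the", both "cat" and "dog" are plausible.
def toy_model(seq: Seq) -> Dict[str, float]:
    table = {
        (): {"the": 1.0},
        ("the",): {"cat": 0.5, "dog": 0.45, "eel": 0.05},
        ("the", "cat"): {"sat": 0.9, EOS: 0.1},
        ("the", "dog"): {"barked": 0.9, EOS: 0.1},
    }
    return table.get(seq, {EOS: 1.0})

# Hypothetical scorer: happens to prefer finished sequences containing "barked".
def toy_score(seq: Seq) -> float:
    return 1.0 if "barked" in seq else 0.0

print(greedy_rollout(toy_model, (), 10))               # plain greedy decoding
print(branching_decode(toy_model, toy_score, ()))      # branch-and-pick decoding
```

Plain greedy decoding gives `("the", "cat", "sat", "<eos>")`, while the branching version picks `("the", "dog", "barked", "<eos>")` because the scorer prefers that branch. Nothing in the sketch is trained, which fits the point that this wouldn't entail any extra RL. It also shows where the cost comes from: every branch point triggers one full rollout per candidate token.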
That’s plausible. I had thought of another possible use of the rollout part of AlphaZero: Tree of Thoughts. That hasn’t been shown to help outside of particularly decomposable problems, and it’s pretty compute- and time-hungry, so it also doesn’t seem that useful.