Thanks! I also would really like to see more work directly on alignment plans, directed at types of AGI we’re likely to actually get on the current historical path.
I looked at your link, and I don't think Demis said it would include RL; the article author mentioned RL, but Demis himself didn't seem to.
He said:
At a high level you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models
What do you think he meant by “AlphaGo-type systems”? I could be wrong, but I interpreted that as a reference to RL.
I missed that. I agree that firmly implies the use of RL.
Just to play devil’s advocate, here’s an alternative: When there are multiple plausible tokens, maybe Gemini does multiple branching roll-outs for all of them, and then picks the branch that seems best (somehow or other).
That would be arguably consistent with saying “some of the strengths of AlphaGo-type systems”, in the sense that AlphaGo also did multiple roll-outs at inference time (and training time for that matter). But it wouldn’t entail any extra RL.
If this is true (a big “if”!), my vague impression is that it’s an obvious idea that has been tried lots of times but has been found generally unhelpful for LLMs. Maybe they found a way to make it work? Or maybe not but they’re doing it anyway because it sounds cool? Or maybe this whole comment is way off. I’m very far from an expert on this stuff.
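For concreteness, here's a minimal sketch of the branching-rollout idea being speculated about above. The `sample_continuation` and `score` functions are hypothetical stand-ins (a real system would call the language model and some learned value function, the way AlphaGo used a value network); this is purely illustrative, not anything Demis described:

```python
import random

def sample_continuation(prompt):
    # Hypothetical stand-in for an LLM call that extends the prompt.
    # Here it just appends a random token so the sketch is runnable.
    return prompt + random.choice(["a", "b", "c"])

def score(text):
    # Hypothetical value heuristic for ranking finished branches.
    return text.count("a")

def best_of_n_rollout(prompt, n_branches=4, depth=8):
    # Roll out n_branches independent continuations to the given depth,
    # then keep whichever branch the heuristic scores highest.
    # Note that no extra RL is involved: this is pure inference-time search.
    branches = []
    for _ in range(n_branches):
        text = prompt
        for _ in range(depth):
            text = sample_continuation(text)
        branches.append(text)
    return max(branches, key=score)

print(best_of_n_rollout("prompt: "))
```

The cost is visible in the sketch: `n_branches * depth` model calls for one answer, which is part of why this kind of search is expensive for LLMs.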
That’s plausible. I had thought of another possible use of the rollout part of AlphaZero: trees of thought. That hasn’t been shown to help outside of particularly decomposable problems, and it’s pretty compute- and time-hungry, so it also doesn’t seem that useful.
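To illustrate how tree-of-thought search differs from flat rollouts: instead of sampling whole continuations independently, it expands, scores, and prunes partial branches at every step. This is a minimal beam-search-style sketch with hypothetical stand-ins (`propose_steps`, `value`) for the model calls, not a description of any actual system:

```python
import random

def propose_steps(state, k=3):
    # Hypothetical stand-in for an LLM proposing k candidate next steps.
    return [state + random.choice("abc") for _ in range(k)]

def value(state):
    # Hypothetical heuristic value for a partial solution; a real system
    # might ask the LLM itself to grade partial reasoning.
    return state.count("a")

def tree_of_thought(root, depth=4, beam=2, k=3):
    # Expand every kept state, score the children, and keep only the
    # best `beam` states at each level of the tree.
    frontier = [root]
    for _ in range(depth):
        children = [c for s in frontier for c in propose_steps(s, k)]
        children.sort(key=value, reverse=True)
        frontier = children[:beam]
    return frontier[0]

print(tree_of_thought("q: "))
```

The compute hunger mentioned above is explicit here: roughly `beam * k` model calls per level, versus one call per token for ordinary decoding.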