Learning on-the-fly remains an open problem, but I expect some combination of sim2real and MuZero to work here.
Hmm? sim2real AFAICT is an approach to generating synthetic data, not to learning. MuZero is a system that can learn to play a bunch of games, with an architecture very unlike LLMs. This sentence doesn’t typecheck for me; what way of combining these concepts with LLMs are you imagining?
Imagine you were trying to build a robot that could:
1. Solve a complex mechanical puzzle it has never seen before
2. Play at an expert level a board game that I invented just now.
Both of these are examples of learning-on-the-fly. No amount of pre-training will ever produce a satisfying result.
The way I believe a human (or a cat) solves 1 is: look at the puzzle, try some things, build a model of the toy in their head, try things on that mental model, and eventually solve the puzzle. There are efforts to get robots to follow the same process, but nothing I would yet consider “this is the obvious correct solution.”
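That loop (explore, build an internal model, then plan inside the model rather than in the world) can be sketched as toy model-based planning. Everything below is a made-up stand-in for illustration, not a claim about any real robotics stack: the "puzzle" is a tiny deterministic graph, and BFS plays the role of thinking:

```python
import random
from collections import deque

# Toy stand-in for the mechanical puzzle: states are ints 0..N-1, the
# three actions move the state by +1, +2, or +3 (mod N), and the goal
# is state N-1. These numbers are arbitrary.
N = 8
GOAL = N - 1

def step(state, action):
    """Ground-truth environment the agent can poke at."""
    return (state + action + 1) % N

# Phase 1: "try some things" -- random exploration, recording what
# happened into a learned mental model of the puzzle.
random.seed(0)
model = {}  # (state, action) -> observed next state
state = 0
for _ in range(2000):
    action = random.randrange(3)
    nxt = step(state, action)
    model[(state, action)] = nxt
    state = nxt

# Phase 2: "try things on the model in their head" -- plan with BFS
# over the learned model only, never touching the real environment.
def plan(start, goal):
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        s, path = frontier.popleft()
        if s == goal:
            return path
        for a in range(3):
            nxt = model.get((s, a))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [a]))
    return None

plan_to_goal = plan(0, GOAL)
print("plan:", plan_to_goal)
```

The hard part the sketch hides, of course, is learning `model` from raw perception rather than reading it off a lookup table.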
The way to solve 2 (I think) is simply to have the LLM translate the rules of the game into a formal description and then run MuZero on that.
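A minimal sketch of what the "formal description" half of that pipeline might target. `FormalGame` and its field names are invented for illustration, not any real library's API; and since MuZero is far too heavy to inline, exact minimax search stands in for it on a deliberately tiny invented game (a Nim variant):

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, List

# Hypothetical "formal description" an LLM might emit for a just-invented
# game -- here a Nim variant: players alternately take 1-3 stones from a
# pile of 10; whoever takes the last stone wins.
@dataclass(frozen=True)
class FormalGame:
    initial_state: int
    legal_actions: Callable[[int], List[int]]
    next_state: Callable[[int, int], int]
    is_terminal: Callable[[int], bool]

nim = FormalGame(
    initial_state=10,
    legal_actions=lambda s: [a for a in (1, 2, 3) if a <= s],
    next_state=lambda s, a: s - a,
    is_terminal=lambda s: s == 0,
)

# Stand-in for "run MuZero on that": exhaustive minimax over the spec.
# (MuZero would learn a model and policy by self-play; for a game this
# small, exact search makes the same point.)
@lru_cache(maxsize=None)
def value(state):
    """+1 if the player to move can force a win, else -1."""
    if nim.is_terminal(state):
        return -1  # the previous player took the last stone and won
    return max(-value(nim.next_state(state, a))
               for a in nim.legal_actions(state))

def best_action(state):
    return max(nim.legal_actions(state),
               key=lambda a: -value(nim.next_state(state, a)))
```

The point of the intermediate formal description is that everything downstream, minimax here or MuZero in the real proposal, only needs the game's transition rules, never the natural-language rulebook.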
Ideally there would be some unified system that eliminates the “translate into another domain and do your training there” step (which feels very anti-bitter-lesson). But I confess I haven’t the slightest idea how to build such a system.