Imagine you were trying to build a robot that could:
1. Solve a complex mechanical puzzle it has never seen before
2. Play at an expert level a board game that I invented just now.
Both of these are examples of learning on the fly. Because the puzzle and the game are new, no amount of pre-training alone will ever produce a satisfying result.
I believe a human (or a cat) solves 1. like this: they look at the puzzle, try some things, build a model of the toy in their head, try things on that model, and eventually solve the puzzle. There are efforts to get robots to follow the same process, but nothing I would consider “this is the obvious correct solution” quite yet.
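To make that loop concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the three-latch puzzle, the function names, and above all the simplification that each lever always flips the same latches regardless of state, which is what lets the agent learn a model from three pokes. A real robot would face a far messier modeling problem.

```python
from collections import deque

GOAL = (1, 1, 1)  # hypothetical puzzle: all three latches open


def real_puzzle_step(state, lever):
    """The 'physical' puzzle: lever i flips latch i and, for i > 0, latch i - 1."""
    latches = list(state)
    latches[lever] ^= 1
    if lever > 0:
        latches[lever - 1] ^= 1
    return tuple(latches)


def learn_model(state):
    """'Try some things': pull each lever once and record which latches flipped."""
    model = {}
    for lever in range(3):
        nxt = real_puzzle_step(state, lever)
        model[lever] = tuple(a ^ b for a, b in zip(state, nxt))
        state = nxt
    return model, state


def imagine(model, state, lever):
    """'Try things on the model': predict the next state without touching the puzzle."""
    return tuple(s ^ f for s, f in zip(state, model[lever]))


def plan_in_model(model, start, goal):
    """Breadth-first search over imagined states; returns a list of levers or None."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, plan = queue.popleft()
        if state == goal:
            return plan
        for lever in model:
            nxt = imagine(model, state, lever)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [lever]))
    return None


def solve(start=(0, 0, 0)):
    model, state = learn_model(start)          # explore the real puzzle
    plan = plan_in_model(model, state, GOAL)   # think it through "in the head"
    for lever in plan:                         # then act on the real puzzle
        state = real_puzzle_step(state, lever)
    return plan, state
```

Calling `solve()` explores with three pulls, plans entirely inside the learned model, and only then touches the puzzle again to execute the plan.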
The way to solve 2. (I think) is simply to have an LLM translate the rules of the game into a formal description and then run MuZero on that.
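A rough sketch of what that pipeline's interface might look like, in Python. The game (`TakeAwayGame`, a made-up stone-removal game), its method names, and the idea that the LLM emits a class like this are all hypothetical. The search is also not MuZero: it is a plain exhaustive negamax standing in for it, which only works because the toy game is tiny. MuZero would instead learn from self-play against an environment like this one.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


# Hypothetical formal description an LLM might produce from the English rules:
# "Players alternate removing 1 to 3 stones; whoever takes the last stone wins."
@dataclass(frozen=True)
class TakeAwayGame:
    max_take: int = 3

    def legal_moves(self, stones: int) -> list:
        return [k for k in range(1, self.max_take + 1) if k <= stones]

    def apply(self, stones: int, move: int) -> int:
        return stones - move

    def is_terminal(self, stones: int) -> bool:
        return stones == 0


def best_move(game: TakeAwayGame, stones: int):
    """Stand-in for the learner: exhaustive negamax over the formal rules.

    Returns (value, move), where value is +1 if the player to move can force
    a win and -1 otherwise. move is None in a terminal position.
    """

    @lru_cache(maxsize=None)
    def value(s: int) -> int:
        if game.is_terminal(s):
            return -1  # the opponent just took the last stone
        return max(-value(game.apply(s, m)) for m in game.legal_moves(s))

    if game.is_terminal(stones):
        return -1, None
    scored = [(-value(game.apply(stones, m)), m) for m in game.legal_moves(stones)]
    return max(scored)  # highest value wins; ties go to the larger move
```

The point of the sketch is the split: once the rules exist as `legal_moves`, `apply`, and `is_terminal`, whatever sits behind `best_move` can be swapped for a self-play learner without touching the game description.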
Ideally there is some unified system that takes out the “translate into another domain and do your training there” step (which feels very anti-bitter-lesson). But I confess I haven’t the slightest idea how to build such a system.