Interesting! I'll have to read the papers in more depth, but here are some of my initial reactions to the idea (let me know if any of this has already been addressed):
AFAICT, using learning to replace GPS requires either: 1) training examples of good actions, or 2) an environment like chess where we can rapidly get feedback through simulation. When these assumptions break down, sampling from the environment becomes much more costly, and general purpose search can achieve lower sample complexity because it gets to use all the information in the world model.
General purpose search requires certain properties of the world model that seem to be missing from current models. For instance, decomposing goals into subgoals is important for dealing with a high-dimensional action space, and that requires a high degree of modularity in the world model. Lazy world-modeling also seems important for planning in a world larger than yourself, but most of these properties aren't present in the toy environments we use.
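To make the subgoal point a bit more concrete, here's a minimal sketch of why decomposition helps with a high-dimensional action space: plan over subgoals first, then refine each leg with a much smaller search. `decompose` and `solve_leg` are hypothetical stand-ins, not real APIs; `decompose` is exactly the step that needs the world model to be modular enough to expose useful intermediate states.

```python
def hierarchical_plan(start, goal, decompose, solve_leg):
    """Sketch of subgoal decomposition (assumed interfaces, not a real API).

    decompose(start, goal) -> ordered list of intermediate subgoals,
        which is where the world model's modularity is doing the work.
    solve_leg(a, b) -> list of actions from a to b, or None,
        e.g. a small primitive search over a short horizon.
    """
    plan, state = [], start
    for subgoal in decompose(start, goal) + [goal]:
        leg = solve_leg(state, subgoal)
        if leg is None:
            return None  # a leg failed; a real planner would re-decompose here
        plan += leg
        state = subgoal
    return plan
```

The payoff is that each leg's search space is tiny compared to searching the full action sequence directly, which is what makes high-dimensional spaces tractable at all.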
Learning can also be a component of general purpose search (e.g. as a general-purpose generator of heuristics), where we learn to reorder the actions considered so that more promising actions are tried first, as in the sketch below.
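A minimal sketch of what that could look like: a best-first search where a learned policy only decides expansion order, while the search itself stays general purpose. `policy_score`, `actions`, and `transition` are placeholder callbacks I'm assuming for illustration.

```python
from heapq import heappush, heappop
from itertools import count

def ordered_search(start, goal_test, actions, transition, policy_score):
    # Best-first search; the learned component (policy_score) only reorders
    # which action gets expanded first, it never replaces the search.
    # policy_score(state, action): higher means "try earlier" (assumed learned).
    # States are assumed hashable so we can deduplicate them.
    tie = count()  # tiebreaker so the heap never compares states directly
    frontier = [(0.0, next(tie), start, [])]
    seen = {start}
    while frontier:
        _, _, state, plan = heappop(frontier)
        if goal_test(state):
            return plan
        for a in actions(state):
            nxt = transition(state, a)
            if nxt in seen:
                continue
            seen.add(nxt)
            heappush(frontier, (-policy_score(state, a), next(tie), nxt, plan + [a]))
    return None  # exhausted the model without reaching the goal
```

With a perfect policy this degenerates to rollout; with a useless one it degrades gracefully to uninformed search, which is the sense in which learning here is an accelerator rather than a replacement.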
I think using a fixed number of forward passes to approximate GPS will eventually hit limits in sufficiently complex environments, because the space of programs that can dedicate potentially unlimited time to finding solutions is strictly more expressive than the space of programs with a fixed inference time.
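Iterative deepening is the textbook example of the "unlimited time" side of that contrast; here's a short sketch (again with assumed `goal_test`/`actions`/`transition` callbacks). Capping `depth` in advance is the analogue of a fixed number of forward passes: any cap fails on some problem deep enough that the uncapped loop would still eventually solve.

```python
import itertools

def depth_limited(state, goal_test, actions, transition, depth):
    # Standard depth-limited DFS: returns a plan (list of actions) or None.
    if goal_test(state):
        return []
    if depth == 0:
        return None
    for a in actions(state):
        sub = depth_limited(transition(state, a), goal_test, actions,
                            transition, depth - 1)
        if sub is not None:
            return [a] + sub
    return None

def iterative_deepening(start, goal_test, actions, transition):
    # An "anytime" program: keeps deepening until it finds a solution,
    # spending as much compute as the problem demands. (It runs forever
    # if no solution exists in an infinite space, which is precisely the
    # expressiveness that a fixed inference budget gives up.)
    for depth in itertools.count():
        plan = depth_limited(start, goal_test, actions, transition, depth)
        if plan is not None:
            return plan
```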
Agreed: learning can't entirely replace General Purpose Search, and for the reasons you give, something like General Purpose Search will in practice still be the backbone behind learning.
That is, General Purpose Search will still be necessary for AIs, if only for bootstrapping reasons, and I agree with your list of its benefits.