Does it need to be either “pure search” or “no search”?
My expectation would be that in the limit it learns a ton of heuristics about what usually works, and learns to do a much more efficient search using those heuristics. This would especially be the case if e.g. capabilities researchers give the nets extra options to speed up the search (for instance, it’s totally possible to embed a few steps of gradient descent into the inference of a neural network, since gradient descent is itself differentiable—I don’t know whether that will eventually be shown to improve capabilities, but if it does, capabilities researchers would presumably do it).
Agreed that “search” is not a binary but more like a continuum, where we might call a program more “search-like” if it is enumerating possible actions and evaluating their consequences, and less “search-like” if it is directly mapping representations of inputs to actions. The argument in this post is that gradient descent (unlike evolution, and unlike human programmers) doesn’t select much for “search-like” programs. If we take depth-first search as a central example of search, and a thermostat as the paradigmatic non-search program, gradient descent will select for something more like the thermostat.
it’s totally possible to embed a few steps of gradient descent into the inference of a neural network, since gradient descent is differentiable
Agreed, and networks may even be learning something like this already! But in my ontology I wouldn’t call an algorithm that performs, say, 5 steps of gradient descent over a billion-parameter space and then outputs an action very “search-like”; the “search” part is generating a tiny fraction of the optimization pressure, relative to whatever process sets up the initial state and the error signal.
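To make the embedded-gradient-descent idea concrete, here is a minimal sketch in JAX. Everything here is hypothetical and illustrative (the parameter names, the toy quadratic inner objective, the step counts); the point is only that a forward pass can unroll a few inner gradient steps, and because each step is differentiable, an outer optimizer can differentiate through the whole thing—including through whatever sets up the initial state and the error signal.

```python
import jax
import jax.numpy as jnp

def inner_loss(z, target):
    # A toy differentiable inner objective over the latent z.
    return jnp.sum((z - target) ** 2)

def forward(params, x, n_steps=5, lr=0.1):
    # The (hypothetical) learned parameters set up the initial state
    # and the error signal's target; the unrolled steps refine z.
    z = params["init_scale"] * x           # initial state
    target = params["target_scale"] * x    # target for the error signal
    grad_fn = jax.grad(inner_loss)
    for _ in range(n_steps):
        z = z - lr * grad_fn(z, target)    # one embedded GD step
    return z

params = {"init_scale": 0.0, "target_scale": 2.0}
x = jnp.array([1.0, -1.0])
out = forward(params, x)

# End-to-end differentiability: the outer gradient flows through all
# five unrolled inner steps back into the setup parameters.
outer_grad = jax.grad(lambda p: jnp.sum(forward(p, x) ** 2))(params)
```

With a step size of 0.1 on this quadratic, each inner step closes 20% of the remaining gap to the target, so five steps recover roughly two-thirds of it—which illustrates the point above: most of the optimization pressure lives in whatever sets `init_scale` and `target_scale`, not in the five inner steps themselves.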
Maybe this is just semantics, because at high levels of capability search and control are not fundamentally different (this is what you’re pointing to with “much more efficient search”—an infinitely efficient search is just optimal control, since you never even consider suboptimal actions!). But it does seem like, for a fixed level of capability, search is somehow more brittle and more likely to misgeneralize catastrophically.