One way of solving the problem of action selection is to treat our own actions as random variables, then build a probabilistic model, and condition that probability distribution on things going well for us in the future. See, for example, Planning By Probabilistic Inference, human brains (I would argue), upside-down RL (kinda, I think?), etc.
In this formulation, instrumental Occam’s razor is just a special case of epistemic Occam’s razor, it seems to me.
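To make that concrete, here is a minimal toy sketch (my own illustration, not from any of the linked work; the world model, goal test, and all names are made up): put a prior over action sequences, simulate a stochastic world model, and keep only the samples where things went well. The resulting posterior is P(plan | good outcome) ∝ P(good outcome | plan) · P(plan), so any simplicity bias in the prior over plans is inherited directly, which is the sense in which instrumental Occam falls out of epistemic Occam.

```python
import random
from collections import Counter

ACTIONS = ["left", "right", "stay"]

def sample_plan(horizon=3):
    # Prior over plans (uniform here). A simplicity-biased prior is exactly
    # where "instrumental Occam" would enter: the posterior inherits it.
    return tuple(random.choice(ACTIONS) for _ in range(horizon))

def world_step(state, action):
    # Stochastic toy dynamics: walk on a line, with a 10% chance of slipping.
    delta = {"left": -1, "right": 1, "stay": 0}[action]
    return state + (delta if random.random() > 0.1 else 0)

def is_good_outcome(state):
    # "Things going well for us in the future" = reaching position 2 or more.
    return state >= 2

def posterior_over_plans(n_samples=100_000):
    # Rejection sampling: P(plan | good outcome) ∝ P(good outcome | plan) · P(plan).
    accepted = Counter()
    for _ in range(n_samples):
        plan = sample_plan()
        state = 0
        for action in plan:
            state = world_step(state, action)
        if is_good_outcome(state):
            accepted[plan] += 1  # keep only the samples where things went well
    return accepted

if __name__ == "__main__":
    posterior = posterior_over_plans()
    for plan, count in posterior.most_common(3):
        print(plan, count)  # ('right', 'right', 'right') dominates
```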
Yes, I agree with that. But (as I’ve said in the past) this formalism doesn’t do it for me. I have yet to see something which strikes me as a compelling argument in its favor.
So in the context of planning by probabilistic inference, instrumental Occam seems almost like a bug rather than a feature—the unjustified bias toward simpler policies doesn’t seem to serve a clear purpose. It’s just an assumption.
Granted, the fact that I intuitively feel there should be some kind of instrumental Occam is a point in favor of such methods in some sense.
> I have yet to see something which strikes me as a compelling argument in its favor.
At some level, any system that does foresight-based planning has to treat its own actions as part of its world-model. In that context, the idea of “Treat my own actions as random variables—just like everything else in the world—and condition on good things happening in the future” is either exactly equivalent to Monte Carlo Tree Search, or awfully close, I think.
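Here is a minimal sketch of that near-equivalence (my own toy code, reusing the made-up world_step / is_good_outcome definitions from the sketch above, and assuming the same uniform prior over plans): conditioning on success and then marginalizing down to the first action picks out the same action as plain Monte Carlo rollout evaluation.

```python
def value_of_first_action(action, n_rollouts=10_000, horizon=3):
    # Estimate P(good outcome | first action) with random rollouts:
    # the simple, flat version of "condition on good things happening".
    wins = 0
    for _ in range(n_rollouts):
        state = world_step(0, action)
        for _ in range(horizon - 1):
            state = world_step(state, random.choice(ACTIONS))
        wins += is_good_outcome(state)
    return wins / n_rollouts

# Under a uniform prior, this argmax matches marginalizing the conditioned
# posterior over plans down to its first action.
best = max(ACTIONS, key=value_of_first_action)
print(best)  # 'right'
```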
But whereas AlphaGo does MCTS in a simplistic, one-timestep-at-a-time manner, the vision here is to bring to bear the full power of the world-modeling framework—hierarchy, analogizing, compositionality, abstraction, etc. etc.—and apply all those tools to the task of MCTS action planning. Thus things like imitating conspecifics, and taking ideas from the world (“I see that wall is not fixed in place; so maybe I can move it!”), and building hierarchical probabilistic plans over arbitrary time-scales (“Maybe I can squeeze through that hole and see what’s on the other side”), and interleaving multiple plans, and on and on—all these things and more arise organically in this kind of framework. (...if the predictive-world-modeling tools are good enough, of course.)
Maybe there are other ways to formulate action selection that have all those features, but I don’t know of any.
(Or I guess the simpler “compelling argument in its favor” is “It appears to be very effective in practice”.)
> instrumental Occam seems almost like a bug rather than a feature
Hmm. On further reflection, I don’t think the brain really uses Occam’s razor at all, either for action-selection or world-modeling, unless you define “complexity” in some way that makes it tautological. (As Charlie Steiner writes, you have to search through the infinite space of possibilities in some order, with some prior, and it’s both tempting and vacuous to say that the earlier choices are “simpler in the context of this algorithm” or whatever).
I think a more typical dynamic of brain algorithms is what Dileep George calls “memorize—generalize”, i.e. memorize a little snippet of pattern / idea (at some level of abstraction), and then see if the same pattern works in other contexts, or when glued together with other patterns, or when analogized to a different area.
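Here is a toy rendering of that dynamic (my own guess at the gist, not Dileep George’s actual algorithm): memorize small snippets verbatim, then check which of them transfer to a new context.

```python
def memorize(sequences, k=3):
    # Memorize every length-k snippet seen in the training sequences.
    snippets = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            snippets.add(tuple(seq[i:i + k]))
    return snippets

def generalize(snippets, new_seq, k=3):
    # Check which memorized snippets reappear in a new context.
    found = set()
    for i in range(len(new_seq) - k + 1):
        window = tuple(new_seq[i:i + k])
        if window in snippets:
            found.add(window)
    return found

snips = memorize(["abcabd", "xabcy"])
print(generalize(snips, "zzabcz"))  # {('a', 'b', 'c')} -- the pattern transfers
```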
I don’t currently think brain world-modeling algorithms have anything analogous to regularization.
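For contrast, here is what “regularization” standardly looks like in machine learning (generic textbook form, nothing brain-specific): an explicit complexity penalty added to the fitting objective. The claim above is that brain world-modeling has no counterpart to the penalty term.

```python
def regularized_loss(prediction_errors, weights, lam=0.01):
    # Ordinary L2-regularized objective: data fit + lam * model complexity.
    fit = sum(e * e for e in prediction_errors)
    complexity = sum(w * w for w in weights)
    return fit + lam * complexity
```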