Sure, you can consider the TD-style unrolling in model-free RL a sort of implicit planning, but it's not really consequentialist in most situations, since it can't dynamically explore new, relevant expansions of the state tree the way planning can. Alternatively, you could view planning as a dynamic, few-shot extension of fast learning/updating of the decision function.
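To make that contrast concrete, here is a toy sketch (everything in it is made up for illustration: the 5-state chain, `step`, `q_learning`, `plan`, and the constants). A tabular Q-learner caches values from past TD backups, while a depth-limited planner expands the state tree at decision time using a model. When the goal moves, the cached values keep pointing at the old goal until they are relearned, whereas the planner simply expands toward the new one.

```python
# Toy deterministic chain: states 0..4, actions -1 (left) / +1 (right).
# All names and constants are hypothetical, chosen only for illustration.
N = 5
ACTIONS = (-1, +1)
GAMMA = 0.9


def step(state, action, goal):
    """Environment model: next state, and reward 1.0 for entering `goal`."""
    nxt = min(max(state + action, 0), N - 1)
    return nxt, (1.0 if nxt == goal else 0.0)


def q_learning(goal, sweeps=50, alpha=0.5):
    """Model-free TD: cache action values via one-step backups."""
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(N):
            if s == goal:  # goal is terminal, nothing to learn there
                continue
            for a in ACTIONS:
                nxt, r = step(s, a, goal)
                bootstrap = 0.0 if nxt == goal else max(q[(nxt, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (r + GAMMA * bootstrap - q[(s, a)])
    return q


def greedy(q, state):
    """Act from the cached values only; no lookahead."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def plan(state, goal, depth):
    """Depth-limited tree expansion with the model; returns (value, action)."""
    if depth == 0 or state == goal:
        return 0.0, None
    best = (float("-inf"), None)
    for a in ACTIONS:
        nxt, r = step(state, a, goal)
        value = r + GAMMA * plan(nxt, goal, depth - 1)[0]
        if value > best[0]:
            best = (value, a)
    return best


q = q_learning(goal=4)            # values learned while the goal was at state 4
cached_action = greedy(q, 2)      # still heads right, toward the old goal
planned_action = plan(2, 0, 3)[1] # goal moved to state 0: planner heads left
print(cached_action, planned_action)
```

The learner would of course recover after enough new experience with the moved goal; the point is only that the planner adapts at decision time without any further updates.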
Human planning is sometimes explicit, timestep by timestep (when playing certain board games, for example), when that is what efficient planning demands. In the more general case, though, human planning uses more complex approximations that jump more freely across spatio-temporal approximation hierarchies.