I suspect that much of the appeal of shard theory lies in working through detailed explanations of model-free RL with general value-function approximation, for people who mostly think of AI in terms of planning/search/consequentialism.
But if you already come from a model-free RL value-approximation perspective, shard theory seems more natural.
Moment-to-moment decisions are made based on value-function bids, with little to no direct connection to reward or terminal values. The ‘shards’ are just what learned value-function-approximating subcircuits look like in gory detail.
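To make "value-function bids" concrete, here is a minimal sketch assuming a hypothetical agent whose action values decompose additively into a few subcircuits. The names `sugar_shard`, `safety_shard` and `choose_action`, and the hand-set weights, are illustrative assumptions, not anything from shard theory itself. In a real learner these bids would come from trained function approximators. The point is only that the decision step sums learned value estimates and never consults reward directly.

```python
from typing import Callable, Dict, List

# Hypothetical "shard": maps (observation, action) to a value bid.
Shard = Callable[[Dict[str, float], str], float]


def sugar_shard(obs: Dict[str, float], action: str) -> float:
    # Illustrative, hand-set weight standing in for a learned value estimate.
    return 2.0 * obs["food_visible"] if action == "eat" else 0.0


def safety_shard(obs: Dict[str, float], action: str) -> float:
    # Another hypothetical subcircuit that bids for fleeing when a threat is present.
    return 3.0 * obs["threat_level"] if action == "flee" else 0.0


def choose_action(obs: Dict[str, float], actions: List[str], shards: List[Shard]) -> str:
    # Sum every shard's bid for each action and pick the highest total.
    # Note that no reward signal appears anywhere in the decision step.
    totals = {a: sum(shard(obs, a) for shard in shards) for a in actions}
    return max(totals, key=totals.get)


shards = [sugar_shard, safety_shard]
print(choose_action({"food_visible": 1.0, "threat_level": 0.0}, ["eat", "flee", "wait"], shards))
print(choose_action({"food_visible": 1.0, "threat_level": 1.0}, ["eat", "flee", "wait"], shards))
```

With food visible and no threat, the sugar bid (2.0) wins; once the threat bid (3.0) exceeds it, the same machinery picks `flee` without any explicit goal representation.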
The brain may have a prior towards planning subcircuitry, but even without a strong prior, planning submodules will eventually emerge naturally in a model-free RL learning machine of sufficient scale (there is no fundamental difference between model-free and model-based for universal learners). TD-like updates ensure that the value function extends over longer timescales as training progresses. (And in general, humans seem to plan on timescales that scale with their lifespan, as you’d expect.)
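The claim that TD-like updates stretch the value function over longer timescales can be seen in a toy setting. This is a minimal sketch assuming a tabular TD(0) learner on a hypothetical deterministic chain where only the final transition is rewarded (a simplification: no function approximation). Because each episode walks the chain forward, the reward signal propagates back one state per episode, so the effective horizon grows with training.

```python
from typing import List


def td0_chain(n_states: int, episodes: int, alpha: float = 0.5, gamma: float = 0.9) -> List[float]:
    """Tabular TD(0) on a toy left-to-right chain (hypothetical environment).

    State n_states - 1 is terminal; entering it yields reward 1, every other step yields 0.
    """
    v = [0.0] * n_states  # the terminal state's value stays 0
    for _ in range(episodes):
        for s in range(n_states - 1):  # walk the chain from the start state
            s_next = s + 1
            reward = 1.0 if s_next == n_states - 1 else 0.0
            td_target = reward + gamma * v[s_next]  # bootstrap from the next state's estimate
            v[s] += alpha * (td_target - v[s])
    return v


def horizon(v: List[float]) -> int:
    # Count non-terminal states whose value already reflects the distant reward.
    return sum(1 for x in v[:-1] if x > 0.0)


for k in (1, 3, 10):
    print(k, horizon(td0_chain(n_states=10, episodes=k)))
```

On a 10-state chain this prints horizons of 1, 3 and 9 after 1, 3 and 10 episodes: value information reaches further back as training continues, which is the qualitative effect described above.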