At this point I think the general shape of brain-inspired algorithms for efficient model-based planning is fairly obvious, but these algorithms translate into using large amounts (i.e., TBs) of ‘fast weight’ memory at different timescales (mostly in prefrontal cortex, basal ganglia, and hippocampus-adjacent and associated regions) combined with true recurrence. That currently seems prohibitively expensive to translate directly into transformers on GPUs: fast weights are equivalent to a KV cache unique to each experience sequence, and are thus expensive for inference. Further speculation on how to improve that probably shouldn’t be discussed in this public forum.
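To make the fast-weights/KV-cache equivalence concrete, here is a minimal sketch of the simplest case I'm confident about: unnormalized linear attention (no softmax). Reading a KV cache with linear attention gives exactly the same output as reading a "fast weight" matrix W = Σ_t v_t k_tᵀ built by outer-product writes. The function names and toy vectors are hypothetical, chosen only for illustration. With softmax attention the equivalence is looser, but the cost point is the same either way: this state is per sequence and can't be shared across users like the slow (trained) weights.

```python
# Minimal sketch, assuming unnormalized linear attention (no softmax).
# Shows that a linear-attention read over a KV cache equals a read from a
# fast-weight matrix W = sum_t v_t k_t^T. Names here are hypothetical.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def read_kv_cache(keys, values, q):
    # Keep every (k_t, v_t) pair; output = sum_t (k_t . q) * v_t
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        w = dot(k, q)
        out = [o + w * vi for o, vi in zip(out, v)]
    return out

def read_fast_weights(keys, values, q):
    # Accumulate a fixed-size d_v x d_k matrix via outer-product writes,
    # then read it out with a single matrix-vector product: W q
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    for k, v in zip(keys, values):
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += v[i] * k[j]
    return [dot(row, q) for row in W]

keys = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
values = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0], [2.0, 0.0, 0.5]]
q = [0.3, -0.7]
print(read_kv_cache(keys, values, q))
print(read_fast_weights(keys, values, q))  # same values, up to float rounding
```

Either way, both the cache and W are state tied to one experience sequence, which is what makes serving many such sequences expensive.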