From reading, I had imagined a memory+cache structure, rather than something closer to “cache all the way down”.
Note that the things being cached are not things stored in memory elsewhere. Rather, they’re (supposedly) outputs of costly-to-compute functions—e.g. the instrumental value of something would be costly to compute directly from our terminal goals and world model. And most of the values in cache are computed from other cached values, rather than “from scratch”—e.g. the instrumental value of X might be computed (and then cached) from the already-cached instrumental values of some stuff which X costs/provides.
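A toy sketch of that pattern might look like the following. Everything here (the dependency graph, the numbers, the function names) is made up for illustration; the point is just that each cached value is derived from other cached values rather than recomputed from the terminal goals each time:

```python
# Hypothetical toy model: instrumental values cached and computed
# from other cached instrumental values, not from scratch.
cache = {}

# made-up dependency graph: what each item provides
PROVIDES = {
    "money": ["food", "shelter"],
    "job": ["money"],
    "degree": ["job"],
}
# made-up terminal values (the "costly" base case)
TERMINAL_VALUE = {"food": 10.0, "shelter": 8.0}

def instrumental_value(x):
    if x in cache:
        return cache[x]          # cache hit: no recomputation
    if x in TERMINAL_VALUE:
        v = TERMINAL_VALUE[x]    # base case: value from terminal goals
    else:
        # computed from the (possibly already-cached) values of
        # the stuff that x provides
        v = sum(instrumental_value(y) for y in PROVIDES.get(x, []))
    cache[x] = v
    return v

print(instrumental_value("degree"))  # 18.0, via job -> money -> food+shelter
```

Once “job” has been queried, the value of “degree” is a single cache lookup plus one addition; the terminal goals are never consulted again.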
Thanks for the guidance! Together with Gwern’s reply my understanding now is that caching can indeed be very fluidly integrated into the architecture (and that there is a whole fascinating field that I could try to learn about).
After letting the ideas settle for a bit, I think that one aspect that might have led me to think
“In my mind, there is an amount of internal confusion which feels much stronger than what I would expect for an agent as in the OP”
is that a Bayesian agent as described still is (or at least could be) very “monolithic” in its world model. I struggle with putting this into words, but my thinking feels a lot more disjointed/local/modular.
It would make sense if there is a spectrum from “basically global/serial computation” to “fully distributed/parallel computation” where going more to the right adds sources of internal confusion.
Yeah, that’s one of the main things which the “causal models as programs” thing is meant to capture, especially in conjunction with message passing and caching. The whole thing is still behaviorally one big model insofar as the cache is coherent, but the implementation is a bunch of little sparsely-interacting submodel-instances.
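A minimal sketch of that architecture, with everything hypothetical: submodel-instances are just functions that interact only through cached messages, and the system behaves like one big model exactly insofar as the cached messages agree with fresh recomputation:

```python
# Hypothetical sketch: sparsely-interacting submodel-instances that
# only talk through a message cache. Behaviorally one model as long
# as the cache is coherent.
message_cache = {}

def send(name, compute):
    """Message passing with caching: reuse a cached message if present."""
    if name not in message_cache:
        message_cache[name] = compute()
    return message_cache[name]

# two submodel-instances; their only interaction is one cached message
def weather_submodel():
    return send("rain_prob", lambda: 0.3)  # made-up belief

def plan_submodel():
    p = weather_submodel()                 # sparse interaction
    return send("bring_umbrella", lambda: p > 0.2)

print(plan_submodel())  # True
```

If some other process overwrote `rain_prob` without invalidating `bring_umbrella`, the cache would be incoherent and the “one big model” behavior would break down, which is where the internal-confusion intuition comes in.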
Coherence of Caches and Agents goes into more detail on that part of the picture, if you’re interested.