Thanks for the guidance! Together with Gwern’s reply my understanding now is that caching can indeed be very fluidly integrated into the architecture (and that there is a whole fascinating field that I could try to learn about).
After letting the ideas settle for a bit, I think that one aspect that might have led me to think
“In my mind, there is an amount of internal confusion which feels much stronger than what I would expect for an agent as in the OP”
is that a Bayesian agent as described still is (or at least could be) very “monolithic” in its world model. I struggle to put this into words, but my thinking feels a lot more disjointed/local/modular.
It would make sense if there is a spectrum from “basically global/serial computation” to “fully distributed/parallel computation” where going more to the right adds sources of internal confusion.
Yeah, that’s one of the main things which the “causal models as programs” thing is meant to capture, especially in conjunction with message passing and caching. The whole thing is still behaviorally one big model insofar as the cache is coherent, but the implementation is a bunch of little sparsely-interacting submodel-instances.
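To make the “one big model, many little cached submodels” picture a bit more concrete, here is a rough sketch in Python. Everything in it is hypothetical and only for illustration: the `Submodel` class, the `query`/`recompute`/`update` methods, and the rain/sprinkler numbers are my own assumptions, not anything from the OP. Each submodel only talks to its direct parents (the “messages”) and caches its answer; `recompute` ignores all caches and stands in for what a monolithic model would say.

```python
class Submodel:
    """One small model instance: combines messages from a few parents and caches the result."""

    def __init__(self, fn, parents=()):
        self.fn = fn                  # local rule: parent messages -> this node's value
        self.parents = list(parents)  # sparse interaction: only direct parents are consulted
        self.cache = None
        self.evaluations = 0          # how often the local rule was actually run

    def query(self):
        """Answer from the cache if possible, otherwise ask the parents and store the result."""
        if self.cache is None:
            messages = [p.query() for p in self.parents]
            self.cache = self.fn(*messages)
            self.evaluations += 1
        return self.cache

    def recompute(self):
        """Ignore every cache: the answer a single monolithic model would give."""
        return self.fn(*(p.recompute() for p in self.parents))

    def update(self, fn):
        """Purely local change: swap the rule and clear only this node's own cache."""
        self.fn = fn
        self.cache = None


def is_coherent(node):
    """True if the cached answer matches a full recomputation."""
    return abs(node.query() - node.recompute()) < 1e-9


# Hypothetical toy model: P(wet) via noisy-or of rain and sprinkler, then P(slippery).
rain = Submodel(lambda: 0.2)
sprinkler = Submodel(lambda: 0.5)
wet = Submodel(lambda r, s: 1 - (1 - r) * (1 - s), [rain, sprinkler])
slippery = Submodel(lambda w: 0.9 * w, [wet])

slippery.query()
slippery.query()                       # second call is served from the cache
coherent_before = is_coherent(slippery)

rain.update(lambda: 0.9)               # local update, downstream caches left untouched
coherent_after = is_coherent(slippery)
```

In this sketch, as long as no submodel changes, the cached answers agree with the full recomputation, so the collection behaves like one big model. After the local `update`, the downstream caches are stale and the two answers diverge, which is one possible reading of where “internal confusion” could enter once the computation is distributed. A real system would presumably propagate invalidations along the dependency graph; that part is left out here.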