Yeah, that’s where my current thinking is at as well. I wouldn’t term it as having “multiple world models” — rather, as entertaining multiple possible candidates for the structure of some region of its world-model — but yes, I think we can say a lot about the convergent shape of world-models by reasoning from the idea that they need to be easy to adapt and recompose based on new evidence.
One possibility that I find plausible as a path to AGI is if we design something like a Language Model Cognitive Architecture (LMCA) along the lines of AutoGPT
I’ve also had this idea, as a steelman of the whole “externalized reasoning oversight” agenda: prompt an LLM to generate a semantic world-model, with the LLM itself just playing the role of the planning process over it (a minimal sketch of what I mean is below, after the two objections). However, I expect it wouldn’t work as intended, for two reasons:
Inasmuch as it’s successful, the world-model is unlikely to stay naively human-interpretable. It’d drift towards alien wordings, concepts, connections. And even if we force it to look human-interpretable, steganography is convergent, and this sort of setup opens up many more dimensions in which to sneak in messages than standard chains-of-thought. And if we manage to defeat steganography as well, I then expect the WM to be forced to look like terabytes upon terabytes of complexly interconnected text, with each plan-making query on it generating mountains of data, perhaps too much to reasonably sort through. Which ties into the second point:
It’ll probably be too computationally intensive to work at all. It already takes humans a lot of time to explicitly run generally-intelligent queries on their world-models, compared to the speed at which our instincts work. If each step of such a query required a whole LLM forward pass, instead of the minimal computation actually needed for it? I expect that would require orders of magnitude more compute than Earth is going to have in the near term.
And these two points aren’t independent: the more human-interpretable we’d force the WM to look, the more wasteful and impractical it’d be.
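To make the setup I’m describing concrete, here’s a minimal sketch of it in Python. Everything in it is illustrative: `llm` stands in for whatever completion API you’d plug in, and `SemanticWorldModel`/`plan` are hypothetical names, not an implementation I’m endorsing. The point is just to show the shape of the thing: the world-model lives outside the network as inspectable text, and the LLM gets called once per elementary step, both when updating the WM and when planning over it.

```python
# Minimal sketch, not a working agent: an "externalized world-model" setup where
# the LLM both writes its world-model down as human-readable propositions and
# then acts as the planner that queries it. `llm` is a placeholder for a single
# model call; swap in any real completion/chat API.


def llm(prompt: str) -> str:
    """Placeholder for one LLM forward pass / completion call."""
    raise NotImplementedError("plug in your model API here")


class SemanticWorldModel:
    """World-model externalized as plain-text propositions, so it can be inspected."""

    def __init__(self) -> None:
        self.propositions: list[str] = []

    def update(self, observation: str) -> None:
        # One full LLM call just to integrate a single observation.
        new_fact = llm(
            "Rewrite this observation as a standalone proposition for the world-model:\n"
            + observation
        )
        self.propositions.append(new_fact)

    def dump(self) -> str:
        return "\n".join(self.propositions)


def plan(world_model: SemanticWorldModel, goal: str, max_steps: int = 10) -> list[str]:
    """Treat the LLM as the planning process over the externalized world-model."""
    steps: list[str] = []
    for _ in range(max_steps):
        # Every elementary reasoning step is a whole forward pass over the goal
        # plus the (potentially enormous) dump of the world-model.
        step = llm(
            f"Goal: {goal}\n"
            f"World-model:\n{world_model.dump()}\n"
            "Plan so far:\n" + "\n".join(steps) + "\n"
            "Next step (or DONE):"
        )
        if step.strip() == "DONE":
            break
        steps.append(step)
    return steps
```

Even in this toy form you can see where the two failure modes enter: every intermediate string is a channel for conceptual drift and steganography, and every elementary reasoning step costs a full forward pass over the goal plus the relevant slice of the WM, which only gets worse the more verbose and human-legible we force that slice to be.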