I think this ties into modeling invariant abstractions of objects, and coming up with models that generalize to probable future states.
I think this is partly addressed in animals (including humans) by devoting a fraction of the brain to predicting future sensations and forming a world model from received sensations, while also having an action model that attempts to influence the world and models its own actions and their effects.
So for things like the cubes, we learn a model of their motion not just from watching video of them, but by stacking them up and knocking them over. We play and explore, and these manipulations let us test hypotheses.
I expect that having a portion of a model’s training be interactive exploration of a simulation would help close this gap.
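To make that a bit more concrete, here is a minimal sketch of what I have in mind, assuming a gym-style simulator and a simple action-conditioned latent predictor (the module names, shapes, and training setup are illustrative assumptions, not anything established above): the dynamics network sees the agent's own action alongside the current state, so it is trained on the effects of its interventions rather than only on passive video.

```python
# Sketch: an action-conditioned world model trained on transitions the agent
# collects by acting in a simulator (e.g. stacking and knocking over cubes),
# rather than only on prerecorded video. All names/shapes are illustrative.
import torch
import torch.nn as nn

class ActionConditionedWorldModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        # The dynamics model takes the current latent *and* the agent's own
        # action, so it learns the consequences of interventions, not just
        # correlations observed passively.
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + act_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )
        self.decoder = nn.Linear(latent_dim, obs_dim)

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        z = self.encoder(obs)
        z_next = self.dynamics(torch.cat([z, action], dim=-1))
        return self.decoder(z_next)

def train_step(model, optimizer, obs, action, next_obs):
    """One gradient step on an (observation, action, next observation) triple
    gathered by actually interacting with the simulation."""
    pred_next = model(obs, action)
    loss = nn.functional.mse_loss(pred_next, next_obs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point of the sketch is just the interface: a portion of training data comes from the model's own exploratory actions, so hypotheses like "if I push this, that falls" get tested rather than merely observed.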
The thing is, your actions can lead to additional scratches on the cubes, so actions aren't causally separated from scratches. And the scratches will be visible in future states too, so if your model attempts to predict future states, it will attempt to predict the scratches.
I suspect that ultimately one needs an explicit bias in favor of modeling large things accurately. Actions can help nail down size comparisons, but they don't directly force you to focus on the larger things.
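One rough way to encode that kind of explicit bias, sketched below with scales and weights I'm assuming purely for illustration: weight the prediction loss by spatial scale, so errors on coarse, downsampled versions of the frame, which are dominated by large objects, cost more than errors on fine details like the scratches.

```python
# Sketch: a scale-weighted prediction loss that penalizes errors on coarse
# (heavily downsampled) versions of the frame more than errors on fine detail.
# The scales and weights are illustrative assumptions.
import torch
import torch.nn.functional as F

def scale_weighted_loss(pred, target, scales=(1, 4, 16), weights=(1.0, 2.0, 4.0)):
    """pred, target: (batch, channels, H, W) predicted and true future frames.
    Larger downsampling factors blur away small details such as scratches,
    so weighting them more pushes the model to get the big structures right."""
    loss = 0.0
    for factor, w in zip(scales, weights):
        if factor == 1:
            p, t = pred, target
        else:
            p = F.avg_pool2d(pred, factor)
            t = F.avg_pool2d(target, factor)
        loss = loss + w * F.mse_loss(p, t)
    return loss
```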