Come to think of it, how is it that humans pass the mirror test? There’s probably a lot of existing theorizing on this, but a quick guess without having read any of it: babies first spend a long time learning to control their body, and then learn an implicit rule like “if I can control it by an act of will, it is me”, getting a lot of training data that reinforces that rule. Then they see themselves in a mirror and notice that they can control their reflection through an act of will...
This is an incomplete answer since it doesn’t explain how they learn to understand that the entity in the mirror is not a part of their actual body, but it does somewhat suggest that maybe humans just interpolate their self-awareness from a bunch of training data too.
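Here’s a toy sketch of that rule, with everything (entities, noise levels, threshold) made up for illustration rather than taken from any paper: an agent emits random motor commands, watches how a few entities move, and labels as “me” anything whose motion reliably tracks its commands. Note that the mirror reflection gets labeled “me” too, which is exactly the incomplete part.

```python
# Toy illustration of "if I can control it by an act of will, it is me".
# All numbers here are invented for the sketch.
import numpy as np

rng = np.random.default_rng(0)
T = 500                                    # interaction steps ("training data")
commands = rng.choice([-1, 0, 1], size=T)  # acts of will: left / stay / right

# Three observed entities: the agent's own hand, its mirror reflection,
# and another agent moving on its own.
hand = commands + 0.1 * rng.normal(size=T)         # tracks commands closely
reflection = -commands + 0.1 * rng.normal(size=T)  # tracks them too (mirrored)
stranger = rng.choice([-1, 0, 1], size=T)          # moves independently

def controllability(command, motion):
    """How strongly an entity's motion follows what the agent willed."""
    return abs(np.corrcoef(command, motion)[0, 1])

for name, motion in [("hand", hand), ("reflection", reflection), ("stranger", stranger)]:
    score = controllability(commands, motion)
    print(f"{name:10s} controllability={score:.2f} -> {'me' if score > 0.5 else 'not me'}")
```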
This was empirically demonstrated to be possible in this paper: “Curiosity-driven Exploration by Self-supervised Prediction”, Pathak et al., 2017. From the abstract:
“We formulate curiosity as the error in an agent’s ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model.”
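The core of that paper, the ICM module, is small enough to sketch. Below is a stripped-down version of the idea, with a flat observation vector and an MLP encoder standing in for the paper’s conv net on pixels, and with the loss weighting omitted:

```python
# Condensed sketch of the Intrinsic Curiosity Module (Pathak et al., 2017).
# Layer sizes, the MLP encoder, and the detach choices are illustrative,
# not the paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICM(nn.Module):
    def __init__(self, obs_dim=64, n_actions=4, feat_dim=32):
        super().__init__()
        self.n_actions = n_actions
        # Feature encoder phi(s).
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, feat_dim))
        # Inverse dynamics model: (phi(s_t), phi(s_{t+1})) -> a_t.
        # This is the self-supervised task that shapes the feature space.
        self.inverse = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_actions))
        # Forward model: (phi(s_t), a_t) -> predicted phi(s_{t+1}).
        self.fwd = nn.Sequential(nn.Linear(feat_dim + n_actions, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim))

    def forward(self, obs, next_obs, action):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a_onehot = F.one_hot(action, self.n_actions).float()

        # Inverse loss: predict which action was taken from the two feature vectors.
        a_logits = self.inverse(torch.cat([phi, phi_next], dim=-1))
        inverse_loss = F.cross_entropy(a_logits, action)

        # Forward loss: prediction error in feature space. Features are detached
        # here so only the inverse model trains the encoder (a common
        # implementation choice, not necessarily the paper's exact recipe).
        phi_pred = self.fwd(torch.cat([phi.detach(), a_onehot], dim=-1))
        forward_error = ((phi_pred - phi_next.detach()) ** 2).mean(dim=-1)

        # The curiosity reward is the agent's own prediction error.
        intrinsic_reward = forward_error.detach()
        return intrinsic_reward, inverse_loss + forward_error.mean()
```

Learning the features through the inverse dynamics task is what connects this back to the self/other question: the feature space ends up representing the parts of the observation that are relevant to the agent’s own actions, rather than everything in view.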
It probably could be extended to learn “other” and the “boundary between self and other” in a similar way.
I implemented a version of it myself years ago, and it worked. I can only imagine what will happen when someone redoes some of these old RL algos with LLMs providing the world model.
Also, DEIR needs to implicitly distinguish between things it caused and things it didn’t: https://arxiv.org/abs/2304.10770
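To be clear about what follows: this is not DEIR’s actual reward formulation (see the paper for that), just a generic sketch of the “did I cause this change?” idea, under assumptions I made up (discrete actions, flat observations). A classifier is trained on (obs, action, next_obs) triples, with real transitions as positives and the same transition paired with a random other action as negatives; transitions it can classify confidently are ones where the observation change is attributable to the agent’s own action, while changes driven by the environment look the same under any action and stay near chance.

```python
# Generic "did my action cause this change?" discriminator. Not DEIR's reward,
# just an illustration of distinguishing self-caused from other-caused changes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionDiscriminator(nn.Module):
    def __init__(self, obs_dim=16, n_actions=4):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(nn.Linear(2 * obs_dim + n_actions, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def score(self, obs, action, next_obs):
        a = F.one_hot(action, self.n_actions).float()
        return self.net(torch.cat([obs, a, next_obs], dim=-1)).squeeze(-1)

    def loss(self, obs, action, next_obs):
        # Positives: the transition paired with the action actually taken.
        pos = self.score(obs, action, next_obs)
        # Negatives: the same observation change paired with a random action.
        # (With probability 1/n_actions the "fake" action equals the real one,
        # which just adds a little label noise; fine for a sketch.)
        fake = torch.randint(0, self.n_actions, action.shape, device=action.device)
        neg = self.score(obs, fake, next_obs)
        return (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos)) +
                F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))
```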