This was empirically demonstrated to be possible in the paper “Curiosity-driven Exploration by Self-supervised Prediction” (Pathak et al.):

“We formulate curiosity as the error in an agent’s ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model.”
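For context, the curiosity module in that paper pairs an inverse model (which shapes the feature space so it keeps only what the agent can affect) with a forward model (whose prediction error is the curiosity bonus). Here is a minimal sketch of that structure; the MLP encoder, layer sizes, and names are illustrative stand-ins for the paper’s conv architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICM(nn.Module):
    def __init__(self, obs_dim, n_actions, feat_dim=32):
        super().__init__()
        self.n_actions = n_actions
        # Feature encoder phi(s); trained only via the inverse model, so it
        # focuses on the parts of the observation the agent can influence.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, feat_dim))
        # Inverse model: predict the action from phi(s_t) and phi(s_t+1).
        self.inverse = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_actions))
        # Forward model: predict phi(s_t+1) from phi(s_t) and the action.
        self.dynamics = nn.Sequential(nn.Linear(feat_dim + n_actions, 64),
                                      nn.ReLU(), nn.Linear(64, feat_dim))

    def forward(self, obs, next_obs, action):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a = F.one_hot(action, self.n_actions).float()
        inv_logits = self.inverse(torch.cat([phi, phi_next], dim=-1))
        phi_pred = self.dynamics(torch.cat([phi, a], dim=-1))
        inverse_loss = F.cross_entropy(inv_logits, action)
        forward_loss = F.mse_loss(phi_pred, phi_next.detach())
        # Curiosity bonus: per-sample forward prediction error.
        reward = (phi_pred - phi_next).pow(2).sum(dim=-1).detach()
        return reward, inverse_loss + forward_loss

icm = ICM(obs_dim=8, n_actions=4)
obs, nxt = torch.randn(16, 8), torch.randn(16, 8)
act = torch.randint(0, 4, (16,))
r_int, aux_loss = icm(obs, nxt, act)  # bonus gets added to the extrinsic reward
```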
It probably could be extended to learn “other” and the “boundary between self and other” in a similar way.
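Purely as an illustration of what that might look like (my speculation, not anything from the paper): the forward model already gives you a handle on it, since you can ask how much its prediction changes when the agent’s action is swapped for a counterfactual one. Feature dimensions that respond to the swap are plausibly “self”, the rest “other”:

```python
import torch

def self_other_mask(dynamics, phi, action_onehot, noop_onehot, thresh=0.1):
    # Dimensions whose predicted next value depends on the chosen action are
    # treated as under the agent's control ("self"); the rest as "other".
    pred_act = dynamics(torch.cat([phi, action_onehot], dim=-1))
    pred_noop = dynamics(torch.cat([phi, noop_onehot], dim=-1))
    sensitivity = (pred_act - pred_noop).abs()  # per-dimension effect of acting
    return sensitivity > thresh                 # boolean "self" mask
```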
I implemented a version of it myself years ago, and it worked. I can only imagine what will happen when someone redoes some of these old RL algorithms with LLMs providing the world model.
Also, DEIR needs to implicitly distinguish between things the agent caused and things it didn’t: https://arxiv.org/abs/2304.10770
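DEIR (“Discriminative-Model-Based Episodic Intrinsic Rewards”) trains a discriminative model over transitions for roughly this purpose. A toy sketch of that flavor of objective (not the paper’s actual reward, which also involves an episodic memory term): train a classifier to tell genuine transitions from mismatched ones, so only novelty that the agent’s own action accounts for scores as surprising.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionDiscriminator(nn.Module):
    # Scores how plausibly next_obs follows from (obs, action).
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * obs_dim + n_actions, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, action_onehot, next_obs):
        x = torch.cat([obs, action_onehot, next_obs], dim=-1)
        return self.net(x).squeeze(-1)  # logit

def discriminator_loss(disc, obs, act_onehot, next_obs):
    # Positives: real transitions. Negatives: next observations shuffled
    # within the batch, i.e. outcomes the taken action does not explain.
    perm = torch.randperm(next_obs.size(0))
    pos = disc(obs, act_onehot, next_obs)
    neg = disc(obs, act_onehot, next_obs[perm])
    return (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos))
            + F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))
```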