There is at least one example (which I've struggled to dig up) of a memory-less RL agent learning to encode memory in the state of the world.
I recall an example of a Mujoco agent, whose memory was periodically wiped, storing information in the position of its arms — though I'm also having trouble digging that one up.
In OpenAI's Roboschool blog post: "This policy itself is still a multilayer perceptron, which has no internal state, so we believe that in some cases the agent uses its arms to store information."
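To make the phenomenon concrete, here is a hedged toy sketch (my own construction, not from any of the sources above): a policy that is a pure function of the current observation, with no internal state, which nevertheless "remembers" a past event by writing a bit into the environment — analogous to the agent storing information in the position of its arms.

```python
def stateless_policy(obs):
    """A pure function of the current observation; no internal state at all."""
    goal_seen, flag = obs
    if goal_seen or flag:
        # Either the goal is visible right now, or the agent previously
        # recorded seeing it by setting a flag in the environment.
        return "mark_and_go"
    return "explore"


class World:
    """Minimal environment with one mutable feature the agent can repurpose
    as external memory (a stand-in for an arm position)."""

    def __init__(self):
        self.flag = 0
        self.goal_visible = False

    def step(self, action):
        if action == "mark_and_go":
            self.flag = 1  # information persists in the world, not the policy
        return (self.goal_visible, self.flag)


world = World()

# Before seeing the goal, the policy explores.
assert stateless_policy((world.goal_visible, world.flag)) == "explore"

# The goal becomes visible once; the policy marks the world.
world.goal_visible = True
world.step(stateless_policy((world.goal_visible, world.flag)))

# The goal disappears, yet the policy still "remembers" via the flag
# it wrote into the environment.
world.goal_visible = False
assert stateless_policy((world.goal_visible, world.flag)) == "mark_and_go"
```

The point of the sketch is that nothing in `stateless_policy` changes between calls — all of the "memory" lives in `world.flag`.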