There is at least one example (which I've struggled to dig up) of a memory-less RL agent learning to encode memory in the state of the world.
I recall an example of a Mujoco agent, whose memory was periodically wiped, storing information in the position of its arms — though I'm also having trouble digging that one up.
In OpenAI's Roboschool blog post: "This policy itself is still a multilayer perceptron, which has no internal state, so we believe that in some cases the agent uses its arms to store information."
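To make the phenomenon concrete, here is a hedged toy sketch (my own construction, not from any of the sources above): a policy that is a pure function of the current observation, with no internal state, which nevertheless "remembers" a past event by writing a bit into the environment — analogous to the agent storing information in the position of its arms.

```python
def stateless_policy(obs):
    """A pure function of the current observation; no internal state at all."""
    goal_seen, flag = obs
    if goal_seen or flag:
        # Either the goal is visible right now, or the agent previously
        # recorded seeing it by setting a flag in the environment.
        return "mark_and_go"
    return "explore"


class World:
    """Minimal environment with one mutable feature the agent can repurpose
    as external memory (a stand-in for an arm position)."""

    def __init__(self):
        self.flag = 0
        self.goal_visible = False

    def step(self, action):
        if action == "mark_and_go":
            self.flag = 1  # information persists in the world, not the policy
        return (self.goal_visible, self.flag)


world = World()

# Before seeing the goal, the policy explores.
assert stateless_policy((world.goal_visible, world.flag)) == "explore"

# The goal becomes visible once; the policy marks the world.
world.goal_visible = True
world.step(stateless_policy((world.goal_visible, world.flag)))

# The goal disappears, yet the policy still "remembers" via the flag
# it wrote into the environment.
world.goal_visible = False
assert stateless_policy((world.goal_visible, world.flag)) == "mark_and_go"
```

The point of the sketch is that nothing in `stateless_policy` changes between calls — all of the "memory" lives in `world.flag`.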