This seems to me like a major spot where the dualistic model of self-and-world gets introduced into reinforcement learning AI design (which leads to the Anvil Problem). It seems possible to model memory as part of the environment by simply adding I/O actions to the list of actions available to the agent. However, if you want to act on something you've read, you either need to model this with atomic read-and-if-X-do-Y actions, or the agent still needs some minimal internal memory to store the previously read item(s).
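To make the dilemma concrete, here's a minimal Python sketch of the two options (all names here are hypothetical illustrations, not any particular RL framework's API). Option A folds the conditional into a single atomic environment-side action, so a purely reactive policy never has to hold the value it read; Option B keeps a plain read action, which forces the agent to carry at least a one-item register of internal state.

```python
from dataclasses import dataclass


@dataclass
class ExternalMemoryEnv:
    """Toy environment whose state includes a memory cell the agent does I/O against."""
    cell: int = 0

    # --- Option A: atomic read-and-if-X-do-Y actions ------------------------
    # The branch happens inside the environment transition, so the agent
    # itself never needs to remember what it read.
    def step_atomic(self, action):
        kind = action[0]
        if kind == "write":
            _, value = action
            self.cell = value
        elif kind == "if_cell_eq_then_else":
            _, test, then_act, else_act = action
            chosen = then_act if self.cell == test else else_act
            self.step_atomic(chosen)  # execute whichever branch the cell selects
            return chosen             # the effective primitive action taken
        return None

    # --- Option B: plain read actions ----------------------------------------
    # The read result is only an observation; acting on it later requires the
    # agent to store it somewhere internal.
    def step_plain(self, action):
        if action == ("read",):
            return self.cell  # observation handed back to the agent
        kind, value = action
        if kind == "write":
            self.cell = value
        return None


@dataclass
class MinimalMemoryAgent:
    """Agent for Option B: one register is the minimal internal memory that,
    per the argument above, can't be pushed out into the environment."""
    register: int | None = None

    def observe(self, obs):
        if obs is not None:
            self.register = obs

    def act(self):
        # Policy conditions on the internally stored read result.
        if self.register == 1:
            return ("write", 0)
        return ("read",)


if __name__ == "__main__":
    # Option A: a single atomic action, no agent-side memory involved.
    env = ExternalMemoryEnv(cell=1)
    effective = env.step_atomic(("if_cell_eq_then_else", 1, ("write", 0), ("read",)))
    print("atomic branch chose:", effective)

    # Option B: read, then act; the agent's register bridges the two timesteps.
    env = ExternalMemoryEnv(cell=1)
    agent = MinimalMemoryAgent()
    obs = env.step_plain(agent.act())   # first act() is a read
    agent.observe(obs)
    print("agent then acts:", agent.act())  # now conditions on the register
```

Note that Option A only defers the problem: any nontrivial computation over what was read requires either an ever-growing vocabulary of atomic conditional actions, or conceding the one internal register that Option B makes explicit.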