Interestingly, I was just reading a paper from DeepMind arguing that deep reinforcement learning systems learn better when supplemented with an episodic memory store of all previously encountered situations. Upon re-encountering a situation similar to one encountered in the past, the neural net is restored to a state similar to the one it was in before:
In episodic meta-RL, meta-learning occurs within a recurrent neural network, as described in the previous section and Box 3. However, superimposed on this is an episodic memory system, the role of which is to reinstate patterns of activity in the recurrent network. As in episodic deep RL, the episodic memory catalogues a set of past events, which can be queried based on the current context. However, rather than linking contexts with value estimates, episodic meta-RL links them with stored activity patterns from the recurrent network’s internal or hidden units. These patterns are important because, through meta-RL, they come to summarize what the agent has learned from interacting with individual tasks (see Box 3 for details). In episodic meta-RL, when the agent encounters a situation that appears similar to one encountered in the past, it reinstates the hidden activations from the previous encounter, allowing previously learned information to immediately influence the current policy. In effect, episodic memory allows the system to recognize previously encountered tasks, retrieving stored solutions.
Through simulation work in bandit and navigation tasks, Ritter et al. [39] showed that episodic meta-RL, just like ‘vanilla’ meta-RL, learns strong inductive biases that enable it to rapidly solve novel tasks. More importantly, when presented with a previously encountered task, episodic meta-RL immediately retrieves and reinstates the solution it previously discovered, avoiding the need to re-explore. On the first encounter with a new task, the system benefits from the rapidity of meta-RL; on the second and later encounters, it benefits from the one-shot learning ability conferred by episodic control. [...]
Equally direct links connect episodic meta-RL with psychology and neuroscience. Indeed, the reinstatement mechanism involved in episodic meta-RL was directly inspired by neuroscience data indicating that episodic memory circuits can serve to reinstate patterns of activation in cerebral cortex, including areas supporting working memory (see [40]). Ritter and colleagues [39] (S. Ritter, PhD Thesis, Princeton University, 2019) show how such a function could itself be configured through RL, giving rise to a system that can strategically reinstate information about tasks encountered earlier (see also 50, 51, 52).
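To make the quoted mechanism concrete, here is a minimal sketch of context-based reinstatement. All names here are illustrative assumptions, not taken from Ritter et al.'s actual implementation: an episodic store keeps (context embedding, RNN hidden state) pairs, and when a sufficiently similar context recurs, the stored hidden activations are reinstated in place of the current ones.

```python
import numpy as np

class EpisodicReinstatement:
    """Illustrative sketch of the reinstatement mechanism described in the
    quote above. Class and method names are hypothetical, not from the paper."""

    def __init__(self, similarity_threshold=0.9):
        self.keys = []       # context embeddings of past episodes
        self.values = []     # stored recurrent-network hidden states
        self.threshold = similarity_threshold

    def store(self, context, hidden_state):
        """Catalogue a past event: link a context with the hidden activations
        the network had at that time."""
        self.keys.append(np.asarray(context, dtype=float))
        self.values.append(np.asarray(hidden_state, dtype=float))

    def reinstate(self, context, current_hidden):
        """Query memory with the current context. If a similar past context
        is found, reinstate its stored hidden state so previously learned
        task information immediately influences the policy; otherwise the
        current hidden state is left unchanged."""
        if not self.keys:
            return current_hidden
        context = np.asarray(context, dtype=float)
        sims = [
            np.dot(context, k)
            / (np.linalg.norm(context) * np.linalg.norm(k) + 1e-8)
            for k in self.keys
        ]
        best = int(np.argmax(sims))
        if sims[best] >= self.threshold:
            return self.values[best]  # one-shot retrieval of a stored solution
        return current_hidden

# Usage: on a repeat encounter the stored solution is retrieved directly,
# while a novel context falls back on ordinary (meta-learned) processing.
mem = EpisodicReinstatement()
mem.store([1.0, 0.0], [0.5, 0.5, 0.5])
repeat = mem.reinstate([0.99, 0.01], [0.0, 0.0, 0.0])   # similar context
novel = mem.reinstate([0.0, 1.0], [0.1, 0.2, 0.3])      # dissimilar context
```

This captures only the retrieval side; in the actual system the hidden patterns being stored are themselves shaped by meta-RL training, which is what makes them useful summaries of individual tasks.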
This would fit together with the idea of memory reconsolidation being key to adjusting subagents (if a subagent is something like a memory pattern coding for a specific situation), as well as otherwise fitting a lot of data about memory change being key to this kind of thing.
Then again, H.M. could learn new skills despite being unable to learn new episodic memories...