Here is a sketch of a simplified model for how a metacognitive agent deals with traps.
Consider some (unlearnable) prior ζ over environments, s.t. we can efficiently compute the distribution ζ(h) over observations given any history h. For example, any prior over a small set of MDP hypotheses would qualify. Now, for each h, we regard ζ(h) as a “program” that the agent can execute and form beliefs about. In particular, we have a “metaprior” ξ consisting of metahypotheses: hypotheses-about-programs.
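To make the “program” reading concrete, here is a minimal sketch, assuming (purely for illustration) that each hypothesis is a small, fully specified MDP whose state is its last observation; the class and function names are mine, not part of the model. Evaluating ζ(h) then amounts to Bayes-averaging the hypotheses’ predictions, weighting each by its prior probability times its likelihood of h:

```python
import numpy as np

class MDPHypothesis:
    """A small, fully specified environment hypothesis.
    kernel[s, a] is a probability vector over the next observation;
    for simplicity the state is identified with the last observation."""
    def __init__(self, kernel, initial_obs=0):
        self.kernel = kernel            # numpy array of shape (n_obs, n_actions, n_obs)
        self.initial_obs = initial_obs

    def likelihood(self, history):
        """Probability of the observations in history = [(a_1, o_1), ..., (a_n, o_n)]."""
        p, s = 1.0, self.initial_obs
        for a, o in history:
            p *= self.kernel[s, a, o]
            s = o
        return p

    def predict(self, history, action):
        """Distribution over the next observation after the history and a given action."""
        s = history[-1][1] if history else self.initial_obs
        return self.kernel[s, action]

def zeta_predictive(prior, hypotheses, history, action):
    """Evaluate zeta(h): the posterior-predictive distribution over the next observation.
    Each hypothesis is weighted by prior probability times likelihood of the history,
    then its prediction is mixed in with that weight."""
    w = np.array([p * m.likelihood(history) for p, m in zip(prior, hypotheses)])
    if w.sum() == 0.0:
        raise ValueError("history has probability zero under every hypothesis")
    w = w / w.sum()
    preds = np.array([m.predict(history, action) for m in hypotheses])
    return w @ preds
```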
For example, if we let every metahypothesis be a small infra-RDP satisfying appropriate assumptions, we probably have an efficient “metalearning” algorithm. More generally, we can allow a metahypothesis to be a learnable mixture of infra-RDPs: for instance, there is a finite state machine for specifying “safe” actions, and the infra-RDPs in the mixture guarantee no long-term loss upon taking safe actions.
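As an illustration of the safe-action component of such a metahypothesis, here is a minimal sketch; the interface (an explicit transition table plus a per-state set of safe actions) is an assumption made for the example, not part of the model:

```python
class SafeActionFSM:
    """Finite state machine that, in each of its states, labels which actions are "safe",
    i.e. actions for which the infra-RDPs in the mixture guarantee no long-term loss."""
    def __init__(self, transitions, safe_actions, initial_state=0):
        self.transitions = transitions    # transitions[q][(action, observation)] -> next state
        self.safe_actions = safe_actions  # safe_actions[q] -> set of actions deemed safe in state q
        self.state = initial_state

    def step(self, action, observation):
        """Advance the machine after the agent acts and observes."""
        self.state = self.transitions[self.state][(action, observation)]

    def safe(self):
        """Actions currently deemed safe by the metahypothesis."""
        return self.safe_actions[self.state]
```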
In this setting, there are two levels of learning algorithms (a rough code sketch of both appears after this list):
The metalearning algorithm, which learns the correct infra-RDP mixture. The flavor of this algorithm is RL in a setting where we have a simulator of the environment (since we can evaluate ζ(h) for any h). In particular, here we don’t worry about exploitation/exploration tradeoffs.
The “metacontrol” algorithm, which, given an infra-RDP mixture, approximates the optimal policy. The flavor of this algorithm is “standard” RL with exploitation/exploration tradeoffs.
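Here is that rough sketch of the two levels, under illustrative assumptions: a sampler of histories from ζ, a `log_likelihood` interface on metahypotheses, a tabular environment with `reset`/`step`, and a `safe_actions` interface. None of these names come from the model itself, and metalearning is shown, for simplicity, as scoring candidate metahypotheses by predictive fit on simulated histories rather than as full simulator-based RL:

```python
import random
from collections import defaultdict

def metalearn(sample_history, metahypotheses, n_rollouts=100, horizon=20):
    """Level 1 (metalearning): pick the metahypothesis that best fits histories sampled
    from zeta. Since zeta can be simulated freely, there is no exploration/exploitation
    tradeoff at this level."""
    def score(meta):
        return sum(meta.log_likelihood(sample_history(horizon)) for _ in range(n_rollouts))
    return max(metahypotheses, key=score)

def metacontrol(meta, env, n_episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Level 2 (metacontrol): approximate the optimal policy for the learned metahypothesis
    by standard RL against the environment -- here, tabular epsilon-greedy Q-learning
    restricted to actions the metahypothesis deems safe. This level does face the usual
    exploration/exploitation tradeoff."""
    Q = defaultdict(float)
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            actions = list(meta.safe_actions(s))
            if random.random() < epsilon:
                a = random.choice(actions)                 # explore among safe actions
            else:
                a = max(actions, key=lambda b: Q[(s, b)])  # exploit current estimates
            s2, r, done = env.step(a)
            target = 0.0 if done else max(Q[(s2, b)] for b in meta.safe_actions(s2))
            Q[(s, a)] += alpha * (r + gamma * target - Q[(s, a)])
            s = s2
    return Q
```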
In the simplest toy model, we can imagine that metalearning happens entirely in advance of actual interaction with the environment. More realistically, the two need to happen in parallel. It is then natural to apply metalearning to the current environmental posterior rather than the prior (i.e. to histories extending the history that has already occurred); a sketch of such an interleaved loop appears after the list below. Such an agent satisfies “opportunistic” guarantees: if at any point in time the posterior admits a useful metahypothesis, the agent can exploit it. Thus, we address both parts of the problem of traps:
The complexity-theoretic part (subproblem 1.2) is addressed by approximating the intractable Bayes-optimality problem by the metacontrol problem for the (coarser) metahypothesis.
The statistical part (subproblem 2.1) is addressed by opportunism: if at some point, we can easily learn something about the physical environment, then we do.
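Finally, the interleaved variant mentioned above, as a minimal sketch of the loop structure. Here `sample_continuation` (simulating ζ conditioned on the history so far) and `meta.act` (one step of the metacontrol policy under the current metahypothesis) are assumed interfaces introduced for the example, and `metalearn` is the routine from the previous sketch:

```python
def opportunistic_agent(sample_continuation, metaprior, env, total_steps, relearn_every=50):
    """Interleave the two levels. Every `relearn_every` steps, rerun metalearning against
    the current posterior (i.e. against simulated continuations of the history that has
    already occurred); in between, act under the latest metahypothesis. If at some point
    the posterior admits a useful metahypothesis, the agent exploits it from then on --
    the "opportunistic" guarantee."""
    history = []
    meta = None
    for t in range(total_steps):
        if meta is None or t % relearn_every == 0:
            sample = lambda horizon: sample_continuation(history, horizon)
            meta = metalearn(sample, metaprior)
        action = meta.act(history)                  # one step of the metacontrol policy
        observation, reward, done = env.step(action)
        history.append((action, observation))
        if done:
            break
    return history
```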