I agree that we aren’t going to actually get a pure wrapper-mind in practice, let alone an inner-aligned wrapper-mind. It very much only happens in the limit of a “perfect” training process.
But I argue that, inasmuch as training processes approximate this perfect ideal, the minds we get out of them will approximate an R-aligned wrapper-mind. The fact that practical exploration policies fall short of an idealized “all possible rewarding trajectories” exploration policy is just another way for a training process to be an imperfect approximation; and the closer the approximation (the more exhaustive the exploration policy), the more the agent we get will approximate an R-maximizer.
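To gesture at the “more exhaustive exploration, closer to an R-maximizer” intuition, here’s a minimal toy sketch. It uses tabular Q-learning on a made-up chain MDP (the environment, rewards, and hyperparameters are all invented purely for illustration, and a toy tabular learner is obviously not the mesa-optimization setting in question): with narrow exploration the learned greedy policy latches onto the nearby small reward, while with exhaustive (uniform-random) exploration it converges on the reward-maximizing policy.

```python
import random

# Toy chain MDP: states 0..N-1 with terminal states at both ends.
# Reaching the left end gives a small reward after a couple of steps;
# reaching the right end gives a large reward, but only after a long
# detour. "R" here is just this reward signal.
N = 11
START = 2
SMALL_REWARD, LARGE_REWARD = 1.0, 10.0

def step(state, action):                 # action: 0 = left, 1 = right
    nxt = state - 1 if action == 0 else state + 1
    if nxt == 0:
        return nxt, SMALL_REWARD, True
    if nxt == N - 1:
        return nxt, LARGE_REWARD, True
    return nxt, 0.0, False

def train(epsilon, episodes=5000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning; epsilon controls how exhaustive exploration is."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s, done = START, False
        while not done:
            a = rng.randrange(2) if rng.random() < epsilon else int(Q[s][1] > Q[s][0])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

for eps in (0.05, 1.0):                  # narrow vs. (near-)exhaustive exploration
    Q = train(eps)
    direction = ("right, towards the large reward"
                 if Q[START][1] > Q[START][0]
                 else "left, towards the small reward")
    print(f"epsilon={eps}: learned greedy policy at the start state goes {direction}")
```

The off-policy update is what does the work here: once every rewarding trajectory actually gets explored, the values being learned are those of the R-optimal policy, regardless of how clumsy the behavior policy is; restrict exploration, and the policy settles on whatever heuristic the rewards it did see happen to support.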
For my argument to go through, we only need an exploration policy + reinforcement schedule that together put a sufficient constraint on R, while simultaneously making the training environment diverse enough that re-targeting one’s heuristics/shards at R at runtime becomes necessary.
Hmm, maybe I’d underappreciated that last condition, actually. Imagine a training environment which often introduces scenarios the agent has never encountered before — scenarios that are OOD with respect to its earlier training. The only agents that can stay (roughly) aimed at R in this case are those that incorporate (a good proxy of) R into themselves, and can re-orient themselves back towards R (or in its rough direction) even when taken off-distribution. I think this is the “sufficient diversity” condition I’m talking about.
And then we can approximate this condition by postulating, e.g.:

- an environment that sometimes takes agents to points that are on-distribution but far from the distribution’s center, or
- an environment which gradually changes in-episode, such that the agent has to have some mechanism for keeping itself aimed at R throughout (a toy sketch of this is at the end of this comment), or
- a combination of environment complexity + memory constraints such that the agent can only store an optimal set of heuristics for a subset of that environment, which requires it to have some mechanism for re-deriving new R-aligned heuristics at runtime if it wants to move through the rest of that environment.
(And then I suspect that we only get to an AGI under such circumstances; any less adversity than that, and we indeed just get stuck with shallow heuristics that don’t generalize and can’t do anything genuinely exciting.)
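To make the second bullet above more concrete, here’s a toy sketch of the kind of environment I have in mind. Everything in it (the drifting-target setup, the two hand-written policies) is invented purely for illustration, and neither “policy” is trained; they just show what such an environment rewards: a cached heuristic aimed at where the target used to be, versus a policy that keeps consulting (a proxy of) R and re-aiming at it in-episode.

```python
import random

# A 1-D "drifting target" environment, as a cartoon of the second condition:
# the agent is rewarded for standing on the target square, but the target
# slowly drifts during the episode, so a cached "the target is at square 10"
# heuristic stops paying off, while a policy that keeps re-aiming at R does fine.
SIZE, STEPS, DRIFT_P = 20, 200, 0.3

def run_episode(policy, seed):
    rng = random.Random(seed)
    agent, target = 0, SIZE // 2
    initial_target = target
    total_reward = 0.0
    for _ in range(STEPS):
        move = policy(agent, target, initial_target)        # -1, 0, or +1
        agent = max(0, min(SIZE - 1, agent + move))
        total_reward += 1.0 if agent == target else 0.0     # this is "R"
        if rng.random() < DRIFT_P:                          # the world shifts in-episode
            target = max(0, min(SIZE - 1, target + rng.choice([-1, 1])))
    return total_reward

def cached_heuristic(agent, target, initial_target):
    # Shallow heuristic: head to where the target was at the start, then stay put.
    return (initial_target > agent) - (initial_target < agent)

def re_aiming(agent, target, initial_target):
    # Carries (a proxy of) R: checks where the target currently is and moves toward it.
    return (target > agent) - (target < agent)

for name, policy in [("cached heuristic", cached_heuristic), ("re-aims at R", re_aiming)]:
    avg = sum(run_episode(policy, s) for s in range(100)) / 100
    print(f"{name}: average reward per episode = {avg:.1f}")
```

The design choice doing the work is only that reward depends on in-episode state the agent cannot have memorized in advance; any policy that collects R reliably here has to carry something that tracks R at runtime, which is the property the bullet list is pointing at.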