I was playing with a variant of the Good Regulator Theorem recently which smells pretty similar to this.
The usual theorem says, roughly, “there exists an optimal policy which first constructs a model of the environment from its inputs, then makes a choice as a function of the model rather than as a function of the inputs directly”. (In the original theorem, the model was deterministic, and the inputs were assumed to be rich enough to perfectly reconstruct the environment state. However, this is easy to relax to a probabilistic model with less-than-perfectly-informative inputs.)
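One way to write the relaxed statement (my notation here, not the original theorem’s): writing E for the environment state and X for the input, there is an optimal policy of the form

$$\pi^*(x) \;=\; g\big(M(x)\big), \qquad M(x) \;=\; P(E \mid X = x),$$

where M(x) is the “model” (in the probabilistic version, the posterior over E given the input) and g chooses the action as a function of that model alone.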
The obvious shortcoming of this theorem is that it only says “there exists an optimal policy...”; in general, there may be far simpler optimal policies which do not explicitly build a model before making a choice. So: under what circumstances must an optimal policy build a model?
The approach I was playing with is conceptually similar to some of the ideas from Risks From Learned Optimization. Basically: an information bottleneck can force the use of a model. In the first timestep, the “agent” receives some input data X, and must choose a summary S(X) of that data to pass to itself in the second timestep. In the second timestep, it receives both the summary S(X) and some additional data Z. We can think of Z as “choosing which game the agent is playing”, i.e. Z chooses a utility function. The summary S(X) (i.e. the model) must therefore summarize all information relevant to any of the possible games which Z could choose, in order to achieve optimal play. For sufficiently rich Z, that means that the summary must include a full model of the environment.
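To pin that down a bit (again my own notation, and a sketch rather than a precise theorem statement): write E for the environment state and u for the utility function selected by Z. Optimal play requires that, for every u which Z could select, the summary loses nothing relevant:

$$\max_a \, \mathbb{E}\big[\,u(a, E) \mid S(X)\,\big] \;=\; \max_a \, \mathbb{E}\big[\,u(a, E) \mid X\,\big].$$

If the family of utility functions Z can pick out is rich enough, the only way to satisfy this for all of them simultaneously is for S(X) to carry the full posterior over E given X, i.e. a full (probabilistic) model of the environment.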
(Drawing the parallel to mesa-optimizers: the first-timestep decision is analogous to the outer optimizer, the second-timestep decision is analogous to the inner optimizer. The inner optimizer has to work with models and optimization and whatnot mainly because it needs to process a bunch of information Z which is not available to the outer optimizer ahead of time; that’s why the first-timestep decision can’t just be “make decision Y in the next timestep”.)
Linking this back to the things you’re talking about: roughly speaking, if a model contains enough information for optimal play against a sufficiently rich set of utility functions, then the model “matches the world” (at least those parts of the world relevant to the utility functions).
Is this a theorem you’ve proven somewhere?
I have it in a notebook, might make a post soonish.
I ask because I already have a result that says this in MDPs: you can compute all optimal value functions iff you know the environment dynamics up to isomorphism.
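For concreteness, here is a minimal sketch of the easy (“if”) direction of that claim: once you know the dynamics, value iteration gives you the optimal value function for any reward function you like. The toy MDP and the names are made up for illustration; this is not the proof of the result itself.

```python
import numpy as np

def optimal_values(P, r, gamma=0.9, tol=1e-8):
    """Optimal state values by value iteration.
    P: (S, A, S) transition probabilities, r: (S,) state rewards."""
    v = np.zeros(P.shape[0])
    while True:
        # Bellman optimality backup: V(s) = r(s) + gamma * max_a sum_s' P(s,a,s') V(s')
        v_new = r + gamma * np.max(P @ v, axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

# Toy 3-state, 2-action dynamics (each row over s' sums to 1).
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.0, 0.9]],
    [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]],
    [[0.0, 0.1, 0.9], [1.0, 0.0, 0.0]],
])

# Knowing P once is enough to compute the optimal value function
# for every reward function; no further facts about the environment are needed.
for r in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])):
    print(optimal_values(P, r))
```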
(John made a post, I’ll just post this here so others can find it: https://www.lesswrong.com/posts/Dx9LoqsEh3gHNJMDk/fixing-the-good-regulator-theorem)