I have an intuition that this might have implications for the Orthogonality Thesis, but I’m quite unsure. To restate the Orthogonality Thesis in the terms above: "any combination of intelligence level and model of the world, M2, is possible." This feels different from my intuition that advanced intelligences will tend to converge upon a shared model / encoding of the world even if they have different goals. Does this make sense? Is there a way to reconcile these intuitions?
Important point: neither of the models in this post is really “the optimizer’s model of the world”. M1 is an observer’s model of the world (or the “God’s-eye view”); the world “is being optimized” according to that model, and there isn’t even necessarily “an optimizer” involved. M2 says what the world is being optimized toward.
To bring “an optimizer” into the picture, we’d probably want to say that there’s some subsystem which “chooses”/determines $\theta'$, in such a way that $E[-\log P[X|M_2] \mid M_1(\theta')] \le E[-\log P[X|M_2] \mid M_1(\theta)]$, compared to some other $\theta$-values. We might also want to require this to work robustly, across a range of environments, although the expectation does that to some extent already. Then the interesting hypothesis is that there’s probably a limit to how low such a subsystem can make the expected description length without making $\theta'$ depend on other variables in the environment. To get past that limit, the subsystem needs things like “knowledge” and a “model” of its own—the basic purpose of knowledge/models for an optimizer is to make the output depend on the environment. And it’s that model/knowledge which seems likely to converge on a similar shared model/encoding of the world.
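To make that concrete, here’s a minimal numerical sketch. Everything specific in it is a made-up illustration, not from the post: a toy $M_1(\theta)$ in which the outcome is $\theta$ plus a hidden environment variable plus noise, and a toy $M_2$ that assigns high probability to outcomes near zero. The point it illustrates is just the limit claim above: a subsystem forced to commit to a single environment-independent $\theta$ hits a floor on expected description length, while one whose choice of $\theta$ depends on the environment variable gets below that floor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (purely illustrative):
#   - The environment has a hidden variable Z, uniform over {0, 1}.
#   - The observer's model M1(theta) says X = theta + Z + small noise.
#   - The target model M2 says P[X | M2] = Normal(0, 1), so low description
#     length under M2 means X ends up near 0.

def neg_log_p_m2(x):
    # -log of a standard normal density, i.e. the description length of X under M2
    return 0.5 * x**2 + 0.5 * np.log(2 * np.pi)

def expected_dl(theta_fn, n_samples=100_000):
    # theta_fn maps the environment variable z to the subsystem's chosen theta;
    # returns a Monte Carlo estimate of E[-log P[X | M2] | M1(theta)].
    z = rng.integers(0, 2, size=n_samples)       # environment variable
    noise = rng.normal(0, 0.1, size=n_samples)
    theta = theta_fn(z)
    x = theta + z + noise                        # X drawn according to M1(theta)
    return neg_log_p_m2(x).mean()

# Subsystem 1: must commit to one theta, independent of the environment.
best_fixed = min(
    expected_dl(lambda z, t=t: np.full_like(z, t, dtype=float))
    for t in np.linspace(-2, 1, 31)
)

# Subsystem 2: its theta depends on (a "model" of) the environment variable.
conditional = expected_dl(lambda z: -z.astype(float))  # cancels Z exactly

print(f"best environment-independent theta: E[DL] ~ {best_fixed:.3f}")
print(f"theta conditioned on environment:   E[DL] ~ {conditional:.3f}")
```

In this toy, no fixed $\theta$ can do better than splitting the difference between the two environment states, whereas conditioning on $Z$ reaches the noise floor. That gap is the sense in which pushing expected description length lower eventually requires the subsystem’s output to depend on the environment, i.e. requires something model-like inside the subsystem.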
Thanks! I’m still wrapping my mind around a lot of this, but this gives me some new directions to think about.