Intuitively it feels like you are onto something. Whether it is inherent to the optimizer’s functionality or an artifact of how we view it is hard to say. Most selectors use algorithms that are the same as, or similar to, the ones controllers use. Gradient descent or simulated annealing can be thought of as evaluating possible worlds (counterfactuals) and making the one with the highest utility actual. And vice versa, a guided missile can be thought of as a selector in a search space. I wonder if this is what you mean when you say
“the selection vs control distinction is a map/territory distinction”
My guess is that the distinction is in large part a matter of your “stance”. If you think in a Cartesian way, analyzing an unchanging external reality, then it’s more of a search. If you think in terms of changing that reality with your actions, then it’s more of a controller.
… Rereading what you said, I guess I am basically agreeing with you.
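To make the two stances above concrete, here is a minimal toy sketch. It is my own illustration, not anything from the post: the names `select` and `control`, the quadratic utility, and the gain value are all assumptions chosen for simplicity. The same target is reached once by evaluating candidates "offline" and picking the best, and once by nudging a single evolving state using feedback.

```python
def select(candidates, utility):
    """Selection stance: score every candidate counterfactually, keep the best."""
    return max(candidates, key=utility)


def control(state, target, gain=0.5, steps=20):
    """Control stance: one evolving state, corrected step by step from feedback."""
    for _ in range(steps):
        error = target - state   # observe how far off we currently are
        state += gain * error    # act to shrink the error; no do-overs
    return state


target = 3.0

# Hypothetical utility, peaked at the target (an assumption for illustration).
utility = lambda x: -(x - target) ** 2

# Candidate "possible worlds": -5.0 to 5.0 in steps of 0.5.
candidates = [i * 0.5 for i in range(-10, 11)]

best = select(candidates, utility)           # picks the candidate nearest the target
final = control(state=-5.0, target=target)   # converges toward the target
```

Both calls end up at (or very near) the same point, and nothing in the code forces one description over the other: the controller’s trajectory could be redescribed as a search over next states, and the selector could be redescribed as a controller acting on a “current best guess.” That is at least compatible with the distinction being a matter of stance.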
Take the use of natural selection and humans as examples of optimization and mesa-optimization: the entire concept of “natural selection” is a human-convenient way of describing a pattern in the universe. It’s approximately an optimizer, but in order to get rid of that “approximately” you have to reintroduce epicycles until your model is as complicated as a model of the world again. Humans aren’t optimizers either; that’s just a human-convenient way of describing humans.
More abstractly, the entire process of recognizing a mesa-optimizer—something that models the world and makes plans—is an act of stance-taking. Or Quinean radical translation or whatever. If a cat-recognizing neural net learns an attention mechanism that models the world of cats and makes plans, it’s not going to come with little labels on the neurons saying “these are my input-output interfaces, this is my model of the world, this is my planning algorithm.” It’s going to be some inscrutable little bit of linear algebra with suspiciously competent behavior.
Not only could this competent behavior be explained either by optimization or by some variety of “rote behavior,” but the neurons don’t care about these boundaries and can occupy a continuum of possibilities between any two central examples. And worst of all, there might be multiple different useful ways of thinking about the same neurons, some in terms of elements like “goals” and “search,” and others in terms of the elements of rote behavior.
In light of this, the problem of mesa-optimizers is not “when will this bright line be crossed?” but “when will this simple model of the AI’s behavior be predictably useful?” Even though I think the first instinct is the opposite.
More abstractly, the entire process of recognizing a mesa-optimizer—something that models the world and makes plans—is an act of stance-taking.
And pretty specifically, the intentional stance. I think Daniel Dennett did some pretty powerful clarification decades ago which could help this debate.
Yeah I think this is definitely a “stance” thing.