rotatingpaguro comments on Optimisation Measures: Desiderata, Impossibility, Proposals

rotatingpaguro 22 Jan 2024 2:54 UTC
1 point
I remembered this when I read the following excerpt in Meaning and Agency:
In Belief in Intelligence, Eliezer sketches the peculiar mental state which regards something else as intelligent:
Imagine that I’m visiting a distant city, and a local friend volunteers to drive me to the airport. I don’t know the neighborhood. Each time my friend approaches a street intersection, I don’t know whether my friend will turn left, turn right, or continue straight ahead. I can’t predict my friend’s move even as we approach each individual intersection—let alone, predict the whole sequence of moves in advance.
Yet I can predict the result of my friend’s unpredictable actions: we will arrive at the airport.
[...]
I can predict the outcome of a process, without being able to predict any of the intermediate steps of the process.
In Measuring Optimization Power, he formalizes this idea by taking a preference ordering and a baseline probability distribution over the possible outcomes. In the airport example, the preference ordering might be how fast they arrive at the airport. The baseline probability distribution might be Eliezer’s probability distribution over which turns to take—so we imagine the friend turning randomly at each intersection. The optimization power of the friend is measured by how well they do relative to this baseline.
I think this can be a useful notion of agency, but constructing this baseline model does strike me as rather artificial. We’re not just sampling from Eliezer’s world-model. If we sampled from Eliezer’s world-model, the friend would turn randomly at each intersection, but they’d also arrive at the airport in a timely manner no matter which route they took—because Eliezer’s actual world-model believes the friend is capably pursuing that goal.
So to construct the baseline model, it is necessary to forget the existence of the agency we’re trying to measure while holding other aspects of our world-model steady. While it may be clear how to do this in many cases, it isn’t clear in general. I suspect if we tried to write down the algorithm for doing it, it would involve an “agency detector” at some point; you have to be able to draw a circle around the agent in order to selectively forget it. So this is more of an after-the-fact sanity check for locating agents, rather than a method of locating agents in the first place.
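To make the formalization in the excerpt concrete, here is a minimal Python sketch of the measure as I understand it from Measuring Optimization Power: the number of bits is minus the base-2 log of the baseline probability of an outcome at least as good as the one achieved. Everything specific here is an assumption for illustration: the function name `optimization_power_bits`, the `prefer_faster` ordering, and above all the numbers in `baseline_arrival`, which are invented and stand in for "the friend turns randomly at each intersection".

```python
import math

def optimization_power_bits(baseline, achieved, preference):
    """Bits of optimization: -log2 of the baseline probability of doing
    at least as well as the achieved outcome.

    baseline:   dict mapping outcome -> probability (hypothetical baseline model)
    achieved:   the outcome that actually happened
    preference: function outcome -> number, higher means more preferred
    """
    target = preference(achieved)
    p_at_least = sum(p for o, p in baseline.items() if preference(o) >= target)
    if p_at_least <= 0:
        raise ValueError("baseline gives zero probability to outcomes this good")
    return -math.log2(p_at_least)

# Invented baseline: arrival time in minutes under random turning.
# math.inf stands for never reaching the airport.
baseline_arrival = {20: 0.125, 40: 0.125, 90: 0.25, math.inf: 0.5}

def prefer_faster(minutes):
    # Preference ordering from the airport example: arriving sooner is better.
    return -minutes

bits_fast = optimization_power_bits(baseline_arrival, 20, prefer_faster)
bits_slow = optimization_power_bits(baseline_arrival, 90, prefer_faster)
```

Note that the sketch takes `baseline_arrival` as a given. That is exactly the step the excerpt calls artificial: the code has nothing to say about how to forget the friend's agency while keeping the rest of the world-model fixed.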