We’re not sold on the idea that the target set must be small. Our worry is that any such claim needs a notion of size, and the probability measure is the most natural way to define one. But if we think of the system as optimizing for that target, shouldn’t the target thereby become more likely, and so less small?
I would have gone with the entropy of the probability distribution if you wanted to use the probability measure; this captures the idea that the more powerful the optimizer is, the more you can constrain your expectations about what will happen, and the less information there is to gain from knowing exactly which state you end up in.
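Concretely (a minimal illustration; the 8-state outcome space and the two distributions are made up): if the system barely constrains the outcome and the distribution is uniform over 8 states, the entropy is

H(p) = -\sum_i p_i \log_2 p_i = \log_2 8 = 3 \text{ bits},

whereas a stronger optimizer that narrows things down to 2 of those states (each with probability 1/2) leaves only 1 bit, so there is correspondingly less to learn from seeing which state actually obtains.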
Thanks for this. We agree it’s natural to think that a stronger optimizer means less information from seeing the end state, but the same question shows up again here. The general tension is that one way of thinking about optimization is something like: the optimizer has a high probability of hitting a narrow target. But the notion of narrowness is often what is doing the work in making this seem intuitive, and under the seemingly relevant notion of narrowness (how likely is this set of outcomes to be realized?), the set of outcomes we wanted to call narrow turns out not to be narrow at all. The lesson we take is that many of the ways we might measure the underlying space rely on choices we make in describing the (size of the) space. If those choices reflect our uncertainty, then we get the puzzle we describe. I don’t see how moving to thinking in terms of entropy would address this.

Given that we are working in continuous spaces, one way to see that we still make choices like this, even with entropy, is to look at continuous generalizations of entropy. When we move to the continuous case, things become more subtle. Differential entropy (the most natural generalization) lacks some of the important properties that make entropy a useful measure of uncertainty: it can be negative, and it is not invariant under continuous coordinate transformations. You can move to relative entropy to try to fix these problems, but this depends on a choice of an underlying measure m. What we see in both cases is that the generalizations of entropy, both differential and relative, rely on some choice of how to describe the underlying space (for differential entropy, the choice of coordinate system; for relative entropy, the underlying measure m).
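To spell out those two failure modes (these are standard facts; the uniform example below is just one convenient illustration): differential entropy is

h(X) = -\int p(x) \log p(x)\, dx,

so for X uniform on [0, 1/2] we have p(x) = 2 and h(X) = -\log 2 < 0, and for a rescaled variable Y = aX (with a > 0) we get h(Y) = h(X) + \log a, so a mere change of units shifts the measured “uncertainty”. Relative entropy

D(p \,\|\, m) = \int p(x) \log \frac{p(x)}{m(x)}\, dx

is invariant under smooth reparameterizations (and non-negative when m is itself a probability measure), but only because the reference measure m transforms along with p; the choice of m is doing the same kind of descriptive work that the coordinate system did for differential entropy.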