Can you provide any support for the notion that in general, a narrower search comes up with a higher proportion of bad worlds?
My intuition is that the more you optimize for X, the more you sacrifice everything else, unless it is inevitably implied by X. So anytime there is a trade-off between “seeming more good” and “being more good”, the impression-maximizing algorithm will prefer the former.
When you start with a general set of worlds, “seeming good” and “being good” are positively correlated. But once you are already inside the subset of worlds that all seem very good, and you keep pushing for a better and better impression, the correlation may gradually turn negative. At that point you may be unknowingly asking the AI to exploit your errors in judgement, because within that subset this may be the easiest way to improve the impression.
Another intuition is that the closer you get to the “perfect” world, the more difficult it becomes to find a way to increase the amount of good. But the difficulty of exploiting a human bias that causes humans to overestimate the value of the world remains approximately constant.
Though this doesn’t prove that the world with maximum “seeming good” is some kind of hell. It could still be very good, although not nearly as good as the world with maximum “good”. (However, if the world with maximum “seeming good” happens to be some kind of hell, then maximizing for “seeming good” is the way to find it.)
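To make this selection effect concrete, here is a toy simulation. It is only a sketch under my own assumptions: each world’s “seeming good” score is its true goodness plus independent Gaussian judgement error, and the names (`true_good`, `judgement_error`, `seeming_good`) and the numbers are mine, not anything from the thread. In this linear-Gaussian toy the correlation inside the top-impression subset only shrinks toward zero rather than turning negative, but it shows the direction of the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

# Each "world" has a true value and a noisy human evaluation of it.
true_good = rng.normal(0.0, 1.0, N)          # how good the world actually is
judgement_error = rng.normal(0.0, 1.0, N)    # human bias / evaluation noise
seeming_good = true_good + judgement_error   # what the impression-maximizer sees

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Over all worlds, seeming good and being good are strongly correlated.
print("correlation over all worlds:", round(corr(seeming_good, true_good), 3))

# Restrict to the worlds that already seem very good (top 0.1% by impression):
# inside that subset the correlation is much weaker, so further pushes on
# "seeming good" are increasingly pushes on judgement error.
top = seeming_good > np.quantile(seeming_good, 0.999)
print("correlation within top 0.1%:", round(corr(seeming_good[top], true_good[top]), 3))

# The single world with maximum "seeming good" typically owes a large part of
# its score to judgement error rather than to actual goodness (in this
# symmetric model, about half of it on average).
best = np.argmax(seeming_good)
print("max-impression world: true_good =", round(true_good[best], 2),
      "judgement_error =", round(judgement_error[best], 2))
```

The within-subset correlation comes out far below the overall one, which is the quantitative shape of “the easiest way to improve the impression is to exploit the error term.”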
This intuition seems correct in typical human situations. Everything is already highly optimized under many competing considerations, so optimizing harder for X does indeed sacrifice the other things that are also being optimized for. So if you relax the constraint on X, you get more of the other things, provided you continue optimizing for them.
However, it does not follow from this that if you relax your constraint on X, and take a random world meeting at least the lower value of X, your world will be any better in the non-X ways. You need to actually be optimizing for the non-X things to expect to get them.
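A small sketch of that distinction, with every variable and threshold my own assumption rather than anything from the comment: worlds get two independent qualities, X (the constrained thing) and Y (a stand-in for the non-X things you care about). A random world clearing a relaxed bar on X is no better on Y than one clearing a strict bar; relaxing only pays off on Y if you keep optimizing for Y inside the larger feasible set.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000

# Two independent qualities of a world: X (the thing being constrained) and
# Y (everything else). Independence stands in for "a random world is not
# already sitting on some optimization frontier".
x = rng.normal(0.0, 1.0, N)
y = rng.normal(0.0, 1.0, N)

strict, relaxed = 3.0, 2.0   # a high bar and a lower bar on X

# A random world meeting the relaxed bar is no better in the non-X way than
# one meeting the strict bar: mean Y is approximately 0 in both cases.
print("mean Y | X >= strict :", round(y[x >= strict].mean(), 3))
print("mean Y | X >= relaxed:", round(y[x >= relaxed].mean(), 3))

# Relaxing the bar on X helps only if you keep optimizing for Y: the best-Y
# world inside the larger (relaxed) set matches or beats the best-Y world
# inside the smaller (strict) set.
print("max Y  | X >= strict :", round(y[x >= strict].max(), 3))
print("max Y  | X >= relaxed:", round(y[x >= relaxed].max(), 3))
```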
it does not follow from this that if you relax your constraint on X, and take a random world meeting at least the lower value of X, your world will be any better in the non-X ways
Thanks, but I don’t see the relevance of the reversal test. The reversal test involves changing the value of a parameter but not the amount of optimization. And the reversal test shouldn’t apply to a parameter that is already being optimized over, unless the current optimization is wrong or the circumstances on which the optimization depends are changing.
Great point!