The IC correspond roughly with what we want to value, but differ from it in subtle ways, enough that optimising for one could be disastrous for the other. If we didn’t optimise, this wouldn’t be a problem. Suppose we defined an acceptable world as one that we would judge “yeah, that’s pretty cool” or even “yeah, that’s really great”. Then assume we selected randomly among the acceptable worlds. This would probably result in a world of positive value: siren worlds and marketing worlds are rare, because they fulfil very specific criteria. They triumph because they score so high on the IC scale, but they are outnumbered by the many more worlds that are simply acceptable.
Implication: the higher you set your threshold of acceptability, the more likely you are to get a horrific world. Counter-intuitive to say the least.
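To make the setup concrete, here is a minimal toy simulation of my own (none of this is from the post): A stands for how good a world truly is, B for its score on the inspection criteria, and the model simply assumes, as the post argues and a reply below disputes, that a rare class of “marketing worlds” scores spectacularly on B while being genuinely bad on A.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000                            # candidate worlds

A = rng.normal(0.0, 1.0, n)              # true desirability of each world
B = A + rng.normal(0.0, 0.3, n)          # criteria score: a decent but imperfect proxy

# Rare "marketing worlds": engineered to ace the criteria while being bad.
# That they are genuinely worse is an assumption of this model, not a result.
marketing = rng.random(n) < 1e-3
B[marketing] += rng.uniform(3.0, 6.0, marketing.sum())
A[marketing] -= 2.0

for t in [0, 1, 2, 3, 4, 5]:
    ok = B >= t                          # the "acceptable" worlds at this threshold
    print(f"threshold {t}: {ok.sum():>6} acceptable worlds, "
          f"mean true value {A[ok].mean():+.2f}, "
          f"marketing share {marketing[ok].mean():.1%}")
```

Under these assumptions, a world picked at random from the acceptable set at a modest threshold is almost never a marketing world and is decent on average, while at the highest thresholds little besides marketing worlds survives, which is the counter-intuitive implication above. Whether the penalty applied to marketing worlds is justified is exactly what the exchange below turns on.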
Why is that counter-intuitive? It agrees with my intuition: ask for too much and you wind up with nothing.
“ask for too much and you wind up with nothing” is a fine fairy tale moral. Does it actually hold in these particular circumstances?
Imagine that there’s a landscape of possible worlds. There is a function (A) on this landscape; we don’t know how to define it, but it is how much we would truly prefer a world if only we knew. Somewhere this function has a peak, the most ideal “eutopia”. There is another function. This one we do define. It is intended to approximate the first function, but it does not do so perfectly. Our “acceptability criterion” is to require that this second function (B) take a value of at least some threshold.
Now as we raise the acceptability threshold for function B, we might expect two different regimes. In the first regime, with a low threshold, function B is not that bad a proxy for function A, and raising the threshold increases the average true desirability of the worlds that meet it. In the second regime, with a high threshold, function B ceases to be effective as a proxy. Here we are asking for “too much”. The peak of function B is in a different place than the peak of function A, and if we raise the threshold high enough we exclude the peak of A entirely. What we end up with is a world highly optimized for B and not so well optimized for A: a “marketing world”.
So, we must conclude, like you and Stuart Armstrong, that asking for “too much” is bad and we’d better set a lower threshold. Case closed, right?
Wrong.
The problem is that the above line of reasoning provides no reason to believe that the “marketing world” at the peak of function B is any worse than a random world at any lower threshold. As we relax the threshold on B, we include more worlds that are better in terms of A but also more that are worse. There’s no particular reason to believe, simply because the peak of B is at a different place than the peak of A, that the peak of B is at a valley of A. In fact, if B represents our best available estimate of A, it would seem that, even though the peak of B is predictably a marketing world, it’s still our best bet at getting a good value of A. A random world at any lower threshold should have a lower expected value of A.
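For contrast, here is the same kind of sketch under this comment’s implicit error model: B is just an unbiased, noisy estimate of A, with no special class of worlds that games the criteria. Again a toy model of my own, not anything proposed in the thread.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

A = rng.normal(0.0, 1.0, n)              # true desirability
B = A + rng.normal(0.0, 0.5, n)          # proxy: honest estimate plus symmetric noise

for t in [0, 1, 2, 3]:
    ok = B >= t
    print(f"threshold {t}: mean true value of acceptable worlds {A[ok].mean():+.2f}")

best = int(np.argmax(B))                 # the world at the peak of B
print(f"world with the single highest B has true value {A[best]:+.2f}")
```

Here raising the threshold only helps, and the world at the peak of B is, in expectation, the best of the lot. The disagreement is therefore about the error model: whether worlds near the peak of B are merely ranked differently, or systematically worse, as the earlier sketch assumed.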
True: that line of reasoning on its own gives no such reason, which is why I added arguments pointing out that a marketing world will likely be bad. Even on your terms, a peak of B will probably involve diverting effort/energy that could have contributed to A away from A. E.g. if A is apples and B is bananas, the world with the most bananas is likely to contain no apples at all.
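The apples/bananas point fits in a few lines; this is my own illustration, with “substance” playing the role of apples (the thing A counts) and “marketing” the role of bananas (the thing B is fooled by), and the weight of 3.0 purely illustrative.

```python
# A fixed budget of effort is split between substance (which both A and B
# reward) and marketing (which only B rewards, and more cheaply).
budget = 10.0
for marketing_share in (0.0, 0.25, 0.5, 0.75, 1.0):
    substance = budget * (1 - marketing_share)
    marketing = budget * marketing_share
    A = substance                      # true value counts substance alone
    B = substance + 3.0 * marketing    # the criteria are fooled by cheap marketing
    print(f"marketing share {marketing_share:.2f}:  A = {A:5.1f}   B = {B:5.1f}")
```

The allocation that maximises B spends the whole budget on marketing and scores A = 0: the world with the most bananas contains no apples at all.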
It sounds like, “the better you do at maximizing your utility function, the more likely you are to get a bad result,” which can’t be true with the ordinary meanings of all those words. The only ways I can see for this to be true are if you aren’t actually maximizing your utility function, or if your true utility function is not the same as the one you’re maximizing. But then you’re just plain old maximizing the wrong thing.
Er, yes? But we don’t exactly have the right thing lying around, unless I’ve missed some really exciting FAI news...
Absolutely, granted. I guess I just found this post to be an extremely convoluted way to make the point that if you maximize the wrong thing, you’ll get something that you don’t want, and the more effectively you achieve the wrong goal, the more you diverge from the right goal. I don’t see that the existence of “marketing worlds” makes maximizing the wrong thing more dangerous than it already was.
Additionally, I’m kinda horrified by the class of fixes (of which this proposal is a member) that involve doing the wrong thing less effectively. Not that I have an actual fix in mind. It just sounds like a terrible idea: “we’re pretty sure that our specification is incomplete in an important, unknown way, so we’re going to satisfice instead of maximize when we take over the world.”