I don’t see how that changes the probability of getting a siren world v. an acceptable world at all (ex ante).
If the expected number of siren worlds in the class we look through is less than one, then sometimes there will be none, but sometimes there will be one or more. On average we still get the same expected number, and the first element we find is a siren world with probability equal to the expected proportion of siren worlds.
The scenario is: we draw X worlds and pick the top-ranking one. If there is a siren world or marketing world, it will come top; otherwise, if there are acceptable worlds, one of them will come top. Depending on how much we value acceptable worlds over non-acceptable worlds and over siren/marketing worlds, and depending on the proportions of each, there is an X that maximises our expected outcome. (Trivial example: if all worlds are acceptable, picking X=1 beats all other alternatives, as higher X simply increases the chance of getting a siren/marketing world.)
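Here is a minimal sketch of that trade-off. The proportions and payoffs (p_siren, p_acceptable and the three values) are purely illustrative assumptions, not numbers from the discussion; it just computes the expected payoff as a function of X, under the assumption that a siren/marketing world always outranks everything else, and finds the X that maximises it.

```python
# Minimal sketch: expected payoff of drawing X worlds and keeping the top one.
# All proportions and payoffs below are made-up illustrative assumptions.

p_siren = 0.001       # chance a random draw is a siren/marketing world
p_acceptable = 0.05   # chance a random draw is acceptable
v_siren, v_acceptable, v_other = -100.0, 10.0, 0.0

def expected_value(x: int) -> float:
    """Expected payoff of drawing x worlds and keeping the top-ranked one,
    assuming sirens always outrank acceptable worlds, which outrank the rest."""
    p_no_siren = (1 - p_siren) ** x
    p_none_good = (1 - p_siren - p_acceptable) ** x   # no siren and no acceptable world
    p_top_siren = 1 - p_no_siren
    p_top_acceptable = p_no_siren - p_none_good
    p_top_other = p_none_good
    return (p_top_siren * v_siren
            + p_top_acceptable * v_acceptable
            + p_top_other * v_other)

best_x = max(range(1, 2001), key=expected_value)
print(best_x, round(expected_value(best_x), 2))
```

With these particular numbers there is an interior optimum; make sirens more common or their penalty larger and the optimal X shrinks, exactly as the paragraph above suggests.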
Thanks, this clarified your argument to me a lot. However, I still don’t see any good reasons provided to believe that, merely because a world is highly optimized on utility function B, it is less likely to be well-optimized on utility function A as compared to a random member of a broader class.
That is, let’s classify worlds (within the broader, weakly optimized set) as highly optimized or weakly optimized, and as acceptable or unacceptable. You claim that being highly optimized reduces the probability of being acceptable. But your arguments in favour of this proposition seem to be:
a) it is possible for a world to be highly optimized and unacceptable
(but all the other combinations are also possible)
and
b) “Genuine eutopias are unlikely to be marketing worlds, because they are optimised for being good rather than seeming good.”
(In other words, the peak of function B is unlikely to coincide with the peak of function A. But why should the chance that the peak of function B and the peak of function A randomly coincide, given that they are both within the weakly optimized space, be any lower than the chance of a random element of the weakly optimized space coinciding with the peak of function A? And this argument doesn’t seem to support a lower chance of the peak of function B being acceptable, either.)
Here’s my attempt to come up with some kind of argument that might work to support your conclusion:
1) maybe the fact that a world is highly optimized for utility function B means that it is simpler than an average world, and this simplicity makes it relatively unlikely to be a decent world in terms of utility function A.
2) maybe the fact that a world is highly optimized for utility function B means that it is more complex than an average world, in a way that is probably bad for utility function A.
Or something.
ETA:
I had not read http://lesswrong.com/lw/jao/siren_worlds_and_the_perils_of_overoptimised/asdf when I wrote this comment; it looks like it could be an actual argument of the kind I was looking for, and I will consider it when I have time.
ETA 2:
The comment linked seems to be another statement that function A (our true global utility function) and function B (some precise utility function we are using as a proxy for A) are likely to have different peaks.
As I mentioned, the fact that A and B are likely to have different peaks does not imply that the peak of B has less than average values of A.
Still, I’ve been thinking of possible hidden assumptions that might lead towards your conclusion.
FIRST, AN APOLOGY: It seems I completely skipped over or ineffectively skimmed your paragraph on “realistic worlds”. The supposed “hidden assumption” I suggest below on weighting by plausibility is quite explicit in that paragraph, which I hadn’t noticed; sorry. Nonetheless I am still including the paragraphs below, as the “realistic worlds” paragraph’s assumptions seem specific to that paragraph and not to the whole post.
One possibility is that when you say “Then assume we selected randomly among the acceptable worlds”, you actually mean something along the lines of “Then assume we selected randomly among the acceptable worlds, weighting by plausibility”. Now, if you weight by plausibility you import human utility functions, because worlds are more likely to come about if humans, who have human utility functions, would act to bring them about. The highly constrained peak of function B doesn’t benefit from that importation. So this provides a reason to believe that the peak of function B might be worse than the plausibility-weighted average of the broader set. Of course, it is not the narrowness per se that’s at issue, but the fact that there is a hidden utility function in the weighting of the broader set.
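To make that concrete, here is a rough sketch under invented assumptions: A and B are drawn independently, and plausibility is taken to grow with A (weight proportional to exp(5·A)); the weighting function, distributions and sample sizes are all arbitrary choices of the sketch, not anything from the post.

```python
# Rough sketch of the plausibility-weighting point. A (true utility) and B
# (proxy) are independent here; plausibility weight grows with A because
# humans with human values act to bring high-A worlds about.
import math
import random

random.seed(0)

def trial(n=1000):
    worlds = [(random.random(), random.random()) for _ in range(n)]  # (A, B) pairs
    weights = [math.exp(5 * a) for a, _ in worlds]                   # plausibility ~ exp(5*A)
    weighted_avg_A = sum(w * a for (a, _), w in zip(worlds, weights)) / sum(weights)
    A_at_B_peak = max(worlds, key=lambda ab: ab[1])[0]               # B's peak gets no such help
    return weighted_avg_A, A_at_B_peak

results = [trial() for _ in range(500)]
print("mean plausibility-weighted A:", round(sum(r[0] for r in results) / len(results), 3))
print("mean A at the peak of B:     ", round(sum(r[1] for r in results) / len(results), 3))
```

Under these assumptions the plausibility-weighted average comes out well above the A-value at B’s peak, precisely because the weighting smuggles in information about A that B lacks.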
Another possibility is that you are finding the global maximum of B instead of the maximum of B within the set meeting the acceptability criteria. In this case as well, it’s the fact that a different, more reliable utility function (the acceptability criteria) is built into the broader set that makes the more constrained search comparatively worse, rather than the narrowness of the constrained search.
Another possibility is that you are assuming that the acceptability criteria are in some sense a compromise between function B and true utility function A. In this case, we might expect a world that scores high on function B while still meeting the acceptability criteria to be low in A, because it likely only met the criteria by virtue of being high in B. Again, the problem in this case would be that function B failed to include information about A that was built into the broader set.
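A toy illustration of that selection effect, under one made-up reading of “compromise” (a world is accepted when A + B exceeds a threshold); the criterion, threshold and sample sizes are arbitrary assumptions of the sketch.

```python
# Toy illustration of the "compromise criteria" possibility: accept a world
# when A + B > threshold, with A and B independent before selection.
import random

random.seed(1)
worlds = [(random.random(), random.random()) for _ in range(200_000)]  # (A, B) pairs

accepted = [(a, b) for a, b in worlds if a + b > 1.3]  # hypothetical compromise criterion

avg_A_accepted = sum(a for a, _ in accepted) / len(accepted)
top_B = sorted(accepted, key=lambda ab: ab[1], reverse=True)[:100]
avg_A_top_B = sum(a for a, _ in top_B) / len(top_B)

print("average A over accepted worlds:    ", round(avg_A_accepted, 3))
print("average A over highest-B accepted: ", round(avg_A_top_B, 3))
# The highest-B accepted worlds mostly passed the criterion *because* of B,
# so their A-values come out below the accepted-set average.
```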
A note: the reason I am looking for hidden assumptions is that, given what I see as your explicit assumptions, there is a simple model (namely, that function A and function B are uncorrelated within the acceptable set) that seems to be compatible with your assumptions and incompatible with your conclusions. In this model, maximizing B can lead to any value of A, including low values, but the effect of maximizing B on A should on average be the same as that of taking a random member of the set. If anything, this model should be expected to be pessimistic, since B is explicitly designed to approximate A.
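A quick simulation of that simple model, with the uniform distributions and sample sizes as assumptions of the sketch rather than anything from the post:

```python
# Sketch of the simple model: A and B uncorrelated within the acceptable set,
# so maximising B should match a random pick in expectation.
import random

random.seed(2)

def trial(n=1000):
    acceptable = [(random.random(), random.random()) for _ in range(n)]  # (A, B), independent
    A_at_B_max = max(acceptable, key=lambda ab: ab[1])[0]
    A_random = random.choice(acceptable)[0]
    return A_at_B_max, A_random

results = [trial() for _ in range(2000)]
print("mean A when maximising B:     ", round(sum(r[0] for r in results) / len(results), 3))
print("mean A of a random acceptable:", round(sum(r[1] for r in results) / len(results), 3))
```

Both averages come out essentially the same, which is the point: in this model, heavily optimizing B is no worse in expectation than picking an acceptable world at random.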