Vlad Mikulik comments on Clarifying Consequentialists in the Solomonoff Prior

Vlad Mikulik 11 Jul 2018 19:55 UTC
LW: 1 AF: 1
I agree that this probably happens when you set out to mess with an arbitrary particular S, i.e. when you try to make some S’ that shares a prefix with S as likely as S.

However, some S are special, in the sense that their prefixes are being used to make very important decisions. If you, as a malicious TM in the prior, perform an exhaustive search of universes, you can narrow down your options to only a few prefixes used to make pivotal decisions; selecting one of those to mess with is then very cheap to specify. I use S to refer to those strings that are the ‘natural’ continuation of those cheap-to-specify prefixes.

There are, it seems to me, a bunch of other equally-complex TMs that want to make other strings that share that prefix more likely, including some that promote S itself. What the resulting balance looks like is unclear to me, but what’s clear is that the prior is malign with respect to that prefix: conditioning on that prefix gives you a distribution almost entirely controlled by these malign TMs. The ‘natural’ complexity of S, or of other strings that share the prefix, plays almost no role in their priors.

The above is of course conditional on this exhaustive search being possible, which also relies on there being someone in some universe who actually uses the prior to make decisions. Otherwise, we can’t select the prefixes that can be messed with.
- AlexMennen 11 Jul 2018 21:22 UTC
  LW: 2 AF: 1
  This reasoning seems to rely on there being such strings S that are useful to predict far out of proportion to what you would expect from their complexity. But a description of the circumstance in which predicting S is so useful should itself give you a way of specifying S, so I doubt that this is possible.
  - Vlad Mikulik 11 Jul 2018 22:16 UTC
    LW: 1 AF: 1
    I agree. That’s what I meant when I wrote there will be TMs that artificially promote S itself. However, this would still mean that most of S’s mass in the prior would be due to these TMs, and not due to the natural generator of the string.
    
    Furthermore, it’s unclear how many TMs would promote S vs S’ or other alternatives. Because of this, I don’t know whether the prior would be higher for S or S’ from this reasoning alone. Whichever is the case, the prior no longer reflects meaningful information about the universe that generates S and whose inhabitants are using the prefix to choose what to do; it’s dominated by these TMs that search for prefixes they can attempt to influence.
    - AlexMennen 12 Jul 2018 0:15 UTC
      LW: 5 AF: 2
      I didn’t mean that an agenty Turing machine would find S and then decide that it wants you to correctly predict S. I meant that to the extent that predicting S is commonly useful, there should be a simple underlying reason why it is commonly useful, and this reason should give you a natural way of computing S that does not have the overhead of any agency that decides whether or not it wants you to correctly predict S.
      - paulfchristiano 12 Jul 2018 1:48 UTC
        LW: 2 AF: 1
        How many bits do you think it takes to specify the property “people’s predictions about S, using universal prior P, are very important”?
        (I think you’ll need to specify the universal prior P by reference to the universal prior that is actually used in the world containing the string S; if you spell out the prior P explicitly you are already sunk just from the ambiguity in the choice of language.)
        It seems relatively unlikely to me that this will be cheaper than specifying some arbitrary degree of freedom in a computationally rich universe that life can control (+ the extra log(fraction of degrees of freedom the consequentialists actually choose to control)). Of course it might.
        I agree that the entire game is in the constants—what is the cheapest way to pick out important strings.
        AlexMennen 12 Jul 2018 3:49 UTC
        LW: 2 AF: 1
        I don’t think that specifying the property of importance is simple and helps narrow down S. I think that in order for predicting S to be important, S must be generated by a simple process. Processes that take large numbers of bits to specify are correspondingly rarely occurring, and thus less useful to predict.
        paulfchristiano 12 Jul 2018 5:59 UTC
        LW: 6 AF: 3
        I don’t buy it. A camera that some robot is using to make decisions is no simpler than any other place on Earth, just more important.
        (This already gives the importance-weighted predictor a benefit of ~log(quadrillion))
        Clearly you need to e.g. make the anthropic update and do stuff like that before you have any chance of competing with the consequentialist. This might just be a quantitative difference about how simple is simple—like I said elsewhere, all the action is in the additive constants, I agree that the important things are “simple” in some sense.
        AlexMennen 12 Jul 2018 16:49 UTC
        LW: 2 AF: 1
        Ok, I see what you’re getting at now.
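
The "~log(quadrillion)" figure in the exchange above can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: it reads "quadrillion" as 10^15 comparably simple locations from which the important one must be singled out, it takes the logarithm base 2 so the result is in bits (the comment does not state a base), and it uses a prior that weights a description of length n by 2^-n. The helper names `bits` and `weight_ratio` are hypothetical, not taken from the thread.

```python
import math


def bits(count: float) -> float:
    """Bits needed to single out one item among `count` equally likely candidates."""
    return math.log2(count)


def weight_ratio(extra_bits: float) -> float:
    """Ratio of prior weights for two descriptions whose lengths differ by
    `extra_bits`, assuming a prior that weights a length-n description by 2^-n."""
    return 2.0 ** extra_bits


# Assumption: "quadrillion" read as 10^15 comparably simple locations.
QUADRILLION = 10 ** 15

# Bits saved by a predictor that already knows which location matters.
advantage_bits = bits(QUADRILLION)

# The same advantage expressed as a multiplicative factor on prior weight.
advantage_factor = weight_ratio(advantage_bits)

print(f"advantage: about {advantage_bits:.2f} bits")
print(f"prior-weight factor: about {advantage_factor:.3g}")
```

On this reading the advantage is roughly 50 bits. That is an additive constant in description length, which is where the thread locates the whole disagreement.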