AlexMennen comments on Clarifying Consequentialists in the Solomonoff Prior

AlexMennen 11 Jul 2018 18:24 UTC
LW: 3 AF: 2
I’m not convinced that the probability of S’ could be pushed up to anything near the probability of S. Specifying an agent that wants to trick you into predicting S’ rather than S with high probability when you see their common prefix requires specifying the agency required to plan this type of deception (which should be quite complicated), and specifying the common prefix of S and S’ as the particular target for the deception (which, insofar as it makes sense to say that S is the “correct” continuation of the prefix, should have about the same “natural” complexity as S). That is, specifying such an agent requires all the information required to specify S, plus a bunch of overhead to specify agency, which adds up to much more complexity than S itself.
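To make the bit-counting concrete, here is a minimal sketch, assuming a hypothesis's prior weight scales as 2^-(description length in bits). The function name and the example numbers (a 1000-bit S, 200 bits of agency overhead) are purely illustrative assumptions, not estimates of the real quantities.

```python
def prior_weight_ratio(bits_s: float, overhead_bits: float) -> float:
    """Prior weight of S' relative to S, assuming weight ~ 2^-(description length).

    Assumes (as in the argument above) that specifying S' costs everything
    needed to specify S plus some extra overhead for the deceiving agent.
    """
    bits_s_prime = bits_s + overhead_bits
    # 2^-bits_s_prime / 2^-bits_s simplifies to 2^(bits_s - bits_s_prime)
    return 2.0 ** (bits_s - bits_s_prime)


# Hypothetical numbers: S takes 1000 bits, agency adds 200 bits of overhead.
ratio = prior_weight_ratio(bits_s=1000, overhead_bits=200)
print(ratio)  # S' gets 2^-200 of the weight that S gets
```

Under these assumptions the ratio depends only on the overhead, so the disagreement reduces to how many bits the agency and the targeting really cost.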
- paulfchristiano 12 Jul 2018 1:44 UTC
  LW: 7 AF: 3
  > specifying the agency required to plan this type of deception (which should be quite complicated)
  Suppose that I just specify a generic feature of a simulation that can support life + expansion (the complexity of specifying “a simulation that can support life” is also paid by the intended hypothesis, so we can factor it out). Over a long enough time such a simulation will produce life, that life will spread throughout the simulation, and eventually have some control over many features of that simulation.
  > And specifying the common prefix of S and S’ as the particular target for the deception (which, insofar as it makes sense to say that S is the “correct” continuation of the prefix, should have about the same “natural” complexity as S)
  Once you’ve specified the agent, it just samples randomly from the distribution of “strings I want to influence.” That has a way lower probability than the “natural” complexity of a string I want to influence. For example, if 1/quadrillion strings are important to influence, then the attackers are able to save log(quadrillion) bits.
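  As a rough worked version of that arithmetic (a sketch, assuming base-2 logarithms and that the agent samples uniformly from the strings it wants to influence; the symbol $f$ for the fraction of important strings is introduced here for illustration):

  ```latex
  \text{bits saved} \;=\; \log_2 \frac{1}{f}
  \;=\; \log_2 10^{15}
  \;=\; 15 \log_2 10
  \;\approx\; 49.8
  ```

  So with $f = 10^{-15}$ the saving is about 50 bits, which corresponds to a factor of roughly $10^{15}$ in prior weight.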
  - AlexMennen 12 Jul 2018 3:46 UTC
    LW: 2 AF: 1
    > Suppose that I just specify a generic feature of a simulation that can support life + expansion (the complexity of specifying “a simulation that can support life” is also paid by the intended hypothesis, so we can factor it out). Over a long enough time such a simulation will produce life, that life will spread throughout the simulation, and eventually have some control over many features of that simulation.
    Oh yes, I see. That does cut the complexity overhead down a lot.
    > Once you’ve specified the agent, it just samples randomly from the distribution of “strings I want to influence.” That has a way lower probability than the “natural” complexity of a string I want to influence. For example, if 1/quadrillion strings are important to influence, then the attackers are able to save log(quadrillion) bits.
    I don’t understand what you’re saying here.
- Vlad Mikulik 11 Jul 2018 19:55 UTC
  LW: 1 AF: 1
  I agree that this probably happens when you set out to mess with an arbitrary particular S, i.e. when you try to make some S’ that shares a prefix with S as likely as S.
  
  However, some S are special, in the sense that their prefixes are being used to make very important decisions. If you, as a malicious TM in the prior, perform an exhaustive search of universes, you can narrow down your options to only a few prefixes used to make pivotal decisions. Selecting one of those to mess with is then very cheap to specify. I use S to refer to those strings that are the ‘natural’ continuations of those cheap-to-specify prefixes.
  
  There are, it seems to me, a bunch of other equally-complex TMs that want to make other strings that share that prefix more likely, including some that promote S itself. What the resulting balance looks like is unclear to me, but what’s clear is that the prior is malign with respect to that prefix: conditioning on that prefix gives you a distribution almost entirely controlled by these malign TMs. The ‘natural’ complexity of S, or of other strings that share the prefix, plays almost no role in their priors.
  
  The above is of course conditional on this exhaustive search being possible, which also relies on there being anyone in any universe that actually uses the prior to make decisions. Otherwise, we can’t select the prefixes that can be messed with.
  - AlexMennen 11 Jul 2018 21:22 UTC
    LW: 2 AF: 1
    This reasoning seems to rely on there being such strings S that are useful to predict far out of proportion to what you would expect from their complexity. But a description of the circumstance in which predicting S is so useful should itself give you a way of specifying S, so I doubt that this is possible.
    - Vlad Mikulik 11 Jul 2018 22:16 UTC
      LW: 1 AF: 1
      I agree. That’s what I meant when I wrote there will be TMs that artificially promote S itself. However, this would still mean that most of S’s mass in the prior would be due to these TMs, and not due to the natural generator of the string.
      
      Furthermore, it’s unclear how many TMs would promote S vs S’ or other alternatives. Because of this, I don’t know whether the prior would be higher for S or S’ from this reasoning alone. Whichever is the case, the prior no longer reflects meaningful information about the universe that generates S and whose inhabitants are using the prefix to choose what to do; it’s dominated by these TMs that search for prefixes they can attempt to influence.
      - AlexMennen 12 Jul 2018 0:15 UTC
        LW: 5 AF: 2
        I didn’t mean that an agenty Turing machine would find S and then decide that it wants you to correctly predict S. I meant that to the extent that predicting S is commonly useful, there should be a simple underlying reason why it is commonly useful, and this reason should give you a natural way of computing S that does not have the overhead of any agency that decides whether or not it wants you to correctly predict S.
        paulfchristiano 12 Jul 2018 1:48 UTC
        LW: 2 AF: 1
        How many bits do you think it takes to specify the property “people’s predictions about S, using universal prior P, are very important”?
        (I think you’ll need to specify the universal prior P by reference to the universal prior that is actually used in the world containing the string S; if you spell out the prior P explicitly, you are already sunk just from the ambiguity in the choice of language.)
        It seems relatively unlikely to me that this will be cheaper than specifying some arbitrary degree of freedom in a computationally rich universe that life can control (+ the extra log(fraction of degrees of freedom the consequentialists actually choose to control)). Of course it might.
        I agree that the entire game is in the constants—what is the cheapest way to pick out important strings.
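        One hedged way to write this comparison down (the symbols are introduced here for illustration and do not appear in the thread): let $C_{\text{imp}}$ be the bits needed to specify the importance property, $C_{\text{dof}}$ the bits needed to specify a controllable degree of freedom, and $g$ the fraction of such degrees of freedom the consequentialists actually choose to control. Reading “log(fraction)” as a cost of $\log_2(1/g)$ bits, which is an assumption about the intended sign, the consequentialists come out ahead roughly when:

        ```latex
        C_{\text{dof}} + \log_2 \frac{1}{g} \;<\; C_{\text{imp}}
        ```

        Both sides are additive constants of unknown size, which is why the outcome is not settled by this inequality alone.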
        AlexMennen 12 Jul 2018 3:49 UTC
        LW: 2 AF: 1
        I don’t think that specifying the property of importance is simple and helps narrow down S. I think that in order for predicting S to be important, S must be generated by a simple process. Processes that take large numbers of bits to specify occur correspondingly rarely, and are thus less useful to predict.
        paulfchristiano 12 Jul 2018 5:59 UTC
        LW: 6 AF: 3
        I don’t buy it. A camera that some robot is using to make decisions is no simpler than any other place on Earth, just more important.
        (This already gives the importance-weighted predictor a benefit of ~log(quadrillion))
        Clearly you need to e.g. make the anthropic update and do stuff like that before you have any chance of competing with the consequentialist. This might just be a quantitative difference about how simple is simple. Like I said elsewhere, all the action is in the additive constants; I agree that the important things are “simple” in some sense.
        AlexMennen 12 Jul 2018 16:49 UTC
        LW: 2 AF: 1
        Ok, I see what you’re getting at now.