Stuart_Armstrong comments on Occam’s Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann

Stuart_Armstrong 28 Oct 2019 12:28 UTC
LW: 2 AF: 1
AF
I’m not sure the physics analogy is getting us very far. I feel there is a very natural way of decomposing physics into laws+initial conditions, while there is no such natural way of doing so for preferences and rationality. But if we have different intuitions on that, then discussing the analogy isn’t going to help us converge!

So then every p,R pair compatible with the policy contains more information than the policy. Thus even the simplest p,R pair compatible with the policy contains more information than the policy.

Agreed (though the extra information may be tiny—a few extra symbols).

By analogous reasoning, every algorithm for constructing the policy contains more information than the policy.

That does not follow; the simplest algorithm for building a policy does not go via decomposing into two pieces and then recombining them. We are comparing algorithms that produce a planner-reward pair (two outputs) with algorithms that produce a policy (one output). (But your whole argument suggests you may be slightly misunderstanding complexity in this context.)

Now, though all pairs are slightly more complex than the policy itself, the bias argument shows that the “proper” pair is considerably more complex. To use an analogy: suppose file1 and file2 are both maximally zipped files. When you unzip file1, you produce image1 (and maybe a small, blank, image2). When you unzip file2, you also produce the same image1, and a large, complex, image2′. Then, as long as image1 and image2′ are at least slightly independent, file2 has to be larger than file1. The more complex image2′ is, and the more independent it is from image1, the larger file2 has to be.
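
As a rough illustration of the zip analogy, here is a minimal Python sketch. It assumes `zlib` as a crude stand-in for a "maximally zipped" file (it is far from a true shortest description), and it uses seeded pseudo-random bytes as hypothetical images that `zlib` cannot compress. All names (`pseudo_image`, `zipped_size`, and so on) are illustrative only.

```python
import random
import zlib


def pseudo_image(seed: int, n_bytes: int = 8000) -> bytes:
    """Hypothetical 'image': seeded pseudo-random bytes that zlib cannot shrink."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


def zipped_size(*images: bytes) -> int:
    """Size of one archive holding all the images (zlib at maximum level)."""
    return len(zlib.compress(b"".join(images), 9))


image1 = pseudo_image(seed=1)          # stands in for the policy
blank_image2 = bytes(8000)             # small, blank second output (all zeros)
complex_image2 = pseudo_image(seed=2)  # complex second output, unrelated to image1

file1_size = zipped_size(image1, blank_image2)    # image1 + blank image2
file2_size = zipped_size(image1, complex_image2)  # image1 + complex image2'
dependent_size = zipped_size(image1, image1)      # second output fully determined by image1

print("file1 (image1 + blank):        ", file1_size)
print("file2 (image1 + independent):  ", file2_size)
print("image1 + exact copy of image1: ", dependent_size)
```

In this toy setting the archive grows with the complexity of the second output only to the extent that the compressor cannot derive it from the first; the copy case shows a second output that costs much less than an unrelated one.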

Does that make sense?
- Daniel Kokotajlo 6 Nov 2019 0:15 UTC
  LW: 1 AF: 1
  AF Parent
  I agree that the decomposition of physics into laws+IC is much simpler than the decomposition of a human policy into p,R. (Is that what you mean by “more natural?”) But this is not relevant to my argument, I think.
  I feel that our conversation has now split into too many branches, some of which have been abandoned. In the interest of refocusing it, I’m going to answer the questions you asked and then ask a few new ones of my own.
  To your questions: For me to understand your argument better I’d like to know more about what the pieces represent. Is file1 the degenerate pair and file2 the intended pair, and image1 the policy and image2 the bias-facts? Then what is the “unzip” function? Pairs don’t unzip to anything. You can apply the function “apply the first element of the pair to the second” or you can apply the function “do that, and then apply the MAXIMIZE function to the second element of the pair and compute the difference.” Or there are infinitely many other things you can do with the pair. But the pair itself doesn’t tell you what to do with it, unlike a zipped file which is like an algorithm—it tells you “run me.”
  I have two questions. 1. My central claim, which I still uphold as not-ruled-out-by-your-arguments (though of course I don’t actually believe it), is the Occam Sufficiency Hypothesis: “The ‘intended’ pair is the simplest way to generate the policy.” So, basically, what OSH says is that within each degenerate pair is a term, pi (the policy), and when you crack open that term and see what it is made of, you see p(R), the intended planner applied to the intended reward function! Thus, a simplicity-based search will stumble across <p,R> before it stumbles across any of the degenerate pairs, because it needs p and R to construct the degenerate pairs. What part of this do you object to?
  2. Earlier you said, “given reasonable assumptions, the human policy is simpler than all pairs.” What are those assumptions?
  Once again, thanks for taking the time to engage with me on this! Sorry it took me so long to reply, I got busy with family stuff.
  - Stuart_Armstrong 6 Nov 2019 12:50 UTC
    LW: 2 AF: 1
    AF Parent
    
    Is file1 the degenerate pair and file2 the intended pair, and image1 the policy and image2 the bias-facts?
    
    Yes.
    
    Then what is the “unzip” function?
    
    The “shortest algorithm generating BLAH” is the maximally compressed way of expressing BLAH—the “zipped” version of BLAH.
    
    Ignoring unzip, which isn’t very relevant, we know that the degenerate pairs are just above the policy in complexity.
    
    So zip(degenerate pair) $\approx$ zip(policy), while zip(reasonable pair) > zip(policy+complex bias facts) (and zip(policy+complex bias facts) > zip(policy)).
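
    One way to write this chain out, as a sketch: take $K(\cdot)$ to be the length of the shortest program producing its argument (notation assumed here, not used elsewhere in the thread), with $\pi$ the policy, $(p_{\text{deg}}, R_{\text{deg}})$ a degenerate pair, $(p, R)$ the reasonable pair, and $B$ the bias facts:

    $$K(p_{\text{deg}}, R_{\text{deg}}) \approx K(\pi), \qquad K(p, R) > K(\pi, B) > K(\pi).$$

    The gap in the last inequality is large only if $B$ cannot be cheaply computed from $\pi$ alone.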
    
    Does that help?
    - Daniel Kokotajlo 6 Nov 2019 22:16 UTC
      LW: 1 AF: 1
      AF Parent
      It helps me understand your argument more clearly. I still disagree with it, though. I object to this:
      zip(reasonable pair) > zip(policy+complex bias facts)
      I claim this begs the question against OSH. If OSH is true, then zip(reasonable pair) ≈ zip(policy).
      - Stuart_Armstrong 8 Nov 2019 13:59 UTC
        LW: 2 AF: 1
        AF Parent
        Indeed. It might be possible to construct that complex bias function, from the policy, in a simple way. But that claim needs to be supported, and the fact that it hasn’t been found so far (I repeat that it has to be simple) is evidence against it.