Thanks again! I still disagree, surprise surprise.
I think I agree with you that the (p,R) decomposition is not a natural fact about the world, though I’m not entirely sure. Anyhow, I don’t think it matters for our purposes.
No, the situation is very different. Physicists are trying to model and predict what is happening in the world (and in counterfactual worlds). This is equivalent to trying to figure out the human policy (which can be predicted from observations, as long as you include counterfactual ones). The decomposition of the policy into preferences and rationality is a separate step, very unlike what physicists are doing (quick way to check this: if physicists were unboundedly rational with infinite data, they could solve their problem; whereas we couldn’t, we’d still have to make decisions).
(if you want to talk about situations where we know some things but not all about the human policy, then the treatment is more complex, but ultimately the same arguments apply).
Physicists are trying to do many things. Yes, one thing they are trying to do is predict what is happening in the world. But another thing they are trying to do is figure out stuff about counterfactuals, and for that they need to have a Laws+IC decomposition to work with. So they take their data and they look for a simple Laws+IC decomposition that fits it. They would still do this even if they already knew the results of all the experiments ever, and had no more need to predict things. (Extending the symmetry, humans also typically use the intentional stance on incomplete data about a target human’s policy, for the purpose of predicting the rest of the policy. But this isn’t what you concern yourself with; you assume for the sake of argument that we already have the whole policy and point out that we’d still want to use the intentional stance to get a decomposition so that we could make judgments about rationality. I say yes, true, now apply the same reasoning to physics: assume for the sake of argument that we already know everything that will happen, all the events, and notice that we’d still want to have a Laws+IC decomposition, perhaps to figure out counterfactuals.)
Well, it depends. Suppose there are multiple TL (true laws) + IC that could generate E. In that case, TL+IC has more complexity than E, since you need to choose among the possible options. But if there is only one feasible TL+IC that generates E, then you can work backwards from E to get that TL+IC, and now you have all the counterfactual info, from E, as well.
I was assuming there were multiple Law+IC pairs that would generate E… well actually no, the example degenerate pairs I gave prove that there are, no need to assume it!
That argument shows that if you look into the algorithm, you can get other differences. But I’m not looking into the algorithm; I’m just using the decomposition into (p, R), and playing around with the p and R pieces, without looking inside.
I don’t see the difference between what you are doing and what I did. You started with a policy and said “But what about bias-facts? The policy by itself doesn’t tell us these facts. So let’s look at the various decompositions of the policy into p,R pairs; they tell us the bias facts.” I start with a number and say “But what about how-to-spell-the-word-that-refers-to-the-parts-of-a-written-number facts? The number doesn’t tell us that. Let’s look at the various decompositions of the number into strings of symbols that represent it; they tell us those facts.”
Among the degenerate pairs, the one with the indifferent planner has a bias of zero, the one with the greedy planner has a bias of zero, and the one with the anti-greedy planner has a bias of −1 at every timestep. So they do define bias functions, but particularly simple ones. Nothing like the complexity of the biases generated by the “proper” pair.
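As a toy check of these bias numbers (my own minimal setup, not anything from the thread or the paper): a single decision problem where the observed policy always picks action `a`, with the degenerate rewards defined from the policy itself.

```python
# Hypothetical single-decision setup, for illustration only.
ACTIONS = ["a", "b", "c"]
PI = "a"  # the action the observed policy takes

# Degenerate reward: +1 for copying the policy, 0 otherwise.
def R_pi(action):
    return 1 if action == PI else 0

# Its negation, used by the anti-greedy pair.
def neg_R_pi(action):
    return -R_pi(action)

# Per-timestep bias: reward the policy actually gets,
# minus the best achievable reward.
def bias(reward):
    best = max(reward(a) for a in ACTIONS)
    return reward(PI) - best

print(bias(R_pi))      # greedy pair: the policy is optimal for R_pi, bias 0
print(bias(neg_R_pi))  # anti-greedy pair: the policy is worst for -R_pi, bias -1
```

The indifferent planner rates every policy the same, so on this accounting its bias is zero by construction; only the anti-greedy pair produces a nonzero (but constant) bias.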
Thanks for the clarification—that’s what I suspected. So then every p,R pair compatible with the policy contains more information than the policy. Thus even the simplest p,R pair compatible with the policy contains more information than the policy. By analogous reasoning, every algorithm for constructing the policy contains more information than the policy. So even the simplest algorithm for constructing the policy contains more information than the policy. So (by your reasoning) even the simplest algorithm for constructing the policy is more complex than the policy. But this isn’t so; the simplest algorithm for constructing the policy is length L and so has complexity L, and the policy has complexity L too… That’s my argument at least. Again, maybe I’m misunderstanding how complexity works. But now that I’ve laid it out step-by-step, which step do you disagree with?
The relevance of information for complexity is this: given reasonable assumptions, the human policy is simpler than all pairs, …
Wait what? This is what I was objecting to in the original post. The “Occam Sufficiency Hypothesis” is that the human policy is not simpler than all pairs; in particular, its complexity is precisely that of the intended pair, because the intended pair is the simplest way to construct the policy.
What are the reasonable assumptions that lead to the OSH being false?
My objection to your paper, in a nutshell, was that you didn’t discuss this part—you didn’t give any reason to think OSH was false. The three reasons you gave in Step 2 were reasons to think the intended pair is complex, not reasons to think it is more complex than the policy. Or so I argued.
--If it were true that Occam’s Razor can’t distinguish between P,R and -P,-R, then… isn’t that a pretty general argument against Occam’s Razor, not just in this domain but in other domains too?
No, because Occam’s razor works in other domains. This is a strong illustration that this domain is actually different.
My argument is that if you are right, Occam’s Razor would be generally useless, but it’s not, so you are wrong. In more detail: If Occam’s Razor can’t distinguish between P,R and -P,-R, then (by analogy) in an arbitrary domain it won’t be able to distinguish between theory X and theory b(X), where b() is some simple bizarro function that negates or inverts the parts of X in such a way that the changes cancel out.
I’m not sure the physics analogy is getting us very far—I feel there is a very natural way of decomposing physics into laws+initial conditions, while there is no such natural way of doing so for preferences and rationality. But if we have different intuitions on that, then discussing the analogy isn’t going to help us converge!
So then every p,R pair compatible with the policy contains more information than the policy. Thus even the simplest p,R pair compatible with the policy contains more information than the policy.
Agreed (though the extra information may be tiny—a few extra symbols).
By analogous reasoning, every algorithm for constructing the policy contains more information than the policy.
That does not follow; the simplest algorithm for building a policy does not go via decomposing into two pieces and then recombining them. We are comparing algorithms that produce a planner-reward pair (two outputs) with algorithms that produce a policy (one output). (But your whole argument suggests you may be slightly misunderstanding how complexity works in this context.)
Now, though all pairs are slightly more complex than the policy itself, the bias argument shows that the “proper” pair is considerably more complex. To use an analogy: suppose file1 and file2 are both maximally zipped files. When you unzip file1, you produce image1 (and maybe a small, blank, image2). When you unzip file2, you also produce the same image1, and a large, complex, image2′. Then, as long as image1 and image2′ are at least slightly independent, file2 has to be larger than file1. The more complex image2′ is, and the more independent it is from image1, the larger file2 has to be.
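A quick numerical sanity check of the zipped-files intuition (a sketch I’m adding, not from the thread; random bytes stand in for the two images, since random data is essentially incompressible while a blank image compresses to almost nothing):

```python
import os
import zlib

image1 = os.urandom(50_000)           # shared first output (the "policy")
image2_blank = bytes(50_000)          # small, blank second output: all zeros
image2_complex = os.urandom(50_000)   # complex second output, independent of image1

# "file1" and "file2" play the role of the maximally-compressed archives.
file1 = zlib.compress(image1 + image2_blank, 9)
file2 = zlib.compress(image1 + image2_complex, 9)

print(len(file1), len(file2))  # file2 comes out roughly 50 KB larger
```

Because image2_complex is independent of image1, no compressor can reconstruct it from image1, so file2 must carry its full description; the blank image costs file1 only a few bytes.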
I agree that the decomposition of physics into laws+IC is much simpler than the decomposition of a human policy into p,R. (Is that what you mean by “more natural?”) But this is not relevant to my argument, I think.
I feel that our conversation now has branched into too many branches, some of which have been abandoned. In the interest of re-focusing the conversation, I’m going to answer the questions you asked and then ask a few new ones of my own.
To your questions: For me to understand your argument better I’d like to know more about what the pieces represent. Is file1 the degenerate pair and file2 the intended pair, and image1 the policy and image2 the bias-facts? Then what is the “unzip” function? Pairs don’t unzip to anything. You can apply the function “apply the first element of the pair to the second” or you can apply the function “do that, and then apply the MAXIMIZE function to the second element of the pair and compute the difference.” Or there are infinitely many other things you can do with the pair. But the pair itself doesn’t tell you what to do with it, unlike a zipped file which is like an algorithm—it tells you “run me.”
I have two questions. 1. My central claim—which I still uphold as not-ruled-out-by-your-arguments (though of course I don’t actually believe it) is the Occam Sufficiency Hypothesis: “The ‘intended’ pair is the simplest way to generate the policy.” So, basically, what OSH says is that within each degenerate pair is a term, pi (the policy), and when you crack open that term and see what it is made of, you see p(R), the intended policy applied to the intended reward function! Thus, a simplicity-based search will stumble across <p,R> before it stumbles across any of the degenerate pairs, because it needs p and R to construct the degenerate pairs. What part of this do you object to?
2. Earlier you said “given reasonable assumptions, the human policy is simpler than all pairs” What are those assumptions?
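The construction behind question 1 can be sketched concretely (my own toy formalization, with stand-in p and R): each degenerate pair mentions the policy pi in its own definition, and under OSH the shortest way to produce pi is p applied to R.

```python
ACTIONS = ["a", "b", "c"]

def p(R):
    # stand-in "intended planner": fully rational, picks the R-best action
    return max(ACTIONS, key=R)

R = {"a": 1.0, "b": 0.5, "c": 0.0}.get   # stand-in "intended reward"
pi = p(R)                                 # the policy, built from p and R

# Each degenerate pair needs pi in order to be written down at all:
indifferent_pair = (lambda _R: pi, R)                 # planner ignores its reward
greedy_pair = (p, lambda a: 1 if a == pi else 0)      # reward rewards copying pi
anti_greedy_pair = (lambda Rn: min(ACTIONS, key=Rn),  # planner minimizes reward
                    lambda a: -1 if a == pi else 0)

# All three pairs generate the same policy:
for planner, reward in (indifferent_pair, greedy_pair, anti_greedy_pair):
    assert planner(reward) == pi
print(pi)
```

The point of the sketch: every degenerate pair’s description contains pi as a subterm, so if the shortest description of pi is itself p(R), a simplicity-ordered search reaches the intended pair first.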
Once again, thanks for taking the time to engage with me on this! Sorry it took me so long to reply, I got busy with family stuff.
Yes.
The “shortest algorithm generating BLAH” is the maximally compressed way of expressing BLAH—the “zipped” version of BLAH.
Ignoring unzip, which isn’t very relevant, we know that the degenerate pairs are just above the policy in complexity.
So zip(degenerate pair) ≈ zip(policy), while zip(reasonable pair) > zip(policy+complex bias facts) (and zip(policy+complex bias facts) > zip(policy)).
Does that help?
It helps me understand your argument more clearly. I still disagree with it, though. I object to this:
I claim this begs the question against OSH. If OSH is true, then zip(reasonable pair) ≈ zip(policy).
Indeed. It might be possible to construct that complex bias function, from the policy, in a simple way. But that claim needs to be supported, and the fact that it hasn’t been found so far (I repeat that it has to be simple) is evidence against it.