I agree that the decomposition of physics into laws + initial conditions (IC) is much simpler than the decomposition of a human policy into <p,R>. (Is that what you mean by “more natural”?) But I don’t think this is relevant to my argument.
I feel that our conversation has now split into too many branches, some of which have been abandoned. To refocus it, I’m going to answer the questions you asked and then ask a few new ones of my own.
To your questions: For me to understand your argument better, I’d like to know more about what the pieces represent. Is file1 the degenerate pair and file2 the intended pair, and image1 the policy and image2 the bias-facts? Then what is the “unzip” function? Pairs don’t unzip to anything. You can apply the function “apply the first element of the pair to the second,” or you can apply the function “do that, and then apply the MAXIMIZE function to the second element of the pair and compute the difference.” Or there are infinitely many other things you can do with the pair. But the pair itself doesn’t tell you what to do with it, unlike a zipped file, which is like an algorithm: it tells you “run me.”
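A minimal sketch of this point, under assumptions that are not in the thread: p is read as a planner (a function from a reward function to a policy), the world has two states and two actions, and `biased_planner`, `apply_pair`, and `bias_facts` are hypothetical names made up for illustration. It only shows that the same pair can be fed to two different functions, and that nothing in the pair picks one of them.

```python
from typing import Callable, Dict, Tuple

Reward = Dict[Tuple[str, str], float]   # (state, action) -> value
Policy = Dict[str, str]                 # state -> action
Planner = Callable[[Reward], Policy]    # reward -> policy

STATES = ["s0", "s1"]
ACTIONS = ["a", "b"]


def maximize(reward: Reward) -> Policy:
    """MAXIMIZE: pick the highest-reward action in every state."""
    return {s: max(ACTIONS, key=lambda a: reward[(s, a)]) for s in STATES}


def biased_planner(reward: Reward) -> Policy:
    """Hypothetical planner: rational, except it always picks 'a' in s1."""
    policy = maximize(reward)
    policy["s1"] = "a"
    return policy


def apply_pair(pair: Tuple[Planner, Reward]) -> Policy:
    """Reading 1: apply the first element of the pair to the second."""
    planner, reward = pair
    return planner(reward)


def bias_facts(pair: Tuple[Planner, Reward]) -> Dict[str, Tuple[str, str]]:
    """Reading 2: do that, then compare against MAXIMIZE of the second element."""
    planner, reward = pair
    actual, ideal = planner(reward), maximize(reward)
    # Keep only the states where the actual choice differs from the ideal one.
    return {s: (actual[s], ideal[s]) for s in STATES if actual[s] != ideal[s]}


# Toy reward (an assumption for illustration only).
R: Reward = {("s0", "a"): 1.0, ("s0", "b"): 0.0,
             ("s1", "a"): 0.0, ("s1", "b"): 1.0}
pair = (biased_planner, R)

print(apply_pair(pair))   # the policy the pair generates under reading 1
print(bias_facts(pair))   # where that policy departs from maximizing R
```

The pair `(biased_planner, R)` is just data here; which of `apply_pair` or `bias_facts` gets called on it is a choice made outside the pair.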
I have two questions. 1. My central claim, which I still uphold as not ruled out by your arguments (though of course I don’t actually believe it), is the Occam Sufficiency Hypothesis (OSH): “The ‘intended’ pair is the simplest way to generate the policy.” Basically, OSH says that within each degenerate pair is a term, pi (the policy), and when you crack open that term and see what it is made of, you see p(R), the intended planner applied to the intended reward function! Thus, a simplicity-based search will stumble across <p,R> before it stumbles across any of the degenerate pairs, because it needs p and R to construct the degenerate pairs. What part of this do you object to?
2. Earlier you said, “given reasonable assumptions, the human policy is simpler than all pairs.” What are those assumptions?
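The structure of the claim in question 1 can be sketched in a toy setting. Everything concrete here is an assumption for illustration: p is read as a planner from rewards to policies, the two-state world is made up, and the two degenerate pairs (a constant planner with a zero reward, and a greedy planner with an indicator reward) are standard examples of the kind being discussed rather than constructions quoted from this thread. The sketch does not show that OSH is true; it only shows that each degenerate pair is built from pi, which in this toy is itself computed as p(R).

```python
from typing import Dict, Tuple

Reward = Dict[Tuple[str, str], float]   # (state, action) -> value
Policy = Dict[str, str]                 # state -> action

STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]


def greedy(reward: Reward) -> Policy:
    """Fully rational planner: take the best action in each state."""
    return {s: max(ACTIONS, key=lambda a: reward[(s, a)]) for s in STATES}


def p(reward: Reward) -> Policy:
    """Hypothetical 'intended' planner: greedy, but biased toward 'left' in s1."""
    policy = greedy(reward)
    policy["s1"] = "left"
    return policy


# Hypothetical 'intended' reward: 'right' is better everywhere.
R: Reward = {("s0", "left"): 0.0, ("s0", "right"): 1.0,
             ("s1", "left"): 0.0, ("s1", "right"): 1.0}

# The policy, written the way OSH says is shortest: p applied to R.
pi: Policy = p(R)


def p_const(_reward: Reward) -> Policy:
    """Degenerate planner: ignore the reward and return pi."""
    return dict(pi)


# Degenerate pair 1: (p_const, zero reward). Its planner mentions pi.
R_zero: Reward = {key: 0.0 for key in R}

# Degenerate pair 2: (greedy, indicator reward). Its reward mentions pi.
R_indicator: Reward = {(s, a): 1.0 if pi[s] == a else 0.0
                       for s in STATES for a in ACTIONS}

# All three pairs generate the same policy.
assert p_const(R_zero) == pi
assert greedy(R_indicator) == pi
```

Whether pi really has no shorter description than p(R) is exactly what the two sides disagree about; the code takes no position on that.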
Once again, thanks for taking the time to engage with me on this! Sorry it took me so long to reply; I got busy with family stuff.
Yes.
The “shortest algorithm generating BLAH” is the maximally compressed way of expressing BLAH—the “zipped” version of BLAH.
Ignoring unzip, which isn’t very relevant, we know that the degenerate pairs are just above the policy in complexity.
So zip(degenerate pair) ≈ zip(policy), while zip(reasonable pair) > zip(policy+complex bias facts) (and zip(policy+complex bias facts) > zip(policy)).
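Reading zip(X) as the length K(X) of the shortest program that outputs X (an interpretive assumption; the thread itself only uses the zip metaphor), the two claims above can be written as:

```latex
\[
K(\text{degenerate pair}) \approx K(\pi),
\qquad
K(\text{reasonable pair}) > K(\pi + \text{complex bias facts}) > K(\pi).
\]
```

Here \pi stands for the policy, and the symbol K is notation introduced only for this restatement.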
Does that help?
It helps me understand your argument more clearly. I still disagree with it, though. I object to this:
I claim this begs the question against OSH. If OSH is true, then zip(reasonable pair) ≈ zip(policy).
Indeed. It might be possible to construct that complex bias function from the policy in a simple way. But that claim needs to be supported, and the fact that no such construction has been found so far (I repeat that it has to be simple) is evidence against it.