I thought about this more and re-read the A&M paper, and I now have a different line of thinking compared to my previous comments.
I still think A&M’s No Free Lunch theorem goes through, but I now think A&M are proving the wrong theorem. A&M look for the simplest (planner, reward) decomposition that is compatible with the human policy, but it seems we additionally want compatibility with all the evidence we have observed, including sensory data of humans saying things like “if I was more rational, I would be exercising right now instead of watching TV” and “no really, my reward function is not empty”. The important point is that such sensory data gives us information not just about the human policy but also about the decomposition. Requiring compatibility with this data seems to rule out degenerate pairs. This makes me think Occam’s Razor would work for inferring preferences up to a certain point (i.e. as long as the situations are all “in-distribution”).
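To make the degenerate-pair worry concrete, here is a toy sketch in Python. It is not from the A&M paper: the two-state world, the planner names, the `tv_bias_planner`, and the way the quoted statements are turned into constraints on the reward are all assumptions made up for illustration. In particular, the breakfast statement is hypothetical and does not come from the discussion above. The sketch only shows that several (planner, reward) pairs can reproduce the same policy, and that reading statements at face value can prune them.

```python
# Toy illustration only. All names and numbers are hypothetical.
ACTIONS = {
    "morning": ["eat_breakfast", "skip_breakfast"],
    "evening": ["exercise", "watch_tv"],
}
observed_policy = {"morning": "eat_breakfast", "evening": "watch_tv"}

def rational_planner(reward):
    # Picks the reward-maximising action in each state.
    return {s: max(acts, key=lambda a: reward[(s, a)]) for s, acts in ACTIONS.items()}

def anti_rational_planner(reward):
    # Picks the reward-minimising action in each state.
    return {s: min(acts, key=lambda a: reward[(s, a)]) for s, acts in ACTIONS.items()}

def indifferent_planner(reward):
    # Ignores the reward; all behaviour is baked into the planner.
    return dict(observed_policy)

def tv_bias_planner(reward):
    # Hypothetical "imperfectly rational" planner: maximises reward plus a pull toward TV.
    bias = {("evening", "watch_tv"): 2.0}
    return {
        s: max(acts, key=lambda a: reward[(s, a)] + bias.get((s, a), 0.0))
        for s, acts in ACTIONS.items()
    }

# Candidate rewards, keyed by (state, action).
indicator = {(s, a): 1.0 if observed_policy[s] == a else 0.0
             for s, acts in ACTIONS.items() for a in acts}
negated = {k: -v for k, v in indicator.items()}
empty = {k: 0.0 for k in indicator}
intended = {("morning", "eat_breakfast"): 1.0, ("morning", "skip_breakfast"): 0.0,
            ("evening", "exercise"): 1.0, ("evening", "watch_tv"): 0.0}

candidates = {
    "rational + indicator": (rational_planner, indicator),
    "anti-rational + negated": (anti_rational_planner, negated),
    "indifferent + empty": (indifferent_planner, empty),
    "tv-biased + intended": (tv_bias_planner, intended),
}

def fits_policy(planner, reward):
    return planner(reward) == observed_policy

def fits_statements(reward):
    # ASSUMPTION: the statements are true and map directly onto the reward.
    would_rather_exercise = reward[("evening", "exercise")] > reward[("evening", "watch_tv")]
    reward_not_empty = len(set(reward.values())) > 1
    # Hypothetical extra statement: "I eat breakfast because I like it."
    likes_breakfast = reward[("morning", "eat_breakfast")] > reward[("morning", "skip_breakfast")]
    return would_rather_exercise and reward_not_empty and likes_breakfast

policy_fits = [name for name, (p, r) in candidates.items() if fits_policy(p, r)]
survivors = [name for name, (p, r) in candidates.items()
             if fits_policy(p, r) and fits_statements(r)]

print("compatible with policy:", policy_fits)
print("also compatible with statements:", survivors)
```

All four pairs reproduce the policy, but only the last one survives the statement check. Note that the pruning depends entirely on `fits_statements`, i.e. on assuming the speech acts are truthful and on a particular mapping from words to reward; the toy does nothing to justify that assumption.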
What about finding the (planner, reward) decomposition of non-human minds? I think if we were handed a mind at random from all of mind design space, then A&M’s No Free Lunch theorem would apply, because the simplest explanation really is that the mind has a degenerate decomposition. But if we were handed a random alien mind from our universe, we could use all the facts we have learned about our universe, including how the aliens likely evolved, any statements they seem to be making about what they value, and so on.
Does this line of thinking also apply to the case of science? I think not, because we wouldn’t be able to use our observations to get information about the decomposition. Unlike the case of values, the natural world isn’t making statements like “actually, the laws are empty and all the complexity is in the initial conditions”. I still don’t think the No Free Lunch theorem works for science either, because of my previous comments.
That is the whole point of my research agenda: https://www.lesswrong.com/posts/CSEdLLEkap2pubjof/research-agenda-v0-9-synthesising-a-human-s-preferences-into
The problem is that the non-subjective evidence does not map onto facts about the decomposition. A human claims X; well, that’s a speech act; are they telling the truth or not, and how do we know? Same for sensory data, which is mainly data about the brain correlated with facts about the outside world; to interpret that, we need to solve human symbol grounding.
All these ideas are in the research agenda (especially section 2). Just as you need something to bridge the is-ought gap, you need some assumptions to make evidence in the world (e.g. speech acts) correspond to preference-relevant facts.
This video may also illustrate the issues: https://www.youtube.com/watch?v=1M9CvESSeVc&t=1s
Hmm, I like that. I wonder what A&M would say in response. And I agree this is an important and relevant difference between the case of preferences and the case of science.
I still don’t think A&M show that the simplest explanation is a degenerate decomposition. They show that if it is, then Occam’s Razor won’t be sufficient, and moreover that there are some degenerate decompositions pretty close to maximally simple. But they don’t do much to rule out the possibility that the simplest explanation is the intended one.