The witch hypothesis is going to do relatively well at explaining strange, rare events that could have been produced by humans casting low-complexity magic spells. It will do relatively badly at concisely explaining observations that contain a whole lot of mundane, redundant information (e.g. a video taken by a person walking through a building); there are other hypotheses that assign much higher probabilities to these.
In particular, by the pigeonhole principle, the witch hypothesis isn’t going to help at all with compressing random noise. For similar reasons, for any simple stochastic model (e.g. a probabilistic program), it isn’t going to produce better codes (on average) for stuff that model would have produced than the model itself would have, since the KL divergence between the two distributions is non-negative (and zero only when they coincide).
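A small numerical sketch of the coding claim, under made-up numbers: `model` stands in for the simple stochastic model that actually generated the data, and `witch` for whatever distribution the witch hypothesis induces over the same outcomes (both are hypothetical, chosen only so the arithmetic is easy to check). Coding data drawn from `model` with a code built for `witch` costs the entropy of `model` plus the KL divergence, so it cannot be cheaper on average.

```python
import math

def entropy(p):
    """Expected bits per symbol when data from p uses p's own optimal code."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected bits per symbol when data from p uses a code optimised for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) in bits: the average extra cost of using q's code on p's data."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over four outcomes (illustrative numbers only).
model = [0.5, 0.25, 0.125, 0.125]
witch = [0.25, 0.25, 0.25, 0.25]

own_cost = entropy(model)                 # cost under the model's own code
witch_cost = cross_entropy(model, witch)  # cost under the witch code
gap = kl_divergence(model, witch)         # witch_cost - own_cost, never negative

print(own_cost, witch_cost, gap)
```

The gap is zero only when the two distributions are identical, in which case the witch hypothesis is just the model under another name.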
Quoting Eliezer (from the page you linked):
This lets us see clearly the problem with using “The lady down the street is a witch; she did it” to explain the pattern in the sequence “0101010101”. If you’re sending a message to a friend, trying to describe the sequence you observed, you would have to say: “The lady down the street is a witch; she made the sequence come out 0101010101.” Your accusation of witchcraft wouldn’t let you shorten the rest of the message; you would still have to describe, in full detail, the data which her witchery caused.
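To make the quoted point concrete, here is a toy comparison of message lengths, measured crudely in characters. The three message formats are hypothetical stand-ins rather than a real description language, and the function names are invented for this sketch.

```python
WITCH_PREFIX = "The lady down the street is a witch; she made the sequence come out "

def literal_message(n):
    """Spell out the data: n repetitions of '01'."""
    return "01" * n

def witch_message(n):
    """The witch accusation still has to spell out the data in full."""
    return WITCH_PREFIX + literal_message(n)

def pattern_message(n):
    """A hypothesis that captures the pattern, written as a Python expression."""
    return f'"01" * {n}'

for n in (5, 500):
    # The pattern message really does decode to the observed data.
    assert eval(pattern_message(n)) == literal_message(n)
    print(n, len(literal_message(n)), len(witch_message(n)), len(pattern_message(n)))
```

The witch message is always longer than the literal one by the length of the accusation, while the pattern message grows only with the number of digits in `n`.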
There is a somewhat general problem with MDL-type frameworks, which is that they will often posit magic in the absence of having a good causal model, as opposed to being “confused” and thinking that some good causal model exists but is currently unknown. (This is mainly a problem for bounded approximations to MDL, rather than unbounded MDL)
EDIT: additional thoughts copied from the thread:
If the human can execute arbitrary programs and is computable, and can interpret messages of the form “run this program on the rest of the message”, then by definition the human brain based computer is a UTM, so it can be used in Solomonoff induction, weirdly enough. However, there is a concern in that the UTM is meant to be a prior, whereas the brain is more of a representation of a posterior. So it will be able to overfit things you already know. This might not be a problem if you were using CDT but would be a problem for UDT, since UDT isn’t supposed to change its prior as it gets more observations (this would make it stop caring about non-actual worlds in e.g. counterfactual mugging, leading to dynamic inconsistency).
In general if you’re using UDT then your choice of prior is a choice of which possible worlds you care about and how much. There won’t be universally compelling arguments for a particular prior the same way there aren’t universally compelling arguments for particular values.
Hmm, so I agree that if your message is a short binary string, then obviously the description length of that binary string will be much shorter in Python than it is on a brain.
But if (as is usual in my daily experience) the message is a bunch of human visual sensory input, then I expect a human brain to have a much shorter description length of that input than a Python script. A Python script would probably have to include a simulation of a human brain in order to produce the sensory input, which is likely going to be extremely complex.
At the end, the human somehow encodes the sensory data (on paper?), so this data needs an explicit representation, such as a video. I don’t know what you mean by “the human brain to have a much shorter description length of that input”; in the procedure you described, this would seem to mean “the human does not need a long instruction string to uniquely write down the video of the sensory input”, and this runs into the same issue as with binary strings. This is essentially a data compression problem.
There will never be a huge penalty for brain-encoding instead of python-encoding if the human is very long-lived, stable, and good at following instructions, since the instructions can say “run the following Python program: def foo(x): …”; in fact this is necessary for this setup to be a UTM in the formal sense required for Solomonoff induction.
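This is the usual invariance bound for description lengths. Written out, with notation that is my own assumption rather than the thread’s: let K_brain(x) be the length of the shortest instruction string that makes the human write down x, and K_Python(x) the length of the shortest Python program that outputs x.

```latex
K_{\text{brain}}(x) \;\le\; K_{\text{Python}}(x) + c,
\qquad
c = \bigl|\text{``run the following Python program:''}\bigr|
```

The constant c does not depend on x. The bound only runs in this direction cheaply: simulating the brain from Python may need a much larger constant.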
Oh, yeah. I think I was a bit confused in what I said. I wanted to highlight the difference between a short binary string, and a really complicated video feed, which probably requires a pretty decent model of the environment and which would probably benefit a lot from the knowledge that a human brain has.
I think the crux for me is less whether a specific human brain is a good choice for the UTM, and more that for any given input-history I have, I can construct a UTM such that the description length of that input-history is arbitrarily short, and so the choice of UTM is really really important in any “practical” scenario.
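The construction can be sketched in a few lines. Everything here is hypothetical: `make_rigged_machine` is a name invented for this sketch, programs are strings of characters rather than bits, and `literal_machine` is a trivial stand-in that is not actually universal. The point is only that the wrapper costs every other program one extra symbol while giving the chosen history a one-symbol description.

```python
def make_rigged_machine(base_machine, favourite):
    """Wrap base_machine so that the one-symbol program "0" outputs `favourite`.

    Any program starting with "1" is handed to base_machine with the "1"
    stripped, so if base_machine is universal the wrapper is too, with an
    overhead of one symbol per program.
    """
    def rigged(program):
        if program == "0":
            return favourite
        if program.startswith("1"):
            return base_machine(program[1:])
        raise ValueError("not a valid program for the rigged machine")
    return rigged

def literal_machine(program):
    """Toy stand-in for a real universal machine: outputs its program verbatim."""
    return program

history = "some very long and complicated input-history ..."
rigged = make_rigged_machine(literal_machine, history)

print(len("0"), rigged("0") == history)  # the history now has a 1-symbol description
print(rigged("1hello"))                  # everything else pays one extra symbol
```

Since this works for any history, description length alone cannot say which machine to use; that choice has to come from somewhere else.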
Given that, there must be some other argument for what we should choose as the UTM, probably so that short inputs into that UTM roughly correspond with our intuitions for simplicity. The two choices here that I feel tend to result in things that roughly match my natural intuition for elegance are a programming-language interpreter and a human brain, though the latter feels weirdly circular. Hence the question of whether that’s even a valid construction.
(Note: This has already helped me think this through, so thank you! :) )
I see how the human brain based computer would give an advantage in encoding a video feed. I still don’t see how this relates to the witch hypothesis. You still have to say what the witch did and why that got you the video feed you got, right? The witch hypothesis will most likely only beat the best possible causal model if there actually is magic going on (though, there is a problem in that you might not know the best possible causal model).
If the human can execute arbitrary programs and is computable, and can interpret messages of the form “run this program on the rest of the message”, then by definition the human brain based computer is a UTM, so it can be used in Solomonoff induction, weirdly enough. However, there is a concern in that the UTM is meant to be a prior, whereas the brain is more of a representation of a posterior. So it will be able to overfit things you already know. This might not be a problem if you were using CDT but would be a problem for UDT, since UDT isn’t supposed to change its prior as it gets more observations (this would make it stop caring about non-actual worlds in e.g. counterfactual mugging, leading to dynamic inconsistency).
In general if you’re using UDT then your choice of prior is a choice of which possible worlds you care about and how much. There won’t be universally compelling arguments for a particular prior the same way there aren’t universally compelling arguments for particular values.
(also, you’re welcome, this was useful to think about from my end too!)
I think the last paragraph was the most clarifying to me in the exchange so far. If you would be up for it, I think it would be great if you could edit your top-level comment to include that paragraph and maybe also some of the other things said in this thread (though obviously no obligation, just seems better for future people who might have a similar question, to have everything in one top-level place).
What does MDL stand for?
Edit: Figured it out. It’s Minimum Description Length.