If it is also true that hypotheses which are easier to locate make more predictions…
This doesn’t sound right to me. Imagine you’re tossing a coin repeatedly. Hypothesis 1 says the coin is fair. Hypothesis 2 says the coin repeats the sequence HTTTHHTHTHTTTT over and over in a loop. The second hypothesis is harder to locate, but makes a stronger prediction.
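To see the asymmetry in numbers, here is a minimal sketch, assuming we observe exactly one pass through the loop (the framing below is mine, not anything formal):

    # Likelihood of one specific 14-flip outcome under each hypothesis.
    seq = "HTTTHHTHTHTTTT"

    # Hypothesis 1 (fair coin): every specific 14-flip sequence is equally likely.
    p_fair = 0.5 ** len(seq)   # 2^-14, about 0.000061

    # Hypothesis 2 (deterministic loop): this exact sequence is certain.
    p_loop = 1.0

    # H2 bets everything on one outcome, so it wins big when that outcome occurs.
    print(p_loop / p_fair)     # 16384.0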
The proper formalization for your concept of locate-ability is the Solomonoff prior. Unfortunately we can’t do inference based on it because it’s uncomputable.
Maxent and friends aren’t motivated by a desire to formalize locate-ability. Maxent is the “most uniform” distribution on a space of hypotheses; the “Jeffreys rule” is a means of constructing priors that are invariant under reparameterizations of the space of hypotheses; “matching priors” give you frequentist coverage guarantees, and so on.
Please don’t take my words as gospel just because I sound knowledgeable! At this point I recommend that you actually study the math and come to your own conclusions. Maybe contact user Cyan; he’s a professional statistician who inspired me to learn this stuff. IMO, discussing Bayesianism as some kind of philosophical system without digging into the math is counterproductive, though people around here do that a lot.
I’m in the process of digging into the math, so hopefully at some point soon I’ll be able to back up my suspicions in a more rigorous way.
This doesn’t sound right to me. Imagine you’re tossing a coin repeatedly. Hypothesis 1 says the coin is fair. Hypothesis 2 says the coin repeats the sequence HTTTHHTHTHTTTT over and over in a loop. The second hypothesis is harder to locate, but makes a stronger prediction.
I was talking about the number of predictions, not their strength. So Hypothesis 1 predicts any sequence of coin flips that converges on 50% heads, and Hypothesis 2 predicts only sequences that repeat HTTTHHTHTHTTTT. Hypothesis 1 explains many more possible worlds than Hypothesis 2, and so without evidence as to which world we inhabit, Hypothesis 1 is much more likely.
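Here’s a rough sketch of that counting, for what it’s worth. The 5% tolerance is an assumption of mine, since “converges on 50%” has no sharp meaning at any finite length:

    from math import comb

    # How many length-n sequences is each hypothesis compatible with?
    n = 28  # two passes through the 14-flip loop

    loop_worlds = 1  # the loop hypothesis allows exactly one such sequence

    # Read "converges on 50%" loosely as "heads fraction within 5% of 1/2".
    fair_worlds = sum(comb(n, k) for k in range(n + 1)
                      if abs(k / n - 0.5) <= 0.05)

    print(fair_worlds, loop_worlds)  # 115000920 vs 1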
Since I’ve already conceded that being a Perfect Bayesian is impossible, I’m not surprised to hear that measuring locate-ability is likewise impossible (especially since the one reduces to the other). It just means that we should determine prior probabilities by approximating the Solomonoff prior as best we can.
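As a sketch of what even a crude approximation might look like, one could stand in for program length with off-the-shelf compressed length. This is only a loose proxy and entirely my own illustration, not a real implementation of the prior:

    import zlib

    def crude_complexity_bits(description: str) -> int:
        # Compressed length in bits: a loose stand-in for the length of the
        # shortest program, which is what the real prior would weight by.
        return 8 * len(zlib.compress(description.encode()))

    # Shorter descriptions get exponentially larger prior weight (2^-length).
    for h in ("the coin is fair",
              "the coin repeats HTTTHHTHTHTTTT forever"):
        bits = crude_complexity_bits(h)
        print(f"{bits} bits -> prior weight 2^-{bits}: {h}")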
Thanks for taking the time to comment, by the way.
Then let’s try this. Hypothesis 1 says the sequence will consist of only H repeated forever. Hypothesis 2 says the sequence will be either HTTTHHTHTHTTTT repeated forever, or TTHTHTTTHTHHHHH repeated forever. The second one is harder to locate, but describes two possible worlds rather than one.
Maybe your idea can be fixed somehow, but I see no way yet. Keep digging.
I’ve just reread Eliezer’s post on Occam’s Razor and it seems to have clarified my thinking a little.
I originally said:
If it is also true that hypotheses which are easier to locate make more predictions… then we are perfectly justified in assigning a probability to a hypothesis based on its locate-ability.
But I would now say:
If it is also true that hypotheses with a shorter minimum message length (MML) make more predictions relative to that minimum message length than do hypotheses with longer MMLs… then we are perfectly justified in assigning a probability to a hypothesis based on MML.
This solves the problem your counterexample presents: Hypothesis 1 describes only one possible world, but Hypothesis 2 requires, say, ~30 more bits of information (for those two particular strings of results, plus a disjunction) to describe only two possible worlds, making it 2^30 / 2 times less likely.
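Spelling the arithmetic out, taking the ~30 extra bits at face value:

    extra_bits = 30            # assumed extra message length for Hypothesis 2
    worlds_h1, worlds_h2 = 1, 2

    # Each extra bit halves the prior weight; each extra world covered
    # (on this scheme) scales the probability back up proportionally.
    h2_penalty = 2 ** extra_bits * worlds_h1 / worlds_h2
    print(h2_penalty)          # 536870912.0, i.e. 2^30 / 2 = 2^29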
Then let’s try this. Hypothesis 1 says the sequence will consist of only H repeated forever. Hypothesis 2 says the sequence will be HTTTHHTHTHTTTTX repeated forever, where the X can take different values on each repetition. The second hypothesis is harder to locate but describes an infinite number of possible worlds :-)
If at first you don’t succeed, try, try again!
The problem with this counterexample is that you can’t actually repeat something forever.
Even taking the case where we repeat each sequence 1000 times, which seems like it should behave similarly, you’ll end up with 1000 coin flips for Hypothesis 1 and 15000 coin flips for Hypothesis 2. So the odds of being in a world where Hypothesis 1 is true are 1 in 2^1000, but the odds of being in a world where Hypothesis 2 is true are 1 in 2^15000.
It’s an apples-to-balloons comparison, basically.
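The two figures side by side, in exact arithmetic since floats underflow at these sizes; the mismatch in sample spaces is the point:

    from fractions import Fraction

    # Probability of one specific world under a uniform measure on
    # sequences of the relevant length.
    p_world_h1 = Fraction(1, 2) ** 1000    # one 1000-flip sequence
    p_world_h2 = Fraction(1, 2) ** 15000   # one 15000-flip sequence

    # The exponents differ because the sequences have different lengths,
    # so the comparison spans two different sample spaces.
    print(p_world_h1.denominator.bit_length() - 1)   # 1000
    print(p_world_h2.denominator.bit_length() - 1)   # 15000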
(I spent about twenty minutes staring at an empty comment box and sweating blood before I figured this out, for the record.)
I think this is still wrong. Take the finite case where both hypotheses are used to explain sequences of a billion throws. Then the first hypothesis describes one world, and the second one describes an exponentially huge number of worlds. You seem to think that the length of the sequence should depend on the length of the hypothesis, and I don’t understand why.
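To put a number on “exponentially huge”, assuming a 15-symbol block with one free symbol per repetition:

    n = 10 ** 9                   # a billion throws
    block_len = 15                # one repetition, counting the free symbol
    repetitions = n // block_len  # 66666666 full repetitions

    worlds_h1 = 1                 # "all H" picks out exactly one sequence
    # One free binary choice per repetition for Hypothesis 2:
    print(f"Hypothesis 2 describes 2^{repetitions} worlds")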
That is an awesome counterexample, thank you. I think I’ll wait to ponder this further until I have a better grasp of the math involved.