My impression is that Solomonoff induction starts by assuming Occam’s Razor.
No matter what probabilities you assign, almost all very complex hypotheses have to be very improbable, because otherwise your total probability would be infinite.
That’s not a problem—all simple hypotheses can be just as improbable.
Again, I am not saying that Occam’s Razor is not a useful heuristic. It is. But it is not evidence.
I am not saying that Occam’s Razor is not a useful heuristic. It is. But it is not evidence.
Can you restate what you consider the use of Occam’s Razor to be, and what you consider the purpose of evidence to be?
Because from my perspective the purpose of evidence is to increase/decrease my confidence in various statements, and it seems to me that Occam’s Razor is useful for doing precisely that. So this distinction doesn’t make a lot of sense to me, and rereading the thread doesn’t clarify matters.
My impression is that Solomonoff induction starts by assuming Occam’s Razor.
The fact that it buys you something interesting without making that assumption was the whole point of the paragraph you were commenting on.
That’s not a problem—all simple hypotheses can be just as improbable.
I don’t believe that is true. Perhaps I’ve been insufficiently clear by trying to be brief (the difficulty being that “very complex” is really shorthand for something involving a limiting process), so let me be less brief.
First: Suppose you have a list of mutually exclusive hypotheses H1, H2, etc., with probabilities p1, p2, etc. List them in increasing order of complexity. Then the sum of all the pj is at most 1, and therefore pj → 0 as j → ∞ (at most 1/p of the pj can ever be ≥ p). Hence, “very complex hypotheses (in this list) have to be very improbable” in the following sense: for any probability p, however small, there’s a level of complexity C such that every hypothesis from your list whose complexity is at least C has probability smaller than p.
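The tail argument above can be sketched numerically. This is just an illustration: the 2^-(j+1) weighting is an arbitrary choice of probabilities summing to at most 1, not part of the argument, and `tail_threshold` is a helper name I made up.

```python
# Sketch: when probabilities over a complexity-ordered hypothesis list
# sum to at most 1, every threshold p is eventually undercut by the tail.

def tail_threshold(probs, p):
    """Smallest index C such that every hypothesis at position >= C
    has probability strictly below p."""
    last_bad = max((j for j, pj in enumerate(probs) if pj >= p), default=-1)
    return last_bad + 1

# Hypotheses H_0, H_1, ... in increasing order of complexity, with an
# arbitrary summable weighting (sums to < 1).
probs = [2.0 ** -(j + 1) for j in range(50)]

for p in (0.1, 0.01, 0.001):
    C = tail_threshold(probs, p)
    assert all(pj < p for pj in probs[C:])
    print(f"p = {p}: every hypothesis from index {C} on has probability < p")
```

The same check works for any summable sequence of probabilities, which is the whole point: the conclusion depends only on the total being finite, not on the particular prior.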
That doesn’t quite mean that very complex hypotheses have to be improbable. Indeed, you can construct very complex high-probability hypotheses as very long disjunctions. And since P and ~P have about the same complexity for any proposition P, it must in some sense be true that about as many very complex propositions have high probabilities as have low probabilities. (So what I said certainly wasn’t quite right.)
However, I bet something along the following lines is true. Suppose you have a probability distribution over propositions (this is for generating them, and isn’t meant to have anything directly to do with the probability that each proposition is true), and suppose we also assign all the propositions probabilities in a way consistent with the laws of probability theory. (I’m assuming here that our class of propositions is closed under the usual logical operations.) And suppose we also assign all the propositions complexities in any reasonable way.

Define the essential complexity of a proposition to be the infimum of the complexities of propositions that imply it. (I’m pretty sure it’s always attained.)

Then I conjecture that something like this is both true and fairly easy to prove: for any fixed probability level q, as C → ∞, if you generate a proposition at random (according to the “generating” distribution) conditional on its essential complexity being at least C, then Pr(its probability ≥ q) tends to 0.
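One way to state the conjecture above in symbols. The labels here are my own, not fixed by the discussion: G is the generating distribution, p(·) the assigned probability, and K(·) the complexity measure.

```latex
% Essential complexity: the cheapest proposition that implies P
\mathrm{EC}(P) = \inf \{\, K(Q) : Q \models P \,\}

% Conjecture: conditional on high essential complexity, high assigned
% probability becomes vanishingly rare under the generating distribution G
\forall q > 0: \quad
\lim_{C \to \infty} \Pr_{P \sim G}\bigl[\, p(P) \ge q \;\big|\; \mathrm{EC}(P) \ge C \,\bigr] = 0
```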
Sorry, will put this on hold for a bit—it requires some thinking and I don’t have time for it at the moment...
No problem!