Let n be an integer. Knowing nothing else about n, would you assign 50% probability to n being odd? To n being positive? To n being greater than 3? You see how fast you get into trouble.
You need a prior distribution on n. Without a prior, these probabilities are not 50%. They are undefined.
The particular mathematical problem is that you can’t define a uniform distribution over an unbounded domain. This doesn’t apply to the biased coin: in that case, you know the bias is somewhere between 0 and 1, and for every distribution that favors heads, there’s one that favors tails, so you can actually perform the integration.
Finally, on an empirical level, it seems like there are more false n-bit statements than true n-bit statements. If you took the first N Gödel numbers, I’d expect more falsehoods than truths. Similarly for statements like “Obama is the 44th president”: so many ways to go wrong, just a few ways to go right.
Edit: that last paragraph isn’t right. For every true proposition, there’s a false one of equal complexity.
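The biased-coin point above is easy to check numerically. A minimal sketch (my own illustration, not from the discussion): under a uniform prior on the bias p over [0, 1], the marginal probability of heads integrates to exactly 1/2; no analogous uniform prior exists over all integers, which is why the same move fails for “is n odd?”.

```python
# Midpoint Riemann sum for P(heads) = integral over [0, 1] of p * prior(p) dp,
# with a uniform prior, prior(p) = 1. The symmetry between head-favoring and
# tail-favoring biases makes the integral land exactly on 1/2.
N = 1_000_000  # grid resolution (arbitrary choice for this sketch)
p_heads = sum((i + 0.5) / N for i in range(N)) / N
print(p_heads)  # 0.5 (up to floating-point error)
```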
I’m pretty certain this intuition is false. It feels true because it’s much harder to come up with a true statement from N bits if you restrict yourself to positive claims about reality. If you get random statements like “the frooble fuzzes violently,” they’re bound to be false, right? But for every nonsensical or false statement you also get its negation: “not(the frooble fuzzes violently)”. It’s hard to arrive at a statement like “Obama is the 44th president” and be correct, but it’s very easy to enumerate a million things that do not orbit Pluto (and be correct).
(FYI: somewhere below there is a different discussion about whether there are more n-bit statements about reality that are false than true)
There’s a 1-to-1 correspondence between any true statement and its negation, and the two sets don’t overlap, so there’s an equal number of true and false statements. And they can be coded in the same number of bits, since the interpreting machine can always be made to consider the negation of the statement you’ve fed it.
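A toy sketch of that correspondence (my own illustration; the particular claims and the value of x are arbitrary): however skewed a starting set of statements is, pairing each statement with its negation balances the tally, because mapping s to “not s” is a bijection between truths and falsehoods.

```python
x = 5
claims = ["x == 5", "x > 0", "x % 2 == 1", "x < 100"]  # all true at x = 5

# Pair every claim with its negation; the combined set must come out balanced.
with_negations = claims + [f"not ({c})" for c in claims]
true_count = sum(bool(eval(c)) for c in with_negations)
false_count = len(with_negations) - true_count
print(true_count, false_count)  # 4 4
```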
You just need to add the term ‘...NOT!’ at the end. As in ‘The Chudley Cannons are a great team… NOT!’
Or we may call it the “He loves me, he loves me not” principle.
Doesn’t it take more bits to specify NOT P than to specify P? I mean, I can take any proposition and add ”..., and I like pudding” but this doesn’t mean that half of all n-bit propositions are about me liking pudding.
No. If “NOT P” took more bits to specify than “P”, this would also mean that “NOT NOT P” would take more bits to specify than “NOT P”. But NOT NOT P is identical to P, so it would mean that P takes more bits to specify than itself.
With actual propositions now, instead of letters:
If you have the proposition “The Moon is Earth’s satellite”, and the proposition “The Moon isn’t Earth’s satellite”, each is the negation of the other. If a proposition’s negation takes more bits to specify than the proposition, then you’re saying that each statement takes more bits to specify than the other.
Even simpler: can you think of any reason why it would necessarily take more bits to encode “x != 5” than “x == 5”?
We’re talking about minimum message length, and the minimum message of NOT NOT P is simply P.
Once you consider double negation, I don’t have any problem with saying that
“the Moon is Earth’s satellite”
is a simpler proposition than
“The following statement is false: the Moon is Earth’s satellite”
The abstract syntax tree for “x != 5” is bigger than the AST for “x == 5”: one of them uses numeric equality only, the other uses numeric equality and negation. I expect, though I haven’t verified, that the earliest, simplest compilers generated more processor instructions to compute “x != 5” than they did to compute “x == 5”.
Aris is right. NOT is just an operator that flips a bit. Take a single bit: 1. Now apply NOT. You get 0. Or you could have a bit that is 0. Now apply NOT. You get 1. Same number of bits. Truth tables for A and ~A are the same size.
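The bit-flipping picture is easy to check mechanically. A small sketch (mine, not from the thread): build the truth table of a formula and of its negation; they have exactly the same number of rows, since negation only flips the output column.

```python
from itertools import product

# Truth tables of (A AND B) and NOT (A AND B) over the same domain.
table = {(a, b): a and b for a, b in product((False, True), repeat=2)}
negated = {k: not v for k, v in table.items()}

print(len(table), len(negated))                     # 4 4: same size
print(all(negated[k] != table[k] for k in table))   # True: outputs flipped
```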
We’re talking about minimum message length, and the minimum message of NOT NOT P is simply P.

That’s what I said. But you also said that NOT P takes more bits to specify than P. You can’t have it both ways.
You don’t understand this point. If I’ve already communicated P to you, do you need any further bits of info to calculate NOT P? No: once you know P, NOT P is also perfectly well defined, which means that NOT P by necessity has the SAME message length as P.
The abstract syntax tree for “x != 5” is bigger than the AST for “x == 5”.

You aren’t talking about minimum message length anymore; you’re talking about human conventions. One might just as well reply that since “No” is a two-letter word, rejection takes fewer bits to encode than the confirmation “Yes”, which is a three-letter word.
I expect, though I haven’t verified, that the earliest, simplest compilers generated more processor instructions to compute “x != 5” than they did to compute “x == 5”.

If we have a computer that evaluates statements and returns 1 for true and 0 for false, we can just as well imagine that it returns 0 for true and 1 for false and so calculates the negation of those statements. In fact, you wouldn’t be able to KNOW whether the computer calculates the statements or their negations, which means that inputting a statement is the same as inputting its negation.
I think I get it. You need n bits of evidence to evaluate a statement whose MML is n bits long. Once you know the truth value of P, you don’t need any more evidence to compute NOT(P), so MML(P) has to equal MML(NOT(P)). In the real world we tend to care about true statements more than false statements, so human formalisms make it easier to talk about truths rather than falsehoods. But for every such formalism, there is an equivalent one that makes it easier to talk about false statements.
I think I had confused the statement of a problem with the amount of evidence needed to evaluate it. Thanks for the correction!
A big thumbs up for you, and you’re very welcome! :-)
I read the rest of this discussion but did not understand the conclusion. Do you now think that the first N Gödel numbers would be expected to contain the same number of truths as falsehoods?
It turns out not to matter. Consider a formalism G’, identical to Gödel numbering but with the sign reversed, such that G(N) is true iff G’(N) is false. Among the first N numbers of G and G’ together, there are an equal number of truths and falsehoods.
For every formalism that makes it easy to encode true statements, there’s an isomorphic one that does the same for false statements, and vice versa. This is why the set of statements of a given complexity can never be unbalanced.
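The G/G’ construction can be mimicked with a toy stand-in (my own, hypothetical encoding, not a real Gödel numbering): let G(n) mean “n is prime” purely for illustration, and let G’(n) be its sign-reversed twin. Whatever G itself looks like, the first N numbers of G and G’ combined split evenly.

```python
def g(n):
    """Stand-in 'formalism': G(n) is true iff n is prime (toy choice)."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def g_prime(n):
    """Sign-reversed twin: G'(n) is true iff G(n) is false."""
    return not g(n)

N = 100
truths = sum(g(n) for n in range(N)) + sum(g_prime(n) for n in range(N))
print(truths, 2 * N - truths)  # 100 100: exactly balanced, whatever g is
```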
Gotcha, thanks.
Who said anything about not having a prior distribution? “Let n be a [randomly selected] integer” isn’t even a meaningful statement without one!
What gave you the impression that I thought probabilities could be assigned to non-hypotheses?
This is irrelevant: once you have made an observation like this, you are no longer in a state of total ignorance.
We agree that we can’t assign a probability to a property of a number without a prior distribution. And yet it seems like you’re saying that it is nonetheless correct to assign a probability of truth to a statement without a prior distribution, and that the probability is 50% true, 50% false.
Doesn’t the second statement follow from the first? Something like this:
For any P, a nontrivial predicate on integers, and an integer n, Pr(P(n)) is undefined without a distribution on n.
Define X(n), a predicate on integers, true if and only if the nth Gödel number is true.
Pr(X(n)) is undefined without a distribution on n.
Integers and statements are isomorphic. If you’re saying that you can assign a probability to a statement without knowing anything about the statement, then you’re saying that you can assign a probability to a property of a number without knowing anything about the number.
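The dependence on the distribution is easy to demonstrate. A sketch (my illustration; the two priors are arbitrary choices): the same predicate gets a different probability under a uniform prior on 1..10 than under a prior that piles its mass near 1, so without fixing a prior the number is simply not determined.

```python
def prob(predicate, weights):
    """Probability of the predicate under a prior given as {n: weight}."""
    total = sum(weights.values())
    return sum(w for n, w in weights.items() if predicate(n)) / total

uniform = {n: 1.0 for n in range(1, 11)}
geometric_ish = {n: 2.0 ** -n for n in range(1, 11)}  # mass piled near n = 1

print(prob(lambda n: n > 3, uniform))        # 0.7
print(prob(lambda n: n > 3, geometric_ish))  # ~0.124
```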
That is not what I claim. I take it for granted that all probability statements require a prior distribution. What I claim is that if the prior probability of a hypothesis evaluates to something other than 50%, then the prior distribution cannot be said to represent “total ignorance” of whether the hypothesis is true.
This is only important at the meta-level, where one is regarding the probability function as a variable—such as in the context of modeling logical uncertainty, for example. It allows one to regard “calculating the prior probability” as a special case of “updating on evidence”.
I think I see what you’re saying. You’re saying that if you do the math out, Pr(S) comes out to 0.5, just like 0! = 1 or a^0 = 1, even though the situation is rare where you’d actually want to calculate those things (permutations of zero elements or the empty product, respectively). Do I understand you, at least?
I expect Pr(S) to come out to be undefined, but I’ll work through it and see. Anyway, I’m not getting any karma for these comments, so I guess nobody wants to see them. I won’t fill the channel with any more noise.