I don’t believe that complex mathematical statements should have lower probability than simple statements. It’s not just about A vs not-A. If A is a conjunction of many parts, its probability should be low. If A is a disjunction of many parts, its probability should be high.
The part about “mutually exclusive and exhaustive set of sentences” is a little similar to the idea from an old post of mine. Namely, if you have some set of sentences S_1 … S_n, then for each i you can define the probability that a random proof, sampled from a length-based prior, happens to prove S_i, conditional on the fact that it proves one of S_1 … S_n. The probabilities defined this way always sum to 1, provided that at least one S_i is true. Such an approach is helpful when S_i are logical counterfactuals, which look “mutually exclusive” to the agent but are actually all true. Also note that it agrees with our intuition about conjunctions and disjunctions, because proofs of disjunctions will tend to be shorter.
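Written out explicitly (my notation, and assuming the length-based prior weights a proof p by 2^{-ℓ(p)}, where ℓ(p) is its length; any summable weighting works the same way):

$$P(S_i) \;=\; \frac{\sum_{p \,:\, p \vdash S_i} 2^{-\ell(p)}}{\sum_{j=1}^{n} \sum_{p \,:\, p \vdash S_j} 2^{-\ell(p)}}.$$

Since each proof ends in exactly one of the S_j, the denominator is just the sum of the numerators, so the P(S_i) sum to 1 whenever at least one S_j has a proof.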
Can I ask you some more questions?
In thought experiment 2, the subsequences of pi, it seems like simpler sentences really are favored (because sequences more K-complex than the description are ruled out). Do you agree? Do you think this constitutes an empirical prediction that the all-zeroes subsequence will be overrepresented in pi?
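To make "overrepresented" concrete, here is a rough sketch of the kind of check I have in mind (it uses mpmath for the digits, and the digit count and block lengths are arbitrary choices of mine):

```python
# Rough check: compare how often the all-zeroes block of length m occurs in
# the first D decimal digits of pi against the 1/10^m rate a normal number
# would give. D and m are arbitrary; counts for larger m are very noisy.
from mpmath import mp

def pi_digits(d):
    """First d decimal digits of pi after the decimal point."""
    mp.dps = d + 20                        # extra working precision
    return mp.nstr(mp.pi, d + 5)[2:2 + d]  # drop the leading "3."

def block_frequency(digits, block):
    """Fraction of (overlapping) positions at which `block` occurs."""
    m = len(block)
    positions = len(digits) - m + 1
    return sum(digits[i:i + m] == block for i in range(positions)) / positions

if __name__ == "__main__":
    digits = pi_digits(100_000)
    for m in (3, 4, 5):
        print(f"m={m}: observed {block_frequency(digits, '0' * m):.2e}, "
              f"baseline {10 ** -m:.0e}")
```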
How do you think a logical prior should behave in the limit where it applies to a countable set of sentences? After all, you can’t have a uniform distribution on the integers. Should it stay uniform for every finite number of sentences but be nonuniform if we go up a meta-level and consider an infinite set rather than individual sentences?
Well, computable normal numbers exist. If you replace pi with such a number, then we know that strings of zeroes aren’t overrepresented in the sense of asymptotic frequency. They might be overrepresented in some other sense, though. Can you define that precisely?
I don’t see why the prior should be uniform. None of the proposed priors are. Maybe I misunderstand your question?
Asymptotic frequency doesn’t explain this effect away: the allowed complexity of subsequences of pi grows logarithmically with position, so for any subsequence length m there is a position n beyond which length-m subsequences face no complexity restriction at all. That is, it’s entirely possible for a number to be asymptotically normal but still favor simpler sequences. I’m not sure how to formalize this, or how to formalize the alternatives.
Hmm, it might be provable that some simple sequence shows up more often than 10^-m in Chaitin’s constant in some range that depends predictably on its complexity.
Basically, thought experiment 1. Suppose you have N mutually exclusive and exhaustive sentences, you know their complexities K(n), and you don’t know anything else. A uniform prior here just means assigning each of these sentences logical probability 1/N.
Next suppose you have a countable set of mutually exclusive and exhaustive sentences, you know their complexities K(n) (alternately you could have some privileged ordering of the sentences), and you don’t know anything else. There’s no uniform prior now because it would be everywhere 0, and that’s not even a little normalizable.
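To spell out the problem (the complexity-weighted comparison is just my own example, not something fixed by the thought experiment): a uniform assignment P(S_n) = c gives $\sum_{n=1}^{\infty} c = \infty$ for any c > 0, and 0 for c = 0, so nothing normalizes; whereas something like $P(S_n) \propto 2^{-K(S_n)}$ does normalize, since $\sum_n 2^{-K(S_n)} \le 1$ by Kraft’s inequality (for prefix complexity).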
See my post below for one way of framing it more precisely.
So what do you think happens in thought experiment 3?
I think P(eval(random_expression(n)) = 42) will go to a positive limit when n goes to infinity, because expressions like (short expression that evaluates to 42) + 0 * (arbitrarily long expression) will tend to make up a constant fraction.
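A toy Monte Carlo sketch of what I mean (the expression grammar and the size measure are arbitrary choices of mine, not anything the thought experiment pins down; for large n the limiting fraction is tiny, so at modest trial counts the estimates will usually just read 0):

```python
import random

def random_expression(n):
    """Random arithmetic expression built from roughly n numeric leaves."""
    if n <= 1:
        return str(random.randint(0, 50))
    k = random.randint(1, n - 1)                 # split the size budget
    op = random.choice(["+", "-", "*"])
    return f"({random_expression(k)} {op} {random_expression(n - k)})"

def estimate_p_equals_42(n, trials):
    """Monte Carlo estimate of P(eval(random_expression(n)) == 42)."""
    return sum(eval(random_expression(n)) == 42 for _ in range(trials)) / trials

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(n, estimate_p_equals_42(n, trials=20_000))
```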
That’s a good point. I’m not sure if it goes through, though, because rather than length, we want to order sentences by apparent complexity, and the sentences you describe are only (apparently) as complex as the prefix.
There might still be tricks that exploit the weakness in our method for evaluating apparent complexity [...]
EDIT: Okay, I thought of a trick—rather than “0”, multiply the suffix by an expression that evaluates to 0 but is too complicated for the checker to evaluate. This allows the positive limit of probabilities to be fairly big (well, relative to 0).
Anyhow, great, the probability that some apparently-complex expression equals 42 doesn’t go to zero. What happens when the expression is just as complex as “42”? Is the probability that it equals 42 higher then? Lower?
I think you’re confusing the complexity of the sentence with the complexity of what the sentence evaluates to. The sentence “0 * random_expression” is still roughly as complex as random_expression, even though it evaluates to 0.
Whoops, you’re right.