Thanks for writing this, I found it helpful for understanding the motivation behind Benja’s proposal and for developing some intuitions about it. I’d like to see some explanation of how statements involving quantifiers are supposed to work in the proposal. Here’s my understanding, and please let me know if it’s correct.
A statement with a “for all” quantifier is also supposed to start with 1⁄2 probability. Consider “For all X, Q(X)” for some equation Q. This implies Q(0), Q(1), etc., each of which also starts with 1⁄2 probability. What happens as the robot proves individual implications of the form “(For all X, Q(X)) implies Q(0)”? The probability of the quantified statement is halved each time, as long as the implied instance itself remains unproven. Proving that individual statements like Q(0) are true can only bring the probability of the quantified statement back up to 1⁄2, but not above that.
Since proving Q(i) tends to be harder (take longer) than proving “(For all X, Q(X)) implies Q(i)”, the probability of the quantified statement goes to 0 as we increase the time/length limit, even if all of the individual Q(i) are provable. This doesn’t seem desirable, if my understanding is correct.
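Here is a minimal sketch of that dynamic (the function name and the 10-to-1 proof-speed ratio are illustrative assumptions, not part of the proposal):

```python
def p_forall(implications_proved, instances_proved):
    """Toy model: credence in 'for all X, Q(X)' starts at 1/2.

    Each proved implication (for all X, Q(X)) -> Q(i) whose
    consequent Q(i) is still unproven halves the credence;
    proving Q(i) itself restores that factor, but the credence
    never rises above the 1/2 prior.
    """
    unproven = max(implications_proved - instances_proved, 0)
    return 0.5 * 0.5 ** unproven

# Implications are quick to prove, instances are slow, so the gap
# between them grows with the time limit and the credence decays:
for time_limit in range(1, 6):
    implications = 10 * time_limit  # cheap proofs pile up fast
    instances = time_limit          # hard proofs arrive slowly
    print(time_limit, p_forall(implications, instances))
```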
Oh, since I was recently thinking about this, I figured I’d write down how the robot actually ends up raising probabilities above 1⁄2 (without just going to 1): if we can make the probability of Q go down, then we just have to make the probability of ¬Q go down the same way. The probability of Q goes down when we learn that Q implies X but we don’t see X. The probability of Q goes up when we learn that ¬Q implies Y but we don’t see Y.
Hm. Yeah, you’re right. The probability goes like 1/(1+2^n), where n is the number of implied-but-unproven instances. You’d think that “3 is prime, 5 is prime, 7 is prime” would at least make the robot guess that all odd numbers were prime.
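A quick derivation of that expression, under an odds-form reading of the toy update (n and m are my labels for the number of implied-but-unproven consequences of Q and of ¬Q respectively, not notation from the post):

\[
\frac{P(Q)}{P(\neg Q)} = \frac{(1/2)^{n}}{(1/2)^{m}}
\quad\Longrightarrow\quad
P(Q) = \frac{2^{-n}}{2^{-n} + 2^{-m}}.
\]

With m = 0 this reduces to 1/(1+2^n), the decay above; with m > n it exceeds 1⁄2, which is exactly the ¬Q-side mechanism described in the previous comment.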
On the other hand, if we think of all possible statements “for all X, Q(X)”, there are more ways to be false than true. Infinitely more ways, even, and this is directly related to how many things are implied by the statement. So in that way, the robot is functioning exactly as intended and assigning the maximum entropy value.
Which makes this problem sort of like my suspicions about magfrump’s idea below: because we expect simple things, we don’t actually expect maximum entropy, and we’re often right.
I thought of it some more… Don’t think of probability as a way to represent ignorance. Probability represents a kind of knowledge. For example, when you say “the probability that the coin fell heads is 1⁄2”, that represents a sort of model of what bounces do to the coin’s direction. When you multiply probabilities for a pair of coin throws, that represents independence of the throws. The 1⁄2 for Q(i) is not representative of that, and even if it were representative of something (statistics on the Q), the product of these numbers is not a probability but a lower bound on the probability assuming independence; if they aren’t independent, the probability can still be 1⁄2.
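A concrete instance of that last point, with made-up numbers:

```python
# Two statements, each assigned probability 1/2.
p_q1 = p_q2 = 0.5

# Multiplying is only valid under independence:
print(p_q1 * p_q2)      # 0.25, a lower bound on P(Q(1) and Q(2))

# If the statements are perfectly correlated (Q(1) iff Q(2)),
# the joint probability is still 1/2, so the product above is
# a bound, not the probability itself:
print(min(p_q1, p_q2))  # 0.5
```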
Probabilistic methods certainly are very useful in time-bounded calculations (they are used very heavily in computer graphics and simulations of all kinds), but usually through the use of a random number source or a sufficiently good random number generator, as in probabilistic primality tests such as Miller-Rabin. The time-bounded bot, far from evaluating and updating “probabilities”, would have to do symbolic mathematics searching for an imperfect solution, making use of random number sources whenever appropriate and striving for bounds on the probability of being wrong. (E.g. when I implemented Miller-Rabin, I just made it run enough rounds that it wouldn’t fail until the heat death of the universe, assuming my PRNG wouldn’t somehow return a huge string of bad witnesses, which would be very interesting if it happened.)
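For illustration, a minimal sketch of Miller-Rabin along those lines (the round count 64 is an arbitrary stand-in for “enough calculations”; with a good randomness source, a composite survives each round with probability at most 1⁄4):

```python
import random

def is_probable_prime(n, rounds=64):
    """Miller-Rabin: a composite n passes each random-base round
    with probability at most 1/4, so after `rounds` rounds the
    chance of wrongly reporting 'prime' is at most 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # found a witness: n is definitely composite
    return True  # no witness found: prime with very high probability

print(is_probable_prime(2**127 - 1))  # a known Mersenne prime: True
```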
I’ll be specific. Shannon entropy represents ignorance. The bigger it is, the more ignorant you are.
Well, that doesn’t seem very useful, as per Wei Dai’s example...
Why not look at the methods people actually employ to approximate and guess at math when they can’t quite compute something? Applied mathematics is an enormous field.
there are more ways to be false than true. Infinitely more ways, even,
If you consider all syntactically valid Q in some sort of math notation, no matter the length, there’s the fraction that is simply 0 = 0*(… X somewhere here …), or your favourite less trivial statement of choice. Ditto for Turing machine tapes et cetera. There’s certainly a nonzero probability of constructing a Q(X) that holds for all X.
That is a more detailed model than the robot uses. What it means by the ways to be false or true is more like “true for 1, true for 2, true for 3, false for 4, true for 5...”. The robot can’t look inside the statements while it’s doing probabilistic logic; it can only look at truth values and relationships.
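A tiny enumeration of those “ways”, as an illustration (the range of n is arbitrary): among the 2^n truth-value patterns for Q(1)…Q(n), exactly one is all-true, so the maximum-entropy credence in the universal statement shrinks geometrically, much like the robot’s 1/(1+2^n).

```python
from itertools import product

# Enumerate the truth-value patterns the robot can distinguish
# for Q(1)..Q(n); exactly one pattern out of 2**n is all-true.
for n in range(1, 6):
    patterns = list(product([True, False], repeat=n))
    all_true = sum(all(p) for p in patterns)  # always exactly 1
    print(n, all_true / len(patterns))        # max-entropy credence 2**-n
```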
On the other hand, the power of looking inside the statements is certainly a good reason to upgrade the robot :)
This is counterintuitive in an interesting way.
You’d think that since P(Q1|~∀xQx) = 1⁄2 and P(Q1|∀xQx) = 1, observing Q1 is evidence in favor of ∀xQx.
And it is, but the hidden catch is that this depends on the implication ∀xQx → Q1, and that implication is exactly the same amount of evidence against ∀xQx.
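A worked version of the cancellation, using the toy numbers from above (prior 1⁄2, each proved implication with an unproven consequent halving the odds): proving ∀xQx → Q1 first drops the credence in ∀xQx to 1⁄3, and a Bayes update on then observing Q1 lifts it back to exactly the prior:

\[
P(\forall x\,Qx \mid Q_1) = \frac{\tfrac{1}{3}\cdot 1}{\tfrac{1}{3}\cdot 1 + \tfrac{2}{3}\cdot\tfrac{1}{2}} = \frac{1}{2}.
\]

So the two pieces of evidence cancel exactly.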
It’s also an amusing answer to the exercise at the end of part 1.