“A statement, any statement, starts out with a 50% probability of being true, and then you adjust that percentage based on the evidence you come into contact with.”
Suppose we have a statement X, and the only thing we know about X is that it was randomly selected from the set S of statements of 100 characters, with the alphabet consisting of the digits 0 to 9 and the symbols + and =. If
A = ‘X is true’
B = ‘X was randomly selected from the set S of statements of 100 characters, with the alphabet consisting of the digits 0 to 9 and the symbols + and =’
then P(A|B) can be straightforwardly computed by enumerating the set S and checking how many true statements it contains (or some cleverer variation of this). The above quote, on the other hand, suggests that we start with P(A)=0.5, and then… do what? By Bayes’ Theorem, P(A|B) = P(A)*P(B|A)/P(B), but it’s hard to see how that helps.
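(For concreteness, here is a minimal sketch of the enumeration idea. |S| = 12^100 is far too large to enumerate and almost none of its members are even well-formed, so this samples instead of enumerating, and it has to stipulate what counts as a true statement; I’ll treat anything other than a well-formed sum-equation as false. All names here are mine, purely for illustration.)

```python
import random

ALPHABET = "0123456789+="

def eval_sum(expr: str) -> int:
    """Evaluate a digits-and-plus expression like '12+034',
    rejecting malformed ones like '1++2' or an empty side."""
    terms = expr.split("+")
    if any(t == "" for t in terms):
        raise ValueError("malformed expression")
    return sum(int(t) for t in terms)

def is_true_statement(s: str) -> bool:
    """One formalization: a string is a true statement iff it is a
    well-formed equation 'lhs=rhs' whose sides evaluate equal."""
    if s.count("=") != 1:
        return False
    lhs, rhs = s.split("=")
    try:
        return eval_sum(lhs) == eval_sum(rhs)
    except ValueError:
        return False

def estimate_p_true(length: int, trials: int) -> float:
    """Monte Carlo stand-in for enumerating S (|S| = 12**length)."""
    hits = sum(
        is_true_statement("".join(random.choices(ALPHABET, k=length)))
        for _ in range(trials)
    )
    return hits / trials

# At length 100 the estimate is indistinguishable from 0;
# a short length shows the machinery actually working.
print(estimate_p_true(length=8, trials=100_000))
```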
(In case I’ve chosen a pathological example, what is a good example of starting with a 0.5 probability of a statement being true, and then adjusting that?)
Recall that logical non-omniscience is an open problem. That is, often we get ‘evidence’ in the form of someone pointing out some feature of the hypothesis that, while deducible from it, we were not aware of. For example, if H = “3542423580 is composite” someone might be stumped until they are reminded that integers ending in the digit 0 are all composite. Of course, this fact is deducible from the definition of composite, we just had forgotten it. P(H) now approaches 1, but we don’t have a Bayesian way of talking about what just happened.
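(The deductive step is elementary once it’s pointed out, which is the whole puzzle; as arithmetic it is just:)

```python
n = 3542423580
assert n % 10 == 0           # ends in 0, so 10 divides it
assert n == 10 * 354242358   # a factorization into proper factors, hence composite
```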
Hypothesis specification is just a special case of this problem. The only difference is that instead of pointing out something that is deducible by assuming the hypothesis (think: lines toward the bottom of a proof) we’re stipulating what it means to assume the hypothesis (like reading off the assumptions at the top of a proof). The reason why “any statement starts out with a 50% probability of being true” sounds silly and is confusing people is that for any particular hypothesis the prior will be set, in part, by stipulating the content of the hypothesis—which is a deductive process. And we don’t know how to handle that with Bayesian math.
In your example, before we have any information we’d assume P(A) = 0.5, and once we have information about the alphabet and how X is constructed from it we can just calculate the exact value for P(A|B). So the “update” here just consists of replacing the initial estimate with the correct answer. I think this is also what you’re saying, so I agree that in situations like these using P(A) = 0.5 as a starting point does not affect the final answer (but I’d still start out with a prior of 0.5).
I’ll propose a different example. It’s a bit contrived (well, really contrived, but OK).
Frank and his buddies (of which you are one) decide to rob a bank.
Frank goes: “Alright men, in order for us to pull this off 4 things have to go perfectly according to plan.”
(you think: conjunction of 4 things, each at the ignorance prior: 0.5^4 = 0.0625 prior probability of success)
Frank continues: the first thing we need to do is beat the security system (… long explanation follows).
(you think: that plan is genius and almost certain to work; your Bayesian estimate gives it a 0.9 probability of success. I’m updating my confidence to 0.9 × 0.5^3 = 0.1125)
Frank continues: the second thing we need to do is break into the safe (… again a long explanation follows).
(you think: wow, that’s a clever solution; 0.7 probability of success. Total probability of success: 0.9 × 0.7 × 0.5^2 = 0.1575)
Frank continues: So! Are you in or are you out?
At this point you have to decide immediately. You don’t have the time to work out the plausibility of the remaining two factors, you just have to make a decision. But just by knowing that there are two more things that have to go right you can confidently say “Sorry Frank, but I’m out.”
If you had more time to think you could come up with a better estimate of success. But you don’t have time. You have to go with your prior of total ignorance for the last two factors of your estimate.
If we were to plot the confidence over time I think it should start at 0.5, then drop to 0.0625 when we understand that a conjunction of 4 parts is to be estimated, and after that more nuanced Bayesian reasoning follows. So if I were to build an AI I would make it start out with the universal prior of total ignorance and go from there. So I don’t think the prior is a purely mathematical trick that has no bearing on the way we reason.
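(A minimal sketch of that confidence curve, assuming the four steps are treated as independent and every step not yet assessed sits at the ignorance prior of 0.5; the function name is mine, not anything standard:)

```python
from math import prod

def running_confidence(assessed, total_steps, ignorance_prior=0.5):
    """Product of the assessed step probabilities, with the ignorance
    prior standing in for every step not yet assessed."""
    unassessed = total_steps - len(assessed)
    return prod(assessed) * ignorance_prior ** unassessed

print(running_confidence([], 4))          # 0.0625: pure ignorance, 0.5^4
print(running_confidence([0.9], 4))       # 0.1125: security system assessed
print(running_confidence([0.9, 0.7], 4))  # 0.1575: safe assessed too
```

The 0.5 placeholder for each unknown factor is exactly what lets you cap the estimate at 0.1575 and turn Frank down without hearing the rest of the plan.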
(At the risk of stating the obvious: you’re strictly speaking never adjusting based on the prior of 0.5. The moment you have evidence you replace the prior with the estimate based on evidence. When you get more evidence you can update based on that. The prior of 0.5 completely vaporizes the moment evidence enters the picture. Otherwise you would be doing an update on non-evidence.)