There are two kinds of probabilities: Frequencies and credences.
The frequency of an event is the fraction of times it occurs in the long run. If you’re wearing Bayesian goggles then you refuse to call this a probability. You just treat it as a physical quantity, like the mass of a person, or the height of a building. A fair coin comes up heads with a frequency of 50%, a die rolls 2 with a frequency of 1⁄6.
The credence you have in some fact is the degree to which you believe it. This is what Bayesians call “probability”. If you had a biased coin you might believe that the next toss will be heads, and the strength of your belief could be 70%, but this doesn’t mean that you think the long run frequency of heads will be 70%.
So when you’re assessing the space shuttle, you treat it as if there is some fixed frequency with which shuttles crash. But you don’t know what this frequency is. So you have a probability distribution over the possible values of the frequency. Maybe you have a 20% credence that the frequency is 0.5%, a 15% credence that the frequency is 1%, a 10% credence that the frequency is 1.5%, and so on...
In symbols this looks like this. Let f be the true frequency. Then for each θ between 0 and 1 we have some credence that f = θ. We write this credence as P(f = θ). This expression is a function that depends on θ. It expresses our belief that the frequency is equal to θ.
Now, we want to calculate the probability (our credence) that the shuttle will crash. If we knew that f was 3%, then our credence that the shuttle would crash would also be 3%. That is, if we know the frequency then our degree of belief that the shuttle will crash is equal to that frequency. In symbols, this looks like this: Let A be the event that the shuttle crashes, then P(A | f = θ) = θ: the probability of the shuttle crashing given that f is equal to θ, is exactly θ.
But we don’t know what f is. So what we do is we take an average over all possible values of f, weighted by how likely we think it is that f really does take that value.
P(A) = Σθ P(f = θ) P(A | f = θ) = Σθ P(f = θ) θ, where the sum runs over every value of θ from 0 to 1.
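As a sketch, suppose our credences are concentrated on a handful of candidate frequencies. The θ grid and the credence values below are made up purely for illustration:

```python
# Hypothetical discrete prior over the crash frequency f.
# The theta grid and the credences are illustration numbers, not real data.
thetas = [0.005, 0.01, 0.015, 0.02]   # candidate values of f
prior = [0.20, 0.15, 0.10, 0.55]      # P(f = theta), summing to 1

# P(A) = sum over theta of P(f = theta) * theta
p_crash = sum(p * t for p, t in zip(prior, thetas))
print(p_crash)  # a weighted average of the candidate frequencies
```

The answer is just the mean of the candidate frequencies under our credences, so it always lands somewhere between the smallest and largest θ we take seriously.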
So we get a value for the probability of A, but this isn’t necessarily equal to the true frequency f.
Right. Now we watch a successful shuttle launch. A has not occurred. Call this event ¬A, or not-A. This changes our credences about what f could be equal to. Given that one launch has been successful, it is more likely that f is small. The way we determine our new belief, P(f = θ | ¬A), is to use Bayes’ theorem:
P(f = θ | ¬A) = P(¬A | f = θ) * P(f = θ) / P(¬A)
But we know that P(A | f = θ) = θ, so P(¬A | f = θ) = 1 − θ. We put that in and get:
P(f = θ | ¬A) = (1 − θ) * P(f = θ) / (1 − P(A))
and, conveniently, we already know everything on the right hand side.
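Continuing with the same made-up prior from before, the update after one successful launch can be sketched as:

```python
# Same hypothetical prior as before (illustration numbers only).
thetas = [0.005, 0.01, 0.015, 0.02]   # candidate values of f
prior = [0.20, 0.15, 0.10, 0.55]      # P(f = theta)

# P(not A) = sum over theta of P(f = theta) * (1 - theta)
p_safe = sum(p * (1 - t) for p, t in zip(prior, thetas))

# Bayes' theorem: P(f = theta | not A) = (1 - theta) P(f = theta) / P(not A)
posterior = [(1 - t) * p / p_safe for p, t in zip(prior, thetas)]
print(posterior)  # credences shift toward smaller values of f
```

Note that dividing by P(¬A) is exactly what makes the new credences sum to 1 again, and that each small θ gets multiplied by a larger factor (1 − θ) than each big θ, which is why mass shifts toward low frequencies.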
Having updated to our new credences for the value of f, given by P(f = θ | ¬A), we can now calculate the probability of the shuttle crashing on its second launch. Call this event B. That is, we want the probability of B given that ¬A has occurred, P(B | ¬A). We do exactly what we did before: take a weighted average of P(B | f = θ, ¬A), weighting by how likely we now think it is that f really is θ.
P(B | ¬A) = Σθ P(f = θ | ¬A) P(B | f = θ, ¬A) = Σθ P(f = θ | ¬A) θ, where again the sum runs over every value of θ from 0 to 1.
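With the same toy numbers, the whole two-step calculation (update, then predict) fits in a few lines:

```python
# Hypothetical prior over f (illustration numbers only).
thetas = [0.005, 0.01, 0.015, 0.02]
prior = [0.20, 0.15, 0.10, 0.55]

# Update on a safe first launch (not A).
p_safe = sum(p * (1 - t) for p, t in zip(prior, thetas))
posterior = [(1 - t) * p / p_safe for p, t in zip(prior, thetas)]

# P(B | not A): weighted average of theta under the updated credences.
p_crash_2 = sum(p * t for p, t in zip(posterior, thetas))
print(p_crash_2)  # slightly below the prior P(A), as expected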