My first post, so be gentle. :)
I disagree that there is a difference between “Bayesian” and “Frequentist”; or at least, that it has anything to do with what is mentioned in this article. The field of Probability has the unfortunate property of appearing to be a very simple, well-defined topic, when it is actually complex enough to be indefinable. Those labels are used by people who want to argue in favor of one definition (of the indefinable) over another. The only differences I see arise where one side fails to address a problem completely.
Take the biased coin problem as an example. If either label applies to me, it is Frequentist, but my answer is the one Eliezer_Yudkowsky attributes to the Bayesian. He gets the wrong Frequentist solution because he allows the Frequentist to acknowledge only one uncertainty (one random variable) in the problem: whether the coin came up heads or tails. A Frequentist who says the question is unanswerable is wrong, because (s)he is using an incomplete solution. The bias b of a coin already selected is just as much a random variable as the side s that came up on a coin already flipped. If you claim the answer must be based on the actual value of b for this coin, it must also be based on the actual value of s for this flip. That means the probability is either 0 or 1, which is absurd. (Technically, this error is one of confusing an outcome and an event. An outcome is the specific result of a specific trial, and has no probability. An event is a set of possible outcomes, and is what a probability is assigned to. Eliezer_Yudkowsky’s Frequentist is treating the choice of a coin as an outcome, and the result of the flip as an event.)
We can answer the question without knowing anything more about b than that it is not 1⁄2. For any 0 ≤ b1 < 1⁄2, since we have no other information, b = b1 and b = 1 − b1 must be treated as equally likely. Regardless of the distribution of b1, this makes the probability that the coin landed on heads 1⁄2.
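Written out as a worked equation, the symmetry argument is a short application of the law of total probability. The symbol f for the unknown density of b1 on [0, 1⁄2) is introduced here only for illustration; given b1, the bias is taken to be b1 or 1 − b1 with probability 1⁄2 each:

```latex
P(\text{heads}) = \int_{0}^{1/2} f(b_1)\left[\tfrac{1}{2}\,b_1 + \tfrac{1}{2}\,(1 - b_1)\right] db_1
                = \tfrac{1}{2}\int_{0}^{1/2} f(b_1)\, db_1
                = \tfrac{1}{2}
```

Whatever f is, it enters only through its total mass, which is 1.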
The classic Two Child Problem has a similar issue, but Eliezer_Yudkowsky did not ask the classic one. I find it best to explain this one the way Joseph Bertrand explained his famous Box Paradox. I have two children. What is the probability they share the same gender? That’s easy: 1⁄2. Now I secretly write one gender on a note card. I then show the card to you, and tell you one of my children has that gender. If it says “boy,” does the answer change to 1⁄3? What if it says “girl”? The answers can’t be different for the two words you might see; but whatever that answer is, it has to be the same as the answer to the original question (by Bayes’ Theorem, or more directly, by the law of total probability). So if the answer does change, we have a paradox.
Yet if presented with the information all at once, “I have two, and one is a boy,” Frequentist and Bayesian alike will usually answer “1/3.” And they usually will say that anybody who answers 1⁄2 is addressing the “I have two, and one specific child, by age, is a boy” version Eliezer_Yudkowsky mentioned. But that is not how I get 1⁄2. There are three random variables, not two: the older child’s gender, the younger child’s gender, and which gender I will mention if I have the choice of two. Allowing all three to be split 50⁄50 between “boy” and “girl” makes the answer 1⁄2, and there is no paradox.
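The three-variable model can be checked by exact enumeration. This is a minimal sketch, assuming (as the comment does) that each child's gender is independent and 50⁄50, and that a parent with one boy and one girl writes either gender on the card with probability 1⁄2; the function names are hypothetical. For contrast, it also computes the two-variable answer obtained by conditioning only on “at least one child is a boy”:

```python
from fractions import Fraction
from itertools import product

GENDERS = ("boy", "girl")


def joint_distribution():
    """Yield (older, younger, card, probability) for every outcome.

    Assumed model: each child's gender is independent and 50/50, and a
    parent with one boy and one girl picks the card's word 50/50.
    """
    for older, younger in product(GENDERS, repeat=2):
        p_family = Fraction(1, 4)
        if older == younger:
            # No choice: the card must name the gender both children share.
            yield older, younger, older, p_family
        else:
            # A choice of two: each word is written half the time.
            for card in GENDERS:
                yield older, younger, card, p_family / 2


def p_same_given_card(word):
    """Three-variable model: P(children share a gender | card shows `word`)."""
    shown = [(o, y, p) for o, y, card, p in joint_distribution() if card == word]
    same = sum(p for o, y, p in shown if o == y)
    return same / sum(p for _, _, p in shown)


def p_same_given_at_least_one(word):
    """Two-variable model: condition only on 'at least one child is `word`'.

    Every family is equally likely here, so counting families suffices.
    """
    families = [(o, y) for o, y in product(GENDERS, repeat=2) if word in (o, y)]
    same = sum(1 for o, y in families if o == y)
    return Fraction(same, len(families))


print(p_same_given_card("boy"), p_same_given_card("girl"))  # 1/2 1/2
print(p_same_given_at_least_one("boy"))                    # 1/3
```

Under these assumptions the card-based answer is 1⁄2 for either word, matching the unconditional answer, while the “at least one” conditioning gives 1⁄3. The two functions differ only in whether the choice of word is modeled as a random variable.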
Ironically, my reasoning is exactly what the same mathematicians use for the Monty Hall Problem, or the equivalent Three Prisoners Problem. Two cases that were originally equally likely remain possible. But they are no longer equally likely, because the provider of information had a choice of two in one case, but no choice in the other. Bayesians may say the difference is a property of the information, and Frequentists (if they use a complete solution) will say there is an additional, implicit random variable. Both approaches give the same answer, just by different methods. It is ironic because Bertrand’s Box Paradox is often compared to these two problems, since it is mathematically equivalent to them, while the Two Child Problem, which is closer to being logically equivalent because of the way the information is provided, never gets compared. In fact, it becomes identical if you add a fourth box.
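The same “choice of two versus no choice” structure drives the Monty Hall Problem. The sketch below is an illustration under the standard assumptions (the car is placed uniformly at random; when the host could open either of two goat doors, he picks each with probability 1⁄2); the helper names are hypothetical:

```python
from fractions import Fraction

DOORS = (1, 2, 3)


def host_choices(pick, car):
    """Doors the host may open: neither the player's pick nor the car."""
    return [d for d in DOORS if d != pick and d != car]


def posterior_car_location(pick, opened):
    """P(car behind each door | player picked `pick`, host opened `opened`).

    Assumes a uniformly placed car and a host who chooses uniformly
    whenever two doors are available to him.
    """
    joint = {}
    for car in DOORS:
        choices = host_choices(pick, car)
        if opened in choices:
            # Prior 1/3, times the chance the host picks `opened`.
            joint[car] = Fraction(1, 3) * Fraction(1, len(choices))
        else:
            joint[car] = Fraction(0)
    total = sum(joint.values())
    if total == 0:
        raise ValueError("the host can never open that door")
    return {car: p / total for car, p in joint.items()}


post = posterior_car_location(1, 3)
print(post[1], post[2], post[3])  # 1/3 2/3 0
```

Doors 1 and 2 both remain possible, but only door 1 left the host a choice of two, which halves the weight of that case.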
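I can’t speak for the rest of your post, but this passage
“We can answer the question without knowing anything more about b than that it is not 1⁄2. For any 0 ≤ b1 < 1⁄2, since we have no other information, b = b1 and b = 1 − b1 must be treated as equally likely. Regardless of the distribution of b1, this makes the probability that the coin landed on heads 1⁄2.”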
is pretty clearly wrong. (In fact, it looks a lot like you’re establishing a prior distribution, and that’s uniquely a Bayesian feature.) The probability of an event (the result of the flip is surely an event, though I can’t tell if you’re claiming to the contrary or not) to a frequentist is the limit of the proportion of times the event occurred in independent trials as the number of trials tends to infinity. The probability the coin landed on heads is the one thing in the problem statement that can’t be 1⁄2, because we know that the coin is biased. Your calculation above seems mostly ad hoc, as is your introduction of additional random variables elsewhere.
However, I’m not a statistician.
Say a bag contains 100 unique coins that have been carefully tuned to be unfair when flipped. Each is stamped with an integer in the range 0 to 100 (50 is missing) representing its probability, in percent, of landing on heads. A single coin is withdrawn without revealing its number, and flipped. What is the probability that the result will be heads?
You are claiming that anybody who calls himself a Frequentist needs to know the number on the coin to answer this question, and that any attempt to represent the probability of drawing coin N is specifying a prior distribution, an act that is strictly prohibited for a Frequentist. Both claims are absurd. Prior distributions are a fact of the mathematics of probability, and belong to Frequentist and Bayesian alike. The only differences are (1) the Bayesian may use information differently to determine a prior, sometimes in situations where a Frequentist wouldn’t see one at all; (2) the Bayesian will prefer solutions based explicitly on that prior, while the Frequentist will prefer solutions based on how the prior affects repeated experiments; and (3) some Frequentists might not realize when they have enough information to determine a prior, and/or its effects, that should satisfy them.
If both get answers, and they don’t agree, somebody did something wrong.
The answer is 50%. The Bayesian says that, based on the available information, neither result can be favored over the other, so both must have probability 50%. The Frequentist says that if you repeat the experiment 100² = 10,000 times, including the part where you draw a coin from the bag of 100 coins, you should count on getting each coin 100 times. And you should also count, for each coin, on getting heads in proportion to its probability: the coin stamped k comes up heads k times in its 100 flips. The stamps sum to (0 + 1 + ... + 100) − 50 = 5,050 − 50 = 5,000, so you will count 5,000 heads in 10,000 trials, making the answer 50%. Both solutions are based on the same facts and assumptions, just organized differently.
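The counting argument for the bag can be reproduced with exact fractions. A minimal sketch: STAMPS encodes the bag as described (0 to 100 with 50 missing), and the function names are hypothetical:

```python
from fractions import Fraction

# The bag as described: one coin for each stamp 0..100 except 50.
STAMPS = [k for k in range(101) if k != 50]


def heads_in_idealized_run(flips_per_coin=100):
    """Heads counted when every coin is drawn exactly `flips_per_coin` times
    and lands heads exactly in proportion to its stamped percentage."""
    return sum(Fraction(k, 100) * flips_per_coin for k in STAMPS)


def p_heads():
    """P(heads) for a coin drawn uniformly from the bag, by total probability."""
    return sum(Fraction(k, 100) for k in STAMPS) / len(STAMPS)


trials = len(STAMPS) * 100
print(len(STAMPS), trials, heads_in_idealized_run(), p_heads())  # 100 10000 5000 1/2
```

The Frequentist count (5,000 of 10,000) and the total-probability average (1⁄2) are the same sum, normalized differently.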
The answer Eliezer_Yudkowsky attributes to Frequentists, for the simpler problem without the bag and stamped coins, is an incorrect Frequentist solution, or at least a correct solution to a different problem: one that corresponds to the different question “What proportion of the time will this coin come up heads?” I agree that some who claim to be Frequentists will answer that question. But the true Frequentist will answer the question that was asked: “What proportion of the time will the process of flipping a coin with unknown bias come up heads?” His repetitions must treat the bias for each flip as independent of the other flips, not as the same bias each time. The bias B will come up just as often as the bias 1 − B, so in the long run the number of heads will be half the number of trials.
The random process a frequentist should repeat is flipping a random biased coin, and getting a random bias b and either heads or tails. You are assuming it is flipping the same biased coin with fixed bias B, and getting heads or tails.
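The difference between the two repeated processes can be illustrated with a seeded Monte Carlo simulation. This sketch assumes, purely for illustration, that biases are drawn from the stamped-coin bag in the earlier example; for the first process, any bias distribution symmetric about 1⁄2 would give the same long-run frequency. Function names are hypothetical:

```python
import random

# Assumed source of biases for illustration: the stamped-coin bag.
STAMPS = [k for k in range(101) if k != 50]


def flip(bias, rng):
    """One flip: True (heads) with probability `bias`."""
    return rng.random() < bias


def freq_random_coin(trials, rng):
    """Process 1: draw a fresh coin (a fresh bias) for every flip."""
    heads = sum(flip(rng.choice(STAMPS) / 100, rng) for _ in range(trials))
    return heads / trials


def freq_same_coin(trials, rng):
    """Process 2: draw one coin, then flip that same coin on every trial."""
    bias = rng.choice(STAMPS) / 100
    heads = sum(flip(bias, rng) for _ in range(trials))
    return bias, heads / trials


rng = random.Random(2024)  # seeded so runs are reproducible
print(freq_random_coin(100_000, rng))  # close to 0.5
print(freq_same_coin(100_000, rng))    # (b, a frequency close to b)
```

With many trials the first frequency settles near 0.5, while the second settles near whatever single bias was drawn. Which process matches the original question is what the commenters disagree about; the simulation only shows that the two processes have different long-run frequencies.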
The probability that a random biased coin lands heads is 1⁄2, from either point of view. And for nshepperd, the point is that a Frequentist doesn’t need to know what the bias is. As long as we can’t assume the distribution treats b1 and 1 − b1 differently, when you integrate over the unknown distribution (yes, you can do that in this case) the answer is 1⁄2.
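The “integrate over the unknown distribution” step can be sketched with exact arithmetic. Here discrete distributions stand in for a density, so the integral becomes a sum; random_b1_distribution is a hypothetical generator of arbitrary distributions over b1 in [0, 1⁄2), and p_heads_weighted shows what happens when the symmetry condition is dropped:

```python
import random
from fractions import Fraction


def p_heads_symmetric(b1_dist):
    """P(heads) when b1 ~ b1_dist ({b1: probability}) and, given b1, the
    bias is b1 or 1 - b1 with probability 1/2 each (the symmetry assumption)."""
    half = Fraction(1, 2)
    return sum(p * (half * b1 + half * (1 - b1)) for b1, p in b1_dist.items())


def p_heads_weighted(b1_dist, w):
    """Same, but the bias is b1 with probability w and 1 - b1 otherwise."""
    return sum(p * (w * b1 + (1 - w) * (1 - b1)) for b1, p in b1_dist.items())


def random_b1_distribution(rng, n=5):
    """Hypothetical helper: an arbitrary discrete distribution on [0, 1/2)."""
    weights = [rng.randrange(1, 10) for _ in range(n)]
    total = sum(weights)
    dist = {}
    for w in weights:
        b1 = Fraction(rng.randrange(0, 50), 100)
        dist[b1] = dist.get(b1, 0) + Fraction(w, total)
    return dist


rng = random.Random(0)
print(all(p_heads_symmetric(random_b1_distribution(rng)) == Fraction(1, 2)
          for _ in range(1000)))                                       # True
print(p_heads_weighted({Fraction(1, 10): Fraction(1)}, Fraction(3, 4)))  # 3/10
```

Every symmetric case gives exactly 1⁄2 regardless of the distribution of b1, while weighting b1 and 1 − b1 unequally (here 3⁄4 versus 1⁄4) moves the answer away from 1⁄2.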
I think they are arguing that the “independent trials” happening here are instances of “being given a ‘randomly’ biased coin and seeing if a single flip turns up heads.” But of course the techniques they are using are Bayesian, because I’d expect a frequentist to say at this point, “Well, I don’t know who’s giving me the coins; how am I supposed to know the probability distribution for the coins?”