If you have an epistemic conception of probability, then it makes sense to talk about the probability distribution of a theoretical parameter, such as the mean of some variable in a population.
Please go on with an example of where this is practically relevant, such that frequentism fails.
Bayesianism with Solomonoff induction as a prior is identical to frequentism over Turing machines anyway (or at least it should be; if you make mistakes, it won’t be).
As for the local trope, it seems to be a complete misunderstanding of books such as the one you linked.
For an actual scientific example of Bayesian and frequentist methods yielding different results when applied to the same problem, see Wagenmakers et al.’s criticisms [PDF] of Bem’s precognition experiments.
Here’s a toy example that (according to Bayesians, at least) illustrates a defect of frequentist methodology. You draw two random values from a uniform distribution with unknown mean m and known width 1. Let these values be v1 and v2, with v1 < v2. If you did this experiment repeatedly, then you would expect that 50% of the time, the interval (v1, v2) would include the population mean m. So according to the frequentist, this is the 50% confidence interval.
Suppose that on a particular run of the experiment, you get v1 = 0.1 and v2 = 1.0. For this particular data, the Bayesian would say that given our model, there is a 100% chance of the mean lying in the interval (v1, v2). The consistent frequentist, however, cannot say this. She can’t talk about the probability of the mean lying in the interval; she can only talk about the relative frequency with which the interval (considered as a random variable) will contain the mean, and this remains 50%. So she will say that the interval (0.1, 1.0) is a 50% confidence interval. The Bayesian charge is that by refusing to conditionalize on the actual data available to her, the frequentist has missed important information: specifically, that the mean of the distribution is definitely between 0.1 and 1.0.
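The 50% coverage claim is easy to check by simulation. Here is a minimal sketch under the width-1 uniform model described above (my own illustration, not from the thread; the helper name `coverage` is made up):

```python
import random

# Draw two values from Uniform(m - 0.5, m + 0.5) and count how often the
# interval (v1, v2) between them contains the true mean m.
def coverage(m=3.7, trials=100_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.uniform(m - 0.5, m + 0.5)
        b = rng.uniform(m - 0.5, m + 0.5)
        v1, v2 = min(a, b), max(a, b)
        hits += v1 < m < v2
    return hits / trials

print(coverage())  # close to 0.5: the unconditional coverage the frequentist quotes
```

Unconditionally, the interval covers m exactly when one draw lands below m and the other above it, which happens with probability 2 × 0.5 × 0.5 = 0.5, matching the simulation.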
A similar example is given in Wei_Dai’s post Frequentist Magic vs. Bayesian Magic.
...
Still sounds like a silly terminology collision. It’s as if in physics you had right-hand-rulers and left-hand-rulers, and some would charge that the direction of the magnetic field is all wrong, while each party simply means a different thing by ‘magnetic field’ (and a few people associated with the Insane Clown Posse sometimes just calculate it wrong).
edit: ohh, what you wrote is even worse than the sense I accidentally read into it (I misread uniform as normal and got confused afterwards). Picking a case where people screw up the math as a strawman. Stupid, very stupid. And boring.
Nowhere does it follow from seeing probability as a limit in an infinite number of trials (frequentism) that the mean of that distribution with unknown mean wouldn’t be restricted to a specific range. Say you draw one number x from this distribution with width 1. It immediately follows that values of the unknown parameter of the generator of that number which fall outside (x − 0.5, x + 0.5) are not possible. As you keep drawing more, you narrow down the set of possible values. [I am banning the word ‘mean’ here because there is the mean that is a property of the system we are studying, and there is the mean of the model we are creating: two different things.]
In the particular case I gave, of course frequentists could produce an argument that the mean must be in the given range. But this could not be a statistical argument, it would have to be a deductive logical argument. And the only reason a deductive argument works here is that the posterior of the mean being in the given range is 1. If it were only slightly less than 1, 0.99 say, there would be no logical argument either. In that case, the frequentist could not account for the fact that we should be extremely confident the mean is in that range without implicitly employing Bayesian methods. Frequentist methods neglect an important part of our ordinary inductive practices.
Now look, I don’t think frequentists are idiots. If they encountered the situation I describe in my toy example, they would of course conclude that the mean is in the interval [0.1, 1.0]. My point is that this is not a conclusion that follows from frequentist statistics. This doesn’t mean frequentist methodology precludes this conclusion; it just does not deliver it. A frequentist who came to this conclusion would implicitly be employing Bayesian methods. In fact, I don’t think there is any such creature as a pure frequentist (at least, not anymore). There are people who have a preference for frequentist methodology, but I doubt any of them would steadfastly refuse to assign probabilities to theoretical parameters in all contexts.
I expect most scientists and statisticians are pluralists, willing to apply whichever method is pragmatically suited to a particular problem. I’m in the same boat. I’m hardly a Bayesian zealot. I’m not arguing that actual applied statisticians divide into perfect frequentists and perfect Bayesians. What I’m arguing is against your initial claim that no significant methodological distinction follows from conceiving of probabilities as epistemic vs. conceiving of them as relative frequencies. There are significant methodological distinctions, and these distinctions are acknowledged by virtually all practicing statisticians. Applying the different methodologies can lead to different conclusions in certain situations.
If your objection to LW is that Bayesianism shouldn’t be regarded as the one true logic of induction, to the exclusion of all others, then I’m with you brother. I don’t agree with Eliezer’s Bayesian exclusionism either. But this is not the objection you raised. You seemed to be claiming that the distinction between Bayesian and frequentist methods is somehow idiosyncratic to this community, and this is just wrong.
(Incidentally, I am sorry you find my example boring and stupid, but your quarrel is not with me. It is with Morris DeGroot, in whose textbook I first encountered the example. I mention this as a counter to the view you seem to be espousing, that all respectable statisticians agree with you and only “sloppy philosophers” disagree.)
The frequentists do have an out here: conditional inference. Obviously, (v2+v1)/2 is sufficient for m, so they don’t need any other information for their inference. But it might occur to them to condition on the ancillary statistic v2-v1. In repeated trials where v2-v1 = 0.9, the interval (v1,v2) always contains m.
Edit: As pragmatist mentions below, this is wrong wrong wrong. The minimal sufficient statistic is (v1,v2), although it is true that v2-v1 is ancillary and moreover it is the ancillary complement to the sample mean. That I was working with order statistics (and the uniform distribution!) is a sign that I shouldn’t just grope for the sample mean and say good enough.
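The conditional claim is also easy to check by simulation (a sketch under the same width-1 uniform model; the helper name is mine, not from the thread): restrict attention to trials whose observed range exceeds 0.5, as with the observed 0.9, and the interval always contains m.

```python
import random

# Among repeated trials, keep only samples whose range v2 - v1 exceeds 0.5;
# in that subset the interval (v1, v2) always contains m, even though the
# unconditional coverage is only 50%.
def conditional_coverage(m=3.7, trials=100_000, seed=1):
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(trials):
        a = rng.uniform(m - 0.5, m + 0.5)
        b = rng.uniform(m - 0.5, m + 0.5)
        v1, v2 = min(a, b), max(a, b)
        if v2 - v1 > 0.5:  # condition on the ancillary statistic
            total += 1
            hits += v1 < m < v2
    return hits / total

print(conditional_coverage())  # 1.0
```

The result is exact, not just approximate: if v2 − v1 > 0.5 and both draws lie within 0.5 of m, then v1 < m < v2 necessarily.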
True, but is there any motivation for the frequentist to condition on the ancillary statistic, besides relying on Bayesian intuitions? My understanding is that the usual mathematical motivation for conditioning on the ancillary statistic is that there is no sufficient statistic of the same dimension as the parameter. That isn’t true in this case.
ETA: Wait, that isn’t right… I made the same assumption you did, that the sample mean is obviously sufficient for m in this example. But that isn’t true! I’m pretty sure in this case the minimal sufficient statistic is actually two-dimensional, so according to what I wrote above, there is a mathematical motivation to condition on the observed value of the ancillary statistic. So I guess the frequentist does have an out in this case.
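For what it’s worth, the conditional coverage can be worked out explicitly in this model (my own calculation, not something stated in the thread): given the observed range r = v2 − v1, the coverage of (v1, v2) is min(1, r / (1 − r)), which runs from near 0 for small r up to 1 once r > 1/2. That dependence on r is precisely why conditioning on the ancillary statistic matters here. A simulation sketch (hypothetical helper name):

```python
import random

# Estimate coverage of (v1, v2) conditional on the observed range falling in
# a narrow band (r_lo, r_hi); the claimed formula is min(1, r / (1 - r)).
def conditional_coverage_sim(r_lo, r_hi, m=0.0, trials=200_000, seed=2):
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(trials):
        a = rng.uniform(m - 0.5, m + 0.5)
        b = rng.uniform(m - 0.5, m + 0.5)
        v1, v2 = min(a, b), max(a, b)
        if r_lo < v2 - v1 < r_hi:
            total += 1
            hits += v1 < m < v2
    return hits / total

# Near r = 0.25 the formula predicts 0.25 / 0.75 = 1/3:
print(conditional_coverage_sim(0.24, 0.26))  # close to 0.333
```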
That’s not what frequentists actually do. See e.g. Probability Theory: The Logic of Science by E.T. Jaynes.
What is not what frequentists actually do?
Reasoning “over Turing machines” and thence getting the same results (or even using the same tools) as Bayesians.
Where did you get your ideas about statistics, may I ask? “What frequentists do” and “what Bayesians do” isn’t even part of mathematics; in mathematics you learn the formulae and where they come from, and you actually see how each approach works. That can’t be taught in a post; you’ll need several years of study.
You learn “what frequentists do” and “what Bayesians do” predominantly from people who can’t actually do any interesting math and instead resort to some sort of confused meta-discussion for amusement.
Also: nobody actually uses Solomonoff induction, it’s uncomputable.
Frequentism is seeing probability as the limit of infinitely many trials. Nothing more, nothing less. You can run the trials on a Turing machine if you wish. From reading LessWrong you’d think frequentism were some blatant rejection of Bayes’ rule or something. LessWrong seems to be predominantly focused on the meta-game of claiming to be less wrong than someone else, usually wrongly.