Perhaps we can try an experiment? We have here, apparently, both Bayesians and frequentists; or at a minimum, people knowledgeable enough to apply both methods. Suppose I generate 25 data points from some distribution whose nature I do not disclose, and ask a Bayesian and a frequentist each for estimates of the true mean and standard deviation? The underlying analysis would also be welcome. If necessary we could extend this to 100 sets of data points, ask for 95% confidence intervals, and see whether the methods are well calibrated. (This probably does require some better method of transferring data than blog comments, though.)
As a start, here is one data set:
617.91 16.8539 83.4021 141.504 545.112 215.863 553.168 414.435 4.71129 609.623 117.189 −102.648 647.449 283.57 286.838 710.811 505.826 79.3366 171.816 105.332 540.313 429.298 −314.32 255.93 382.471
It is possible that this task does not have sufficient difficulty to distinguish between the approaches. If so, how can we add constraints to get different answers?
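For concreteness, here is a minimal sketch of how the calibration half of the proposal could be scored. Everything in it is illustrative: the hidden gamma distribution and the `t_interval` entry are hypothetical stand-ins for whatever the experimenter and the entrants would actually use.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def hidden_distribution(n):
    # The undisclosed generator; entrants would only ever see the samples.
    return rng.gamma(shape=2.0, scale=150.0, size=n)

def t_interval(sample, level=0.95):
    # Stand-in for whatever interval procedure an entrant submits;
    # here, a textbook t-interval for the mean.
    m, s, n = sample.mean(), sample.std(ddof=1), len(sample)
    half = stats.t.ppf(0.5 + level / 2, df=n - 1) * s / np.sqrt(n)
    return m - half, m + half

true_mean = 2.0 * 150.0  # mean of the gamma above: shape * scale
hits = sum(
    lo <= true_mean <= hi
    for lo, hi in (t_interval(hidden_distribution(25)) for _ in range(100))
)
print(f"coverage: {hits}/100 (well calibrated would be about 95)")
```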
There’s a difficulty with your experimental setup: you are implicitly invoking a probability distribution over probability distributions, since your choice of a distribution is itself a random event. The results are going to depend heavily on how you construct that distribution over distributions. If your outcome space of probability distributions is infinite (which is what I would expect) and you sample from a broad enough class of distributions, then 25 data points are not enough to say anything substantive.
A friend of yours who knows what distributions you’re going to select from, though, could incorporate that knowledge into a prior and then use that to win.
So, I predict that for your setup there exists a Bayesian who would be able to consistently win.
But if you gave much more data, and you sampled from a set of probability distributions rich enough that priors became hard to specify, a frequentist procedure would probably win out.
Hmm. I don’t know if I’m a very random source of distributions; humans are notoriously bad at randomness, and there are only so many distributions readily available in standard libraries. But in any case, I don’t see this as a difficulty; a real-world problem is under no obligation to give you an easily recognised distribution. If Bayesians do better when the distribution is unknown, good for them. And if not, tough beans. That is precisely the sort of thing we’re trying to measure!
I don’t think, though, that the existence of a Bayesian who can win, based on knowing what distributions I’m likely to use, is a very strong statement. Similarly there exists a frequentist who can win based on watching over my shoulder when I wrote the program! You can always win by invoking special knowledge. This does not say anything about what would happen in a real-world problem, where special knowledge is not available.
You can actually simulate a tremendous number of distributions (and in theory any of them, to arbitrary accuracy) by applying an approximate inverse CDF to a standard uniform random variable (see here for example). So the space of distributions from which you could select for your test is potentially infinite. We can then treat your selection of a probability distribution as a random experiment and model your selection process with a probability distribution of its own.
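A minimal sketch of the inverse-CDF (inverse-transform) trick described above, using an exponential distribution because its inverse CDF has a closed form; the choice of distribution and rate is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_via_inverse_cdf(inverse_cdf, n):
    # If U ~ Uniform(0, 1) and F is a CDF, then F^{-1}(U) has distribution F.
    return inverse_cdf(rng.uniform(size=n))

# Exponential(rate): F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -ln(1 - u) / rate.
rate = 0.5
draws = sample_via_inverse_cdf(lambda u: -np.log(1.0 - u) / rate, n=10_000)
print(draws.mean())  # should be close to 1 / rate = 2.0
```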
The issue is that when the outcome space is the space of all computable probability distributions, Bayesians will have consistency problems (another good paper on the topic is here): the posterior distribution won’t converge to the true distribution. So in this particular setup I think Bayesian methods are inferior unless one could devise a good prior over distributions. I suppose that if I knew you didn’t know how to sample from arbitrary probability distributions, and I put that in my prior, then I might be able to use Bayesian methods to successfully estimate the probability distribution (the discussion of the Bayesian who knew you personally was meant to be tongue-in-cheek).
In the frequentist case there is a known procedure due to Parzen from the ’60s.
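The procedure alluded to is presumably Parzen-window (kernel) density estimation (Parzen, 1962). A minimal sketch, with Silverman’s rule of thumb standing in for a bandwidth choice the comment does not specify:

```python
import numpy as np

def parzen_density(data, x, bandwidth=None):
    # Gaussian-kernel (Parzen-window) estimate of the density at points x.
    data = np.asarray(data, dtype=float)
    n = data.size
    if bandwidth is None:
        # Silverman's rule of thumb -- a common default, not Parzen's own choice.
        bandwidth = 1.06 * data.std(ddof=1) * n ** (-0.2)
    z = (np.asarray(x, dtype=float)[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

# First few values of the data set posted above, just to show the call:
sample = [617.91, 16.8539, 83.4021, 141.504, 545.112]
print(parzen_density(sample, x=np.linspace(0.0, 700.0, 8)))
```

Estimates of the mean and standard deviation could then be read off the fitted density, though for those two quantities the sample moments are the more direct frequentist answer.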
All of these are asymptotic results, however, and your experiment seems to be focused on very small samples. To the best of my knowledge there aren’t many results in this regime except under special conditions. Without more constraints on the experimental design, I don’t think you’ll get very interesting results. That said, I am actually very much in favor of such evaluations, because people in statistics and machine learning, for a variety of reasons, don’t do them, or don’t do them on a broad enough scale. Anyway, if you are interested in such things you may want to start looking here, since statistics and machine learning both have the tools to properly design such experiments.
The small samples are a constraint imposed by the limits of blog comments; there’s a limit to how many numbers I would feel comfortable spamming this place with. If we got some volunteers, we might do a more serious sample size using hosted ROOT ntuples or zipping up some plain ASCII.
I do know how to sample from arbitrary distributions; I should have specified that the space of distributions is limited to those for which I don’t have to think for more than a minute or so, or in other words, those for which someone has already coded the CDF in a library I’ve already got installed. It’s not knowledge but work that’s the limiting factor. :) Presumably this limits your prior quite a lot already, there being only so many commonly used math libraries.
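As an illustration of the “already coded in a library” constraint, here is a one-step sketch using scipy.stats; the Gumbel distribution and its parameters are arbitrary stand-ins, not the distribution actually used for the posted data:

```python
from scipy import stats

# Draw 25 points from any distribution a library has already implemented.
data = stats.gumbel_r(loc=300.0, scale=200.0).rvs(size=25, random_state=1)
print(data.round(2))
```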
Ha ha—this is a Bayesian problem drawn from a Bayesian perspective!
Surely a frequentist would have a different perspective and propose a different kind of solution. Instead of designing an experiment to determine which is better, how about extrapolating from the evidence we already have? Humans have made a certain amount of progress in mathematics—has this mathematics been mainly developed by frequentists or Bayesians?
(Case closed, I think.)
I roughly consider Bayesians the experimental scientists and frequentists the theoretical scientists. Mathematics is theoretical, which is why the frequentists cluster there. Do you disagree with this?
(Nevertheless, the challenge sounds fun.)
You could use the same argument against the use of computers in science—after all, Newton didn’t have a computer, and neither did Einstein. Case closed, I think.
This is the comment Nominull was referring to:
Ha ha—this is a Bayesian problem drawn from a Bayesian perspective!
Surely a frequentist would have a different perspective and propose a different kind of solution. Instead of designing an experiment to determine which is better, how about extrapolating from the evidence we already have? Humans have made a certain amount of progress in mathematics—has this mathematics been mainly developed by frequentists or Bayesians?
(Case closed, I think.)
I roughly consider Bayesians the experimental scientists and frequentists the theoretical scientists. Mathematics is theoretical, which is why the frequentists cluster there. Do you disagree with this?
(Nevertheless, the challenge sounds fun.)
My response to Nominull: the cases aren’t really parallel, but I do need to emphasize that I don’t think the Bayesian perspective is wrong; it just hasn’t been the perspective, historically, of most mathematicians.
… but, finally, when I think of Bayesian mathematics as a new or under-utilised thing, I see an analogy with computers. Perhaps Bayesian theory could be a workhorse for new mathematics. I guess my perspective was that mathematicians will use whichever tools are available to them, and they used frequentist theory instead. But perhaps they didn’t understand Bayesian tools, or it wasn’t the time for them yet.
Voted the courtesy repost back up to zero. I most likely downvoted the original post for blatant silliness but really, why penalise politeness? In fact, I’d upvote the deleted great grandparent for demonstrating changing one’s mind (on the applicability of a particular point), in defiance of rather strong biases against doing that.
I consider frequentist experimental scientists to be potentially competent in what they do. After all, available frequentist techniques are good enough that the significant problems with the application of statistics lie in the misuse of frequentist tools, more so than in them being used at all. As for theoretical frequentists… I suggest that anyone who makes a serious investigation into developments in probability theory and statistics will not remain a frequentist. I claim that what ‘theoretical frequentists’ do is orthogonal to theory (but often precisely in line with what academia is really about).