Epistemic status: Bayesian rant.
I don’t agree about Bayesian vs. frequentist, in the sense that in my experience it’s the frequentist approach that is complex and slow.
Right now, most common models can be set up Bayesianly in a probabilistic programming system like PyMC or Stan and fit much more comfortably than their frequentist equivalents. In particular, it’s easy and straightforward to extract uncertainties from the posterior samples.
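For instance, here is a minimal sketch (assuming PyMC ≥ 5 and ArviZ, with made-up data) of fitting a plain linear regression and reading credible intervals straight off the posterior samples:

```python
import numpy as np
import pymc as pm
import arviz as az

# Made-up data for illustration
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

with pm.Model():
    alpha = pm.Normal("alpha", 0, 10)   # weak prior on the intercept
    beta = pm.Normal("beta", 0, 10)     # weak prior on the slope
    sigma = pm.HalfNormal("sigma", 1)   # noise scale
    pm.Normal("y_obs", mu=alpha + beta * x, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000)  # MCMC posterior samples

# Uncertainties come for free as quantiles/HDIs of the samples
print(az.summary(idata, hdi_prob=0.95))
```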
When that was not the case, I still think that common models were more easily derived in the Bayesian framework, e.g., classic ANOVA up to Scheffé intervals (pages of proofs, the usual contrivances of confidence intervals and multiple testing), versus doing the Bayesian version (prior × likelihood → posterior = oh! it’s a Student-t! done.) (ref. Casella & Berger for the frequentist ANOVA)
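As a minimal sketch of the “it’s a Student-t” step, take the one-sample Normal model with the standard noninformative prior (the textbook calculation, not the full ANOVA):

```latex
% One-sample Normal model with the standard noninformative prior
y_i \mid \mu, \sigma^2 \sim \mathcal{N}(\mu, \sigma^2),
\qquad p(\mu, \sigma^2) \propto 1/\sigma^2 .

% Prior times likelihood, then integrate out \sigma^2:
p(\mu \mid y) \propto \int p(\mu, \sigma^2)\, p(y \mid \mu, \sigma^2)\, \mathrm{d}\sigma^2
\;\;\Longrightarrow\;\;
\left.\frac{\mu - \bar{y}}{s/\sqrt{n}} \,\right|\, y \;\sim\; t_{n-1},
```

with ȳ the sample mean and s the sample standard deviation; contrasts between group means go the same way.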
If you complain “maximum likelihood is easier than the posterior”, I answer “Laplace approximation”, and indeed it’s metis that the observed Fisher information works better than the expected Fisher information. In HEP, over time, they somehow learned empirically to plot contour curves of the likelihood. Bayes was within you all along.
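A minimal sketch of the Laplace approximation (assuming SciPy and a made-up log-posterior): find the mode, then use an inverse-Hessian at the mode, i.e., the observed information, as the approximate posterior covariance.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up example: logistic regression coefficients with a Normal(0, 5) prior
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.5, -1.0, 2.0]))))

def neg_log_post(beta):
    logits = X @ beta
    loglik = np.sum(y * logits - np.log1p(np.exp(logits)))
    logprior = -0.5 * np.sum(beta**2) / 5.0**2
    return -(loglik + logprior)

res = minimize(neg_log_post, np.zeros(3), method="BFGS")
beta_map = res.x            # posterior mode
cov_laplace = res.hess_inv  # BFGS inverse-Hessian estimate ≈ posterior covariance
print(beta_map, np.sqrt(np.diag(cov_laplace)))  # point estimates and approx. std. errors
```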
If you say “IPW” (inverse probability weighting), I answer: IT’S NOT EFFICIENT IN FINITE SAMPLES, DAMN YOU. WHY ARE YOU USING IT? WHY DO THE SAME PEOPLE WHO CORRECTLY OBSERVE IT’S NOT EFFICIENT INSIST ON USING IT? I DON’T KNOW.
Ridge regression is more simply introduced and understood as a Normal prior on the coefficients.
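A minimal NumPy sketch (with made-up data) of the equivalence: the ridge estimate with penalty λ is exactly the posterior mean/mode under a Normal(0, τ²) prior on the coefficients, with λ = σ²/τ².

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=50)

sigma2, tau2 = 1.0, 0.25   # noise variance, prior variance
lam = sigma2 / tau2        # ridge penalty implied by the prior

# Frequentist ridge estimate
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# Bayesian posterior mean under beta ~ N(0, tau2*I), y ~ N(X beta, sigma2*I)
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(4) / tau2)
beta_post_mean = post_cov @ (X.T @ y / sigma2)

print(np.allclose(beta_ridge, beta_post_mean))  # True: same point estimate
# ...and the Bayesian version also hands you post_cov, i.e. the uncertainty.
```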
Regularization of histogram deconvolution (unfolding) in HEP is more simply understood as a prior. (ref. Cowan)
Every regularization is more simply understood as a prior, I guess.
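As a sketch of the general correspondence (notation assumed here: λ the regularization strength, L a generic penalty operator), a penalized fit is just a MAP estimate with the penalty playing the role of a negative log-prior:

```latex
\hat{\theta}_{\text{penalized}}
  = \arg\min_{\theta}\,\Big[\, -\log p(y \mid \theta) \;+\; \lambda\,\lVert L\theta \rVert^2 \,\Big]
  = \arg\max_{\theta}\; p(y \mid \theta)\, p(\theta),
\qquad p(\theta) \propto \exp\!\big(-\lambda\,\lVert L\theta \rVert^2\big),
```

i.e., a Gaussian prior on Lθ; the Tikhonov-style smoothness regularization used in unfolding is the case where L is a discrete curvature operator.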
Simulated inference is more simply understood as Approximate Bayesian Computation, and it’s also easier. Guess which field died? (OK, they are not really the same thing, but an SI person I met does think ABC is the Bayesian equivalent and that Bayesians stole their limelight.)
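A minimal sketch of rejection ABC (with a made-up simulator and a crude summary statistic): draw from the prior, simulate, and keep the draws whose simulated summaries land close to the observed ones.

```python
import numpy as np

rng = np.random.default_rng(3)
y_obs = rng.normal(loc=3.0, scale=1.0, size=100)   # made-up "observed" data

def simulate(theta, rng):
    # A simulator we can sample from but whose likelihood we pretend not to know
    return rng.normal(loc=theta, scale=1.0, size=100)

def summary(y):
    return np.mean(y)

eps = 0.05
accepted = []
for _ in range(50_000):
    theta = rng.normal(0.0, 10.0)                   # draw from the prior
    y_sim = simulate(theta, rng)
    if abs(summary(y_sim) - summary(y_obs)) < eps:  # keep if summaries match
        accepted.append(theta)

accepted = np.array(accepted)
print(accepted.mean(), accepted.std())              # approximate posterior mean and sd
```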
Random effects models and their accessories are more straightforward in every respect in the Bayesian formulation. (Poor frequentist student: “Why do I have to use the Bayesian estimator inside the frequentist model for such and such quantity? Why are there all these variants of the frequentist version, each failing badly in weird ways depending on what I’m doing? Why are there two types of uncertainty around? REML or not REML? (depends, are you doing a test?) p-value or halved p-value? (depends...) Do I have to worry about multiple comparisons?”)
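A minimal sketch (assuming PyMC, with made-up grouped data) of a random-intercept model: the group effects are just another level of parameters, and their uncertainty comes out of the same posterior as everything else.

```python
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(4)
n_groups = 8
group = np.repeat(np.arange(n_groups), 20)
true_effects = rng.normal(0.0, 1.0, size=n_groups)
y = 5.0 + true_effects[group] + rng.normal(scale=0.5, size=group.size)

with pm.Model():
    mu = pm.Normal("mu", 0, 10)                 # grand mean
    tau = pm.HalfNormal("tau", 1)               # between-group sd
    a = pm.Normal("a", 0, tau, shape=n_groups)  # random intercepts (partial pooling)
    sigma = pm.HalfNormal("sigma", 1)           # within-group sd
    pm.Normal("y_obs", mu=mu + a[group], sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000)

# Both variance components and the group effects get full posteriors
print(az.summary(idata, var_names=["mu", "tau", "sigma", "a"]))
```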
Did you know that the state of the art in causal inference is Bayesian? (BART; see the ACIC challenge)
Did you know that Bayesian tree methods blow the shit out of frequentist ones? (BART vs. random forest) And as usual it’s easier to compute the uncertainties of anything.
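A minimal sketch of a BART regression, assuming the third-party pymc-bart package (its API may differ across versions; treat the call signature as an assumption) and made-up data; posterior uncertainty over the fitted function comes out of the same samples:

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb   # assumed: the pymc-bart add-on package

rng = np.random.default_rng(5)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=200)

with pm.Model():
    # Sum-of-trees prior over the regression function (assumed signature)
    mu = pmb.BART("mu", X=X, Y=y, m=50)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000)

# Pointwise posterior summaries of the fitted function, for free
mu_post = idata.posterior["mu"]
print(mu_post.mean(dim=("chain", "draw")).values[:5])
```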
If you are a frequentist the multiple comparisons problem will haunt you, or more probably you’ll stop caring. Unless you have money, in which case you’ll hire a team of experts to deal with it. (Have you heard about the pinnacle of multiple testing correction theory, graphical alpha-propagation? A very sophisticated method, indeed; worth a look.)
Maybe you can make a case that frequentism is still quicker because there are no real rules, so you can pull a formula out of your hat, say lo!, and compute it; but I think this is stretching it. For example, you can decide to take the arithmetic mean of your data, because yes, and be done. I’d say that’s not a fair comparison, because you have to be at a comparable performance level for the K/T prior to weigh in, and if you want to get the statistical properties of arbitrary estimators right, it starts getting more complicated.