I’ve never seen any good general justification for parsimony...
This is a strange statement for a Bayesian to make. Perhaps he means that there is no reason to require absolute parsimony, which is true; sometimes, if you have enough data, you can justify the use of complex models. But Bayesian methods certainly require relative parsimony, in the sense that the model complexity needs to be small compared to the quantity of information being modeled. Formally, let A be the entropy of the prior distribution, and B be the mutual information between the observed data and the model parameter(s). The expected shift from prior to posterior, measured as the KL divergence averaged over possible data sets, is exactly B, and B can never exceed A. So unless B makes up a substantial fraction of A (relative parsimony), Bayesian updating won’t substantially shift belief away from the prior: the posterior will be just a minor modification of the prior, and the whole process of obtaining data and performing inference will have produced no actual change in belief.
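To make the comparison between A and B concrete, here is a small numerical sketch. This is my own illustration, not anything from the original exchange; the function name relative_parsimony_demo, the 1024-point parameter grid, and the uniform prior are all just choices made for the example. A coin’s bias is one of 1024 discrete values, so the prior entropy A is 10 bits; the script computes B, the mutual information between the data and the parameter, which also equals the average KL divergence from prior to posterior. With only a handful of flips, B is a tiny fraction of A and the posterior is, on average, barely different from the prior; even a thousand flips leaves B below A.

```python
# Numerical sketch (illustrative, not from the original comments):
# A = entropy of the prior over a discrete parameter,
# B = mutual information between data and parameter
#   = average KL divergence from prior to posterior.
# When B is small relative to A, the posterior is essentially the prior.
import numpy as np
from scipy.stats import binom

def relative_parsimony_demo(n_flips, n_params=1024):
    # Discrete model: the coin's bias theta is one of n_params values,
    # with a uniform prior over them.
    thetas = np.linspace(0.01, 0.99, n_params)
    prior = np.full(n_params, 1.0 / n_params)
    A = -np.sum(prior * np.log2(prior))                     # prior entropy, bits

    # Enumerate data sets by their sufficient statistic: k heads in n_flips.
    ks = np.arange(n_flips + 1)
    lik = binom.pmf(ks[None, :], n_flips, thetas[:, None])  # p(k | theta)
    p_k = prior @ lik                                        # marginal p(k)
    post = prior[:, None] * lik / p_k[None, :]               # p(theta | k)

    # KL(posterior || prior) for each possible data set ...
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(post > 0, post * np.log2(post / prior[:, None]), 0.0)
    kl = terms.sum(axis=0)
    # ... and its average over data sets, which equals B = I(theta; data).
    B = float(p_k @ kl)
    print(f"n_flips={n_flips:5d}   A={A:.1f} bits   B={B:.2f} bits")

for n in (1, 10, 1000):
    relative_parsimony_demo(n)
```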
The difference between the MDL philosophy and the Bayesian philosophy is actually quite minor. There are some esoteric technical arguments about things like whether one method or the other converges in the limit of infinite data, but at the end of the day the two philosophies say almost exactly the same thing.
Bayesian methods certainly require relative parsimony, in the sense that the model complexity needs to be small compared to the quantity of information being modeled.
Not really. Bayesian methods can model pure random noise, in which case the model is the same size as the data being modeled.