Well, there are a couple of issues here: first, log P(data|model) is a concave function of the parameters for logistic regression, so unless log P(model) is also concave, the maximization may not reach the global optimum.
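A quick sketch of why that concavity claim holds; the notation below (ell for the log-likelihood, x_i for the feature vectors, y_i in {0,1} for the labels, sigma for the logistic function) is mine, not the thread's:

```latex
% Logistic-regression log-likelihood and its Hessian (notation assumed above).
\ell(\theta) = \sum_{i=1}^{n} \Big[\, y_i \log \sigma(x_i^\top \theta)
  + (1 - y_i)\log\big(1 - \sigma(x_i^\top \theta)\big) \Big],
\qquad
\nabla^2 \ell(\theta) = -\sum_{i=1}^{n} \sigma(x_i^\top \theta)\big(1 - \sigma(x_i^\top \theta)\big)\, x_i x_i^\top \preceq 0 .
```

The Hessian is negative semidefinite, so ell is concave; adding a concave log P(model) keeps the whole objective concave, and then any local maximum is the global one.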
Secondly, the proper Bayesian thing to do would be to sample from the posterior, not to maximize it; for instance, in logistic regression the model is given by a parameter vector theta. Suppose we actually believed that the prior on theta was exp(-|theta|), where |theta| is the sum of the absolute values of the coordinates of theta. Then maximizing P(model|data) will tend to give you solutions where most of the entries of theta are exactly 0, whereas the actual posterior places zero probability mass on such solutions.
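A minimal sketch of the contrast being described, assuming a small synthetic dataset and using the standard correspondence between MAP under a Laplace prior and L1-penalized logistic regression; the dataset, penalty scale, and sampler settings are illustrative choices of mine, not anything from the thread:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: 200 points, 20 features, only 3 truly relevant.
n, d = 200, 20
X = rng.normal(size=(n, d))
theta_true = np.zeros(d)
theta_true[:3] = [2.0, -1.5, 1.0]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)

# MAP under the prior exp(-|theta|_1) corresponds to L1-penalized logistic
# regression (the penalty scale is folded into C; C=1 matches the unscaled prior).
map_fit = LogisticRegression(penalty="l1", C=1.0, solver="liblinear",
                             fit_intercept=False).fit(X, y)
theta_map = map_fit.coef_.ravel()
print("exact zeros in the MAP estimate:", np.sum(theta_map == 0.0))

# Posterior sampling with a crude random-walk Metropolis sampler.
def log_posterior(theta):
    z = X @ theta
    log_lik = np.sum(y * z - np.logaddexp(0.0, z))   # Bernoulli log-likelihood
    log_prior = -np.sum(np.abs(theta))               # Laplace prior exp(-|theta|_1)
    return log_lik + log_prior

theta = np.zeros(d)
lp = log_posterior(theta)
samples = []
for step in range(20000):
    proposal = theta + 0.05 * rng.normal(size=d)
    lp_prop = log_posterior(proposal)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = proposal, lp_prop
    if step >= 10000 and step % 10 == 0:             # keep thinned post-burn-in draws
        samples.append(theta.copy())

samples = np.array(samples)
print("exact zeros across all posterior draws:", np.sum(samples == 0.0))
```

The L1-penalized fit typically reports several coefficients exactly equal to zero, while across the posterior draws the count of exact zeros is (almost surely) zero, which is exactly the mismatch being described.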
On the second point, fair enough, though even under Bayes it's sometimes reasonable to want a single answer, since in the end you only get to do one thing.
If you have that prior and maximizing P(model|data) lands on solutions where either P(data|model) or P(model) has zero probability mass, then you're screwing up the multiplication: the product of the two would have to be zero there.
Well, the point is that in a continuous parameter space, the maximizing solution (the MAP estimate) will have exactly-zero entries with positive probability, but the posterior probability of any entry being exactly zero is 0.
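To make that concrete in one dimension (again, the notation and the subgradient condition below are my gloss on the claim, not the thread's): write ell for the log-likelihood and take the prior density proportional to exp(-|theta|).

```latex
% One-dimensional illustration: posterior density, condition for a zero MAP estimate,
% and the measure-zero point event (notation assumed above).
p(\theta \mid \mathrm{data}) \propto \exp\!\big(\ell(\theta) - |\theta|\big),
\qquad
\hat{\theta}_{\mathrm{MAP}} = 0 \iff |\ell'(0)| \le 1,
\qquad
\Pr\big(\theta = 0 \mid \mathrm{data}\big) = \int_{\{0\}} p(\theta \mid \mathrm{data})\, d\theta = 0 .
```

The middle condition (which relies on ell being concave, as noted above) holds for a nontrivial set of datasets, so the maximizer sits at exactly zero with positive probability, while the point event theta = 0 always has posterior probability zero because the posterior has a density.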
How? If any of the probabilities that the posterior factors into are zero, the product is zero too. Or do you just mean that, since everything has unlimited precision in a continuous space, no answer can ever have positive probability because it's infinitely unlikely?