I was thinking about this a few weeks ago. The answer is that your units are related to the probability measure, and care is needed. Here’s the context:
Let’s say I’m in the standard set-up for linear regression: I have a bunch of input vectors $\{\vec{x}_i\}_{i=1,\dots,n} \subset \mathbb{R}^k$ and, for some unknown $\vec{\mu} \in \mathbb{R}^k$ and $\sigma^2 > 0$, the outputs $y_i$ are independent with distributions
$$y_i \sim \vec{x}_i \cdot \vec{\mu} + N(0, \sigma^2)$$
Let $X$ denote the $n \times k$ matrix whose $i$th row is $\vec{x}_i$, assumed to be full rank. Let $\hat{\mu}$ denote the random vector corresponding to the fitted estimate of $\vec{\mu}$ using ordinary least squares linear regression, and let $s^2$ denote the sum of squared residuals. It can be shown geometrically that:
$$\frac{s^2}{\sigma^2} \sim \chi^2_{n-k}, \qquad \frac{\vec{\mu}-\hat{\mu}}{\sigma^2} \sim N\!\left(\vec{0}, (X^t X)^{-1}\right), \qquad \frac{\vec{\mu}-\hat{\mu}}{s^2} \sim \frac{N\!\left(\vec{0}, (X^t X)^{-1}\right)}{\chi^2_{n-k}}$$
(informally, the density of $\frac{\vec{\mu}-\hat{\mu}}{s^2}$ is that of the random variable corresponding to sampling a multivariate Gaussian with mean $\vec{0} \in \mathbb{R}^k$ and covariance matrix $(X^t X)^{-1}$, then sampling an independent $\chi^2_{n-k}$ distribution and dividing by the result).
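If it helps to see the first of these facts concretely, here is a minimal numpy/scipy sketch (my own illustration, not part of the original argument) that simulates the model, fits OLS, and Monte Carlo-checks the pivot $s^2/\sigma^2$ against $\chi^2_{n-k}$; the dimensions, seed, and the helper name `fit_ols` are arbitrary choices for the demo.

```python
# Illustrative sketch: simulate y_i = x_i . mu + N(0, sigma^2), fit OLS,
# and check that s^2 / sigma^2 behaves like a chi^2_{n-k} draw.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 50, 3
sigma2 = 2.0                      # true noise variance (assumed for the demo)
X = rng.normal(size=(n, k))       # rows are the input vectors x_i
mu = rng.normal(size=k)           # true coefficient vector

def fit_ols(X, y):
    """Return the OLS estimate mu_hat and the sum of squared residuals s^2."""
    mu_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ mu_hat
    return mu_hat, resid @ resid

# Monte Carlo over fresh noise: collect the pivot s^2 / sigma^2.
pivots = []
for _ in range(5000):
    y = X @ mu + rng.normal(scale=np.sqrt(sigma2), size=n)
    _, s2 = fit_ols(X, y)
    pivots.append(s2 / sigma2)

# Kolmogorov-Smirnov comparison against chi^2 with n - k degrees of freedom;
# the p-value should not be tiny if the claim holds.
print(stats.kstest(pivots, stats.chi2(df=n - k).cdf))
```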
A naive undergrad might misinterpret these distributional facts as meaning that, after observing $\vec{y}$ and computing $\hat{\mu}$ and $s^2$:
$$\sigma^2 \sim \frac{s^2}{\chi^2_{n-k}}, \qquad \vec{\mu} \mid \sigma^2 \sim \sigma^2\, N\!\left(\hat{\mu}, (X^t X)^{-1}\right), \qquad \vec{\mu} \sim \frac{s^2\, N\!\left(\hat{\mu}, (X^t X)^{-1}\right)}{\chi^2_{n-k}}$$
Of course, this can’t be true in general because we did not even mention a prior. On the other hand, this is exactly the family of conjugate priors/posteriors in Bayesian linear regression… so what possibly-improper prior makes this the posterior?
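For concreteness, here is a rough sketch of what drawing from that family looks like, reading “$\sigma^2\, N(\hat{\mu}, (X^t X)^{-1})$” as shorthand for $N(\hat{\mu}, \sigma^2 (X^t X)^{-1})$ (i.e. the scaling acts on the covariance). The concrete values for $X$, $\hat{\mu}$, and $s^2$ below are placeholders I made up for the demo, not anything from the comment.

```python
# Illustrative sketch: draw (sigma^2, mu) from the family displayed above,
# given an already-fitted mu_hat and s^2.  All inputs here are placeholders.
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))            # stand-in design matrix
mu_hat = np.array([0.5, -1.0, 2.0])    # stand-in OLS estimate
s2 = 40.0                              # stand-in sum of squared residuals

XtX_inv = np.linalg.inv(X.T @ X)

def draw_mu_sigma2():
    """One draw: sigma^2 ~ s^2 / chi^2_{n-k}, then
    mu | sigma^2 ~ N(mu_hat, sigma^2 (X^t X)^{-1})."""
    sigma2 = s2 / rng.chisquare(df=n - k)
    mu = rng.multivariate_normal(mu_hat, sigma2 * XtX_inv)
    return sigma2, mu

print(draw_mu_sigma2())
```

(Marginalizing such draws over $\sigma^2$ reproduces the third expression in the display, under the same reading of the notation.)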
I won’t spoil the whole thing for you (partly because I’ve accidentally spent too much time writing this comment!), but start with just $\sigma^2$ and $s^2$:
1. Calculate the exact posterior density of $\sigma^2$ desired in terms of $\chi^2_{n-k}$ (a sketch of this step appears just after this list).
2. Use Bayes’ theorem to figure out the prior.
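For what it’s worth, here is a sketch of what step 1 can look like (my own working under the reading above, so treat it as a hint rather than the intended solution): if $\sigma^2 = s^2/W$ with $W \sim \chi^2_{n-k}$ and $s^2$ held fixed, the change of variables $w = s^2/\sigma^2$ gives

$$p(\sigma^2 \mid \vec{y}) = f_{\chi^2_{n-k}}\!\left(\frac{s^2}{\sigma^2}\right)\frac{s^2}{(\sigma^2)^2} \;\propto\; \left(\frac{s^2}{\sigma^2}\right)^{\frac{n-k}{2}-1} e^{-\frac{s^2}{2\sigma^2}}\,\frac{s^2}{(\sigma^2)^2} \;\propto\; (\sigma^2)^{-\frac{n-k}{2}-1}\, e^{-\frac{s^2}{2\sigma^2}},$$

i.e. a scaled inverse-$\chi^2$ density with $n-k$ degrees of freedom (constants depending only on $s^2$ dropped).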
I personally messed up several times on step 2 because I was being extremely naive about the “units” cancelling in Bayes’ theorem. When I finally made it all precise using measures, things actually cancelled properly and I got the correct improper prior distribution on $\sigma^2, \vec{\mu}$.
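In case the measure-theoretic bookkeeping is the part of interest: one way to make the units explicit (a sketch assuming a $\sigma$-finite prior measure $\Pi$ on the parameters, not the derivation alluded to above) is the abstract form of Bayes’ theorem,

$$\frac{d\Pi_{\mathrm{post}}}{d\Pi}(\theta) = \frac{p(\vec{y} \mid \theta)}{\int p(\vec{y} \mid t)\,\Pi(dt)},$$

where $\theta$ stands for $(\sigma^2, \vec{\mu})$ and $p(\vec{y} \mid \theta)$ is the likelihood as a density with respect to Lebesgue measure on $\mathbb{R}^n$. Its units of $[y]^{-n}$ appear in both numerator and denominator and cancel, leaving a dimensionless Radon-Nikodym derivative; units only stop cancelling when densities taken against different measures get mixed.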
(If anyone wants me to finish fleshing out the idea, please let me know).
Thanks for the effort, though unfortunately I’m not familiar with linear regression.