I was thinking about this a few weeks ago. The answer is that your units are related to the probability measure, and care is needed. Here’s the context:
Let’s say I’m in the standard set-up for linear regression: I have a bunch of input vectors $\{\vec{x}_i\}_{i=1,\dots,n} \subset \mathbb{R}^k$ and, for some unknown $\vec{\mu} \in \mathbb{R}^k$ and $\sigma^2 > 0$, the outputs $y_i$ are independent with distributions
$$y_i \sim \vec{x}_i \cdot \vec{\mu} + N(0, \sigma^2)$$
Let $X$ denote the $n \times k$ matrix whose $i$th row is $\vec{x}_i$, assumed to be full rank. Let $\hat{\mu}$ denote the random vector corresponding to the fitted estimate of $\vec{\mu}$ using ordinary least squares linear regression, and let $s^2$ denote the sum of squared residuals. It can be shown geometrically that:
$$\frac{s^2}{\sigma^2} \sim \chi^2_{n-k}, \qquad \frac{\vec{\mu}-\hat{\mu}}{\sigma^2} \sim N\!\left(\vec{0}, (X^t X)^{-1}\right), \qquad \frac{\vec{\mu}-\hat{\mu}}{s^2} \sim \frac{N\!\left(\vec{0}, (X^t X)^{-1}\right)}{\chi^2_{n-k}}$$
(informally, the density of $\frac{\vec{\mu}-\hat{\mu}}{s^2}$ is that of the random variable corresponding to sampling a multivariate Gaussian with mean $\vec{0} \in \mathbb{R}^k$ and covariance matrix $(X^t X)^{-1}$, then sampling an independent $\chi^2_{n-k}$ distribution and dividing by the result).
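If it helps to see the first of these facts concretely, here is a minimal numpy/scipy sketch (my own illustration, not part of the original argument) that simulates the model, fits OLS, and Monte Carlo-checks the pivot $s^2/\sigma^2$ against $\chi^2_{n-k}$; the dimensions, seed, and the helper name `fit_ols` are arbitrary choices for the demo.

```python
# Illustrative sketch: simulate y_i = x_i . mu + N(0, sigma^2), fit OLS,
# and check that s^2 / sigma^2 behaves like a chi^2_{n-k} draw.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 50, 3
sigma2 = 2.0                      # true noise variance (assumed for the demo)
X = rng.normal(size=(n, k))       # rows are the input vectors x_i
mu = rng.normal(size=k)           # true coefficient vector

def fit_ols(X, y):
    """Return the OLS estimate mu_hat and the sum of squared residuals s^2."""
    mu_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ mu_hat
    return mu_hat, resid @ resid

# Monte Carlo over fresh noise: collect the pivot s^2 / sigma^2.
pivots = []
for _ in range(5000):
    y = X @ mu + rng.normal(scale=np.sqrt(sigma2), size=n)
    _, s2 = fit_ols(X, y)
    pivots.append(s2 / sigma2)

# Kolmogorov-Smirnov comparison against chi^2 with n - k degrees of freedom;
# the p-value should not be tiny if the claim holds.
print(stats.kstest(pivots, stats.chi2(df=n - k).cdf))
```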
A naive undergrad might misinterpret these distributional facts as meaning that, after observing $\vec{y}$ and computing $\hat{\mu}$ and $s^2$:
$$\sigma^2 \sim \frac{s^2}{\chi^2_{n-k}}, \qquad \vec{\mu} \mid \sigma^2 \sim \sigma^2\, N\!\left(\hat{\mu}, (X^t X)^{-1}\right), \qquad \vec{\mu} \sim \frac{s^2\, N\!\left(\hat{\mu}, (X^t X)^{-1}\right)}{\chi^2_{n-k}}$$
Of course, this can’t be true in general because we did not even mention a prior. On the other hand, this is exactly the family of conjugate priors/posteriors in Bayesian linear regression… so what possibly-improper prior makes this the posterior?
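For concreteness, here is a rough sketch of what drawing from that family looks like, reading “$\sigma^2\, N(\hat{\mu}, (X^t X)^{-1})$” as shorthand for $N(\hat{\mu}, \sigma^2 (X^t X)^{-1})$ (i.e. the scaling acts on the covariance). The concrete values for $X$, $\hat{\mu}$, and $s^2$ below are placeholders I made up for the demo, not anything from the comment.

```python
# Illustrative sketch: draw (sigma^2, mu) from the family displayed above,
# given an already-fitted mu_hat and s^2.  All inputs here are placeholders.
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))            # stand-in design matrix
mu_hat = np.array([0.5, -1.0, 2.0])    # stand-in OLS estimate
s2 = 40.0                              # stand-in sum of squared residuals

XtX_inv = np.linalg.inv(X.T @ X)

def draw_mu_sigma2():
    """One draw: sigma^2 ~ s^2 / chi^2_{n-k}, then
    mu | sigma^2 ~ N(mu_hat, sigma^2 (X^t X)^{-1})."""
    sigma2 = s2 / rng.chisquare(df=n - k)
    mu = rng.multivariate_normal(mu_hat, sigma2 * XtX_inv)
    return sigma2, mu

print(draw_mu_sigma2())
```

(Marginalizing such draws over $\sigma^2$ reproduces the third expression in the display, under the same reading of the notation.)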
I won’t spoil the whole thing for you (partly because I’ve accidentally spent too much time writing this comment!), but start with just $\sigma^2$ and $s^2$:
1. Calculate the exact posterior density of $\sigma^2$ desired in terms of $\chi^2_{n-k}$ (a sketch of this step appears just after this list).
2. Use Bayes’ theorem to figure out the prior.
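For what it’s worth, here is a sketch of what step 1 can look like (my own working under the reading above, so treat it as a hint rather than the intended solution): if $\sigma^2 = s^2/W$ with $W \sim \chi^2_{n-k}$ and $s^2$ held fixed, the change of variables $w = s^2/\sigma^2$ gives

$$p(\sigma^2 \mid \vec{y}) = f_{\chi^2_{n-k}}\!\left(\frac{s^2}{\sigma^2}\right)\frac{s^2}{(\sigma^2)^2} \;\propto\; \left(\frac{s^2}{\sigma^2}\right)^{\frac{n-k}{2}-1} e^{-\frac{s^2}{2\sigma^2}}\,\frac{s^2}{(\sigma^2)^2} \;\propto\; (\sigma^2)^{-\frac{n-k}{2}-1}\, e^{-\frac{s^2}{2\sigma^2}},$$

i.e. a scaled inverse-$\chi^2$ density with $n-k$ degrees of freedom (constants depending only on $s^2$ dropped).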
I personally messed up several times on step 2 because I was being extremely naive about the “units” cancelling in Bayes’ theorem. When I finally made it all precise using measures, things actually cancelled properly and I got the correct improper prior distribution on $\sigma^2, \vec{\mu}$.
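In case the measure-theoretic bookkeeping is the part of interest: one way to make the units explicit (a sketch assuming a $\sigma$-finite prior measure $\Pi$ on the parameters, not the derivation alluded to above) is the abstract form of Bayes’ theorem,

$$\frac{d\Pi_{\mathrm{post}}}{d\Pi}(\theta) = \frac{p(\vec{y} \mid \theta)}{\int p(\vec{y} \mid t)\,\Pi(dt)},$$

where $\theta$ stands for $(\sigma^2, \vec{\mu})$ and $p(\vec{y} \mid \theta)$ is the likelihood as a density with respect to Lebesgue measure on $\mathbb{R}^n$. Its units of $[y]^{-n}$ appear in both numerator and denominator and cancel, leaving a dimensionless Radon-Nikodym derivative; units only stop cancelling when densities taken against different measures get mixed.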
(If anyone wants me to finish fleshing out the idea, please let me know).
Thanks for the effort, though unfortunately I’m not familiar with linear regression.