Doesn’t minimizing the L1 norm correspond to performing MLE with Laplacian errors?
Yes. I’m not sure where the thing about “uniformly distributed errors” comes from in Chai & Draxler; they don’t explain it. I think it’s just an error (it looks as if they are atmospheric scientists of some sort rather than mathematicians or statisticians).
If your model of errors is, say, uniform between −1 and +1, then a good regression line is one that gets within a vertical distance of 1 unit of all your points, and any such line is equally good. If you think your errors are uniformly distributed but don’t know the spread, then (without thinking about it much; I could be all wrong) I think the best regression line is the one that minimizes the worst error among all your data points, i.e., L-infinity regression. L1/MAE is right for Laplacian errors, and L2/MSE is right for normally distributed errors.
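If it helps to see that correspondence concretely, here is a small Python sketch (mine, not from the paper or the comment above) that fits a line under each of the three losses on synthetic data. The L1 and L2 fits are the maximum-likelihood fits under Laplacian and Gaussian errors; the L-infinity fit is the minimax fit that the comment above tentatively associates with uniform errors of unknown spread.

```python
# Sketch: fit y = a*x + b by minimizing the L1, L2, and L-infinity norms of
# the residuals, to illustrate the loss / error-model correspondence.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)  # noisy line

def fit(norm):
    """Fit (a, b) by minimizing the chosen norm of the residuals."""
    def loss(params):
        a, b = params
        r = y - (a * x + b)
        if norm == "L1":
            return np.abs(r).sum()   # MLE under Laplacian errors
        if norm == "L2":
            return (r ** 2).sum()    # MLE under Gaussian errors
        return np.abs(r).max()       # minimax fit (uniform errors, unknown spread)
    return minimize(loss, x0=[0.0, 0.0], method="Nelder-Mead").x

for norm in ("L1", "L2", "Linf"):
    a, b = fit(norm)
    print(f"{norm}: slope={a:.3f}, intercept={b:.3f}")
```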
[EDITED to add:] Each of these models also corresponds to a notion of “average”: you want to pick a single true value and maximize the likelihood of your data. Normal errors ⇒ arithmetic mean. Laplacian errors ⇒ median. Uniform errors with unknown spread ⇒ (with the same caveat as in the previous paragraph) halfway between min and max. Uniform errors between −a and +a ⇒ any point that’s >= max−a and <= min+a, with all such points (if there are any; if not, you’ve outright refuted your model of the errors) equally good.
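That “notion of average” correspondence is easy to check numerically. This brute-force Python sketch (again mine, purely illustrative) scans candidate values and confirms that the minimizers of the summed squared deviation, summed absolute deviation, and maximum absolute deviation land on the mean, median, and midrange respectively, up to the grid resolution:

```python
# Sketch: verify that the L2 / L1 / L-infinity minimizers of the deviations
# from a single "true value" are the mean / median / midrange.
import numpy as np

data = np.array([1.0, 2.0, 2.5, 7.0, 9.0])
candidates = np.linspace(data.min(), data.max(), 100001)

def argmin_over(loss):
    # Evaluate the loss at each candidate value and return the best one.
    return candidates[np.argmin([loss(c) for c in candidates])]

print("L2 minimizer:", argmin_over(lambda c: np.sum((data - c) ** 2)),
      "mean:", data.mean())
print("L1 minimizer:", argmin_over(lambda c: np.sum(np.abs(data - c))),
      "median:", np.median(data))
print("Linf minimizer:", argmin_over(lambda c: np.max(np.abs(data - c))),
      "midrange:", (data.min() + data.max()) / 2)
```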
Yep.
[EDITED]: Good point; I have no idea what they meant by a “uniform” distribution. The realization for me was the connection that I can often assume errors are normally distributed, so L2 is often the obvious choice.