In case it hasn't crossed your mind, I personally think it's helpful to start in the setting of estimating the true mean $\mu$ of a data stream. A very natural choice of estimator for $\mu$ is the sample mean of the $x_i$, which I'll denote $\hat{\mu}$. Equivalently, $\hat{\mu}$ is the value of $c$ that minimizes $\sum_i (x_i - c)^2$.
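If you want to convince yourself numerically, here is a minimal sketch. The data, the seed, and the grid of candidate values are all made up for illustration; the helper `sum_squared_error` is just a name I picked.

```python
import random

def sum_squared_error(xs, c):
    """Sum of squared deviations of the data from a candidate value c."""
    return sum((x - c) ** 2 for x in xs)

random.seed(0)  # fixed seed so the run is repeatable
xs = [random.gauss(3.0, 2.0) for _ in range(200)]  # made-up data stream

sample_mean = sum(xs) / len(xs)

# Brute-force search over the candidates 0.00, 0.01, ..., 6.00.
grid = [i / 100 for i in range(601)]
best = min(grid, key=lambda c: sum_squared_error(xs, c))

print(f"sample mean:    {sample_mean:.4f}")
print(f"grid minimizer: {best:.2f}")
```

The grid minimizer should land on the grid point closest to the sample mean, since the sum of squares is a parabola in the candidate value.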
Others have mentioned the normal distribution, but this feels secondary to me. Here's why: let's say each $x_i$ has density $\frac{1}{\sigma} f\!\left(\frac{x-\mu}{\sigma}\right)$, where $f$ is a known continuous probability density with mean 0 and variance 1, and $\mu, \sigma$ are unknown. So the distribution of each $x_i$ has mean $\mu$ and variance $\sigma^2$ (and assume independence).
What must $f$ be for the sample mean $\hat{\mu}$ to be the maximum likelihood estimator of $\mu$? Gauss proved that it must be $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, and intuitively it's not hard to see why it would have to be of the form $a e^{b x^2}$.
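To make the intuition concrete, here is a sketch of the easy direction only: if $f(x) = a e^{b x^2}$ with $b < 0$, and each $x_i$ has density $\frac{1}{\sigma} f\!\left(\frac{x-\mu}{\sigma}\right)$, then the log-likelihood $\ell(\mu)$ is maximized at the sample mean. This does not reproduce Gauss's converse argument; $\ell$ and $n$ are my notation.

```latex
\begin{align*}
\ell(\mu)
  &= \sum_{i=1}^{n} \log\!\left[\frac{1}{\sigma}\,
       f\!\left(\frac{x_i-\mu}{\sigma}\right)\right]
   = n \log\frac{a}{\sigma}
     + \frac{b}{\sigma^{2}} \sum_{i=1}^{n} (x_i-\mu)^{2}, \\
\frac{d\ell}{d\mu}
  &= -\frac{2b}{\sigma^{2}} \sum_{i=1}^{n} (x_i-\mu) = 0
  \quad\Longrightarrow\quad
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i .
\end{align*}
```

Since $b < 0$, maximizing $\ell$ is the same as minimizing $\sum_i (x_i-\mu)^2$. Requiring $f$ to integrate to 1 and have variance 1 then pins down $b = -\tfrac{1}{2}$ and $a = \tfrac{1}{\sqrt{2\pi}}$.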
So from this perspective, MSE is a generalization of taking the sample mean, and asking the linear model to have Gaussian errors is necessary to formally justify MSE through MLE.
Replace sample mean with sample median and you get the mean absolute error.
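The same kind of numerical check works for the median. This is a rough sketch with made-up data, including one deliberately planted outlier; `sum_abs_error` is a name I picked, and the sample size is odd so that the median is unique.

```python
import random
import statistics

def sum_abs_error(xs, c):
    """Sum of absolute deviations of the data from a candidate value c."""
    return sum(abs(x - c) for x in xs)

random.seed(1)  # fixed seed so the run is repeatable
xs = [random.gauss(3.0, 2.0) for _ in range(201)]  # odd count: unique median
xs[0] = 500.0  # one gross outlier, planted on purpose

sample_median = statistics.median(xs)
sample_mean = statistics.mean(xs)

# Brute-force search over the candidates 0.00, 0.01, ..., 6.00.
grid = [i / 100 for i in range(601)]
best = min(grid, key=lambda c: sum_abs_error(xs, c))

print(f"sample median:  {sample_median:.4f}")
print(f"sample mean:    {sample_mean:.4f}")
print(f"grid minimizer: {best:.2f}")
```

The grid minimizer should sit next to the sample median, while the outlier drags the sample mean well away from both.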