I think what it boils down to is that in 1 dimension, the mean / expected value is a really useful quantity, and you get it by minimizing squared error, whereas minimizing absolute error gives the median, which is still useful, but much less so than the mean. (The mean is one of the moments of the distribution (the first moment), while the median isn’t. Rational agents maximize expected utility, not median utility, etc. Even the M in MAE still stands for “mean”.) Plus, although algorithmic considerations aren’t too important for small problems, in large problems the fact that least squares boils down to solving a linear system is really useful, and I’d guess that in almost any large problem the least squares solution is much faster to obtain than the least absolute error solution.
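To make the 1-dimensional claim concrete, here's a quick numpy sketch (the exponential sample and the grid search are just illustrative choices on my part): it brute-force minimizes both losses over candidate centers and checks which summary statistic each one recovers.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)  # skewed data, so mean and median differ

# Evaluate both losses on a grid of candidate centers c.
grid = np.linspace(x.min(), x.max(), 10001)
sq_loss = ((grid[:, None] - x) ** 2).sum(axis=1)   # sum of squared errors
abs_loss = np.abs(grid[:, None] - x).sum(axis=1)   # sum of absolute errors

c_sq = grid[sq_loss.argmin()]
c_abs = grid[abs_loss.argmin()]

# The squared-error minimizer lands on the mean, the absolute-error
# minimizer on the median (up to grid resolution).
print(c_sq, x.mean())
print(c_abs, np.median(x))
```

On this skewed sample the two minimizers come out noticeably different, which is exactly why the choice of loss matters even before any algorithmic considerations kick in.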