[statistics] An introduction to maximum likelihood estimation: what?, why?, when?, how?
First, we require a short introduction to Bayesian statistics.
What?
Because describing a multidimensional distribution in full is often difficult, it is useful to have crude tools for summarizing it. Maximum likelihood estimation is one such tool: it summarizes the distribution by the single, locally most likely point in the space of hypotheses under consideration.
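As a concrete one-parameter example (a standard textbook case, spelled out here for illustration): if a coin lands heads $k$ times in $n$ independent flips, the likelihood of a heads-probability $p$ is

$$L(p) = p^{k}(1-p)^{n-k},$$

and setting the derivative with respect to $p$ to zero gives the maximum likelihood estimate $\hat{p} = k/n$.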
In practice, we almost always maximize the logarithm of the probability density function instead. Since the density is strictly positive wherever the data are possible at all, the logarithm is defined there, and since the logarithm is a monotonic transformation, it does not move the location of the maximum.
Taking the logarithm also makes the function of interest additive in an important way: the statistically independent parts of the distribution, which enter the likelihood as a product, become a sum:
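For independent, identically distributed observations $x_1, \ldots, x_n$ with density $p(x \mid \theta)$, this reads

$$\log L(\theta) \;=\; \log \prod_{i=1}^{n} p(x_i \mid \theta) \;=\; \sum_{i=1}^{n} \log p(x_i \mid \theta),$$

so each observation contributes its own term to the objective.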
Least-squares is a simple, powerful and common special case of maximum likelihood estimation: when the errors are independent and Gaussian with a common, fixed variance, the log-likelihood is a constant minus the sum of squared residuals divided by $2\sigma^2$, so maximizing the one is exactly minimizing the other.
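To make that equivalence concrete, here is a minimal sketch in Python (the toy data, the starting point, and the name `neg_log_likelihood` are all illustrative choices): it fits a straight line by numerically maximizing the Gaussian log-likelihood and compares the result with ordinary least squares.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: a noisy line. The 'true' slope, intercept and noise level
# are arbitrary choices for the demonstration.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

def neg_log_likelihood(params):
    """Negative Gaussian log-likelihood of the residuals.

    params = (slope, intercept, log_sigma); optimizing log(sigma)
    keeps sigma positive without explicit constraints.
    """
    slope, intercept, log_sigma = params
    sigma = np.exp(log_sigma)
    residuals = y - (slope * x + intercept)
    # The product of independent Gaussian densities has become a sum
    # of per-point log-densities, as in the equation above.
    log_density = (-0.5 * np.log(2.0 * np.pi * sigma**2)
                   - residuals**2 / (2.0 * sigma**2))
    return -np.sum(log_density)

mle = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0]).x
lsq = np.polyfit(x, y, deg=1)  # ordinary least squares, for comparison

print("MLE slope, intercept:", mle[0], mle[1])
print("LSQ slope, intercept:", lsq[0], lsq[1])  # should agree closely
```

On data like this the two estimates agree to within the optimizer's tolerance, which is the point: under the Gaussian noise assumption, maximum likelihood and least squares are the same procedure.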