# Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution based on observed data \(\mathbf{x} = x_1, x_2, ..., x_n\). Evaluating the joint density of the data for a given parameter value \(\theta\) of a parametric family gives the likelihood function, which measures how well the probability distribution with those parameters describes the observed data:

\(L(\theta \mid \mathbf{x}) = f(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)\)

where the product form assumes the observations are independent and identically distributed.
Since the likelihood is computed as a product of the PDF evaluated at each data point, a PDF \(f\) that maximizes the likelihood is one that

- has high densities in regions where many samples \(x_i\) cluster, but at the same time
- has low but non-zero densities where few samples fall. Since we take a product, a single zero-density sample (\(f(x_i \mid \theta) = 0\)) zeroes out the entire likelihood \(L(\theta \mid \mathbf{x})\).
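To make this concrete, here is a minimal Python sketch (not from the original text); the observations and the normal and uniform models are hypothetical choices for illustration:

```python
import numpy as np
from scipy.stats import norm, uniform

# Hypothetical observations, chosen only for illustration
x = np.array([4.2, 5.1, 4.8, 5.6])

# Likelihood of a normal model with mu = 5, sigma = 1:
# the product of the PDF evaluated at each observation
L = np.prod(norm.pdf(x, loc=5.0, scale=1.0))
print(L)  # a small positive number

# A single zero-density sample zeroes the whole product:
# a uniform(0, 5) model assigns density 0 to the sample 5.6
L_zero = np.prod(uniform.pdf(x, loc=0.0, scale=5.0))
print(L_zero)  # 0.0
```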
The goal of MLE is consequently to find the values of the parameters of our chosen PDF \(f\) that maximize this likelihood function. The maximum likelihood estimate \(\hat{\theta}\) is the set of parameters for which the observed data is most probable under the assumed probability distribution:

\(\hat{\theta} = \arg\max_{\theta} L(\theta \mid \mathbf{x})\)
In practice, taking a product of many small values can quickly lead to numerical underflow. It is therefore more common to work with the log-likelihood, the natural logarithm of the likelihood function:

\(\ln L(\theta \mid \mathbf{x}) = \ln \prod_{i=1}^{n} f(x_i \mid \theta) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)\)

where the product becomes a sum (a short numerical illustration of this underflow appears after the steps below). To find the maximum likelihood estimate \(\hat{\theta}\), the following steps should be taken:
1. Define the likelihood function \(L(\theta \mid \mathbf{x})\).
2. Take the natural logarithm of the likelihood function to get the log-likelihood function \(\ln (L(\theta \mid \mathbf{x}))\).
3. Differentiate the log-likelihood function and set the derivative to zero: \(\cfrac{\partial \ln L(\theta \mid \mathbf{x})}{\partial \theta} = 0\).
4. Solve the equation for \(\hat{\theta}\).
Note that the maximum can also occur at the boundary of the domain.
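As a quick illustration of the underflow issue mentioned above (a sketch with simulated stand-in data, not part of the original text), note how the raw likelihood of a large sample underflows to zero while the log-likelihood stays finite:

```python
import numpy as np
from scipy.stats import norm

# Simulated stand-in data; the model and sample size are arbitrary choices
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=2000)

likelihood = np.prod(norm.pdf(x))        # product of 2000 PDF values
log_likelihood = np.sum(norm.logpdf(x))  # sum of the log-PDF values

print(likelihood)      # 0.0 -- underflows in double precision
print(log_likelihood)  # a finite (large negative) number
```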
## Let’s see it with an example!
The following dataset describes the time elapsed between consecutive arrivals of passengers at a bus stop (in minutes):
Assume that the observations can be described by the exponential distribution, for which the PDF is given by

\(f(x \mid \lambda) = \lambda e^{-\lambda x}, \quad x \geq 0\)
with \(\lambda\) being the parameter that has to be estimated. The dataset consists of \(n\) observations. Find the maximum likelihood estimate \(\hat{\lambda}\) following the four steps defined above:
1. Define the likelihood function \(L(\lambda \mid \mathbf{x})\).
2. Take the natural logarithm of the likelihood function to get the log-likelihood function \(\ln (L(\lambda \mid \mathbf{x}))\).
3. Differentiate the log-likelihood function and set the derivative to zero: \(\cfrac{\partial \ln L(\lambda \mid \mathbf{x})}{\partial \lambda} = 0\).
4. Solve the equation for \(\hat{\lambda}\).
1. Define the likelihood function \(L(\lambda \mid \mathbf{x})\).
According to the definition, the likelihood function is given by:

\(L(\lambda \mid \mathbf{x}) = \prod_{i=1}^{n} f(x_i \mid \lambda)\)
For the exponential distribution, \(f(x_i \mid \lambda) = \lambda e^{-\lambda x_i}\).
Therefore, the likelihood function for the exponential distribution is given by:

\(L(\lambda \mid \mathbf{x}) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}\)
2. Take the natural logarithm of the likelihood function to get the log-likelihood function \(\ln (L(\lambda \mid \mathbf{x}))\).

Taking the natural logarithm of the likelihood function gives the log-likelihood function:

\(\ln L(\lambda \mid \mathbf{x}) = \ln \left( \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i} \right) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i\)
3. Differentiate the log-likelihood function and set it to zero: \(\cfrac{\partial \ln L(\lambda \mid \mathbf{x})}{\partial \lambda} = 0\).

Taking the derivative of the log-likelihood function with respect to \(\lambda\) and setting it to zero gives:

\(\cfrac{\partial \ln L(\lambda \mid \mathbf{x})}{\partial \lambda} = \cfrac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0\)
4. Solve the equation for \(\hat{\lambda}\).
We have the following equation to solve:
\(\cfrac{n}{\hat \lambda} - \sum_{i=1}^{n} x_i = 0\)
which results in the maximum likelihood estimate:

\(\hat{\lambda} = \cfrac{n}{\sum_{i=1}^{n} x_i} = \cfrac{1}{\bar x}\)

where \(\bar x = \cfrac{\sum_{i=1}^{n} x_i}{n}\) is the sample mean of the observed data.
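As an optional check (not part of the original derivation), the same result can be obtained symbolically; using sympy here is an illustrative choice:

```python
import sympy as sp

# lam is the rate parameter, n the sample size, S stands for sum(x_i)
lam, n, S = sp.symbols("lambda n S", positive=True)

log_L = n * sp.log(lam) - lam * S  # exponential log-likelihood from step 2
stationary_points = sp.solve(sp.diff(log_L, lam), lam)
print(stationary_points)  # [n/S], i.e. lambda_hat = n / sum(x_i) = 1 / x_bar
```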
This means that the maximum likelihood estimate of the exponential distribution's parameter is the inverse of the sample mean. Therefore:

\(\hat{\lambda} = \cfrac{1}{\bar x} = 0.513\)

The best-fitting parameter of the exponential distribution for the given data according to MLE is \(\hat{\lambda} = 0.513\) (per minute).
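Since the dataset itself is not reproduced here, the sketch below uses simulated interarrival times as a stand-in to show how the closed-form result \(\hat{\lambda} = 1/\bar x\) could be checked numerically; the seed, the sample size, and the use of the reported rate 0.513 to generate the data are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated stand-in for the interarrival times (minutes); the rate 0.513
# is borrowed from the estimate reported above purely to generate data
rng = np.random.default_rng(42)
x = rng.exponential(scale=1 / 0.513, size=200)

def neg_log_likelihood(lam, x):
    # -ln L(lambda | x) = -(n ln(lambda) - lambda * sum(x_i))
    return -(len(x) * np.log(lam) - lam * np.sum(x))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0),
                      args=(x,), method="bounded")
print(f"numerical MLE:      {res.x:.3f}")
print(f"closed form 1/xbar: {1 / np.mean(x):.3f}")  # the two should match
```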
Attribution
This chapter was written by Patricia Mares Nasarre, Robert Lanzafame, and Max Ramgraber. Find out more here.