Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution based on observed data \(\mathbf{x} = (x_1, x_2, \ldots, x_n)\). Evaluating the joint density of the data for a given value of the parameters \(\theta\) of a parametric family gives the likelihood function, which measures how well the assumed distribution with those parameters fits the observed data. For independent and identically distributed observations with density \(f(x \mid \theta)\):
\[L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} f(x_i \mid \theta)\]
The goal of MLE is to find the parameter values that maximize the likelihood function. The maximum likelihood estimate \(\hat{\theta}\) is the set of parameters for which the observed data are most probable under the assumed probability distribution:
\[\hat{\theta} = \underset{\theta}{\arg\max}\; L(\theta \mid \mathbf{x})\]
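In practice, it is more common to work with the log-likelihood, which is the natural logarithm of the likelihood function. Because the logarithm is strictly increasing, both functions are maximized by the same \(\hat{\theta}\), and the product of densities becomes a sum:
\[\ln L(\theta \mid \mathbf{x}) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)\]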
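A practical reason for preferring the log-likelihood can be sketched numerically. The snippet below is only an illustration under stated assumptions: it borrows the exponential density used in the example further down, evaluates it at an arbitrarily chosen rate of 0.5, and uses 2000 made-up identical observations so that the arithmetic stays transparent. The function name `exp_pdf` is hypothetical.

```python
import math

def exp_pdf(x, lam):
    """Density of the exponential distribution with rate lam, for x >= 0."""
    return lam * math.exp(-lam * x)

# Made-up data: 2000 identical observations (illustrative only).
data = [2.0] * 2000
lam = 0.5  # assumed parameter value at which the likelihood is evaluated

# Likelihood: product of the individual densities.
likelihood = 1.0
for x in data:
    likelihood *= exp_pdf(x, lam)

# Log-likelihood: sum of the individual log-densities.
log_likelihood = sum(math.log(exp_pdf(x, lam)) for x in data)

print(likelihood)      # the product underflows to 0.0 in double precision
print(log_likelihood)  # the sum stays finite, about -3386.29
```

With many observations the product of small densities falls below the smallest representable floating-point number, whereas the sum of logarithms remains well behaved.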
To find the maximum likelihood estimate \(\hat{\theta}\), the following steps should be taken:
1. Define the likelihood function \(L(\theta \mid \mathbf{x})\).
2. Take the natural logarithm of the likelihood function to get the log-likelihood function \(\ln L(\theta \mid \mathbf{x})\).
3. Differentiate the log-likelihood function and set the derivative to zero: \(\cfrac{\partial \ln L(\theta \mid \mathbf{x})}{\partial \theta} = 0\).
4. Solve the equation for \(\hat{\theta}\).
Note that the maximum can also occur at the boundary of the parameter domain, where the derivative need not be zero.
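A standard case of a boundary maximum is the uniform distribution on \([0, \theta]\), which is not part of this chapter's example and is used here only as an illustration. Its log-likelihood is \(-n \ln \theta\) for \(\theta \geq \max_i x_i\) and \(-\infty\) otherwise, so the derivative \(-n/\theta\) is never zero and the maximum sits at \(\hat{\theta} = \max_i x_i\). The sketch below checks this on a grid; the data values and the name `uniform_log_likelihood` are assumptions, and the grid is built so that it contains the largest observation.

```python
import math

def uniform_log_likelihood(theta, data):
    """Log-likelihood of Uniform(0, theta) for the observations in data."""
    if theta < max(data):
        return -math.inf  # an observation above theta has zero density
    return -len(data) * math.log(theta)

data = [0.8, 2.5, 1.1, 3.2, 0.4]  # made-up observations

# Candidate values of theta on a grid around the largest observation.
candidates = [max(data) + 0.1 * k for k in range(-10, 21)]
theta_hat = max(candidates, key=lambda t: uniform_log_likelihood(t, data))

print(theta_hat)  # 3.2, the largest observation
```

Setting the derivative to zero would find nothing here, which is why the boundary of the parameter domain has to be checked separately.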
Let’s see it with an example!
The following dataset describes the time elapsed between consecutive arrivals of passengers at a bus stop (in minutes):
Assume that the observations can be described by the exponential distribution, for which the PDF is given by
\[f(x \mid \lambda) = \lambda e^{-\lambda x}, \quad x \geq 0\]
with \(\lambda\) being the parameter that has to be estimated. The dataset consists of \(n\) observations. Find the maximum likelihood estimate \(\hat{\lambda}\) following the four steps defined above:
1. Define the likelihood function \(L(\lambda \mid \mathbf{x})\).
2. Take the natural logarithm of the likelihood function to get the log-likelihood function \(\ln L(\lambda \mid \mathbf{x})\).
3. Differentiate the log-likelihood function and set the derivative to zero: \(\cfrac{\partial \ln L(\lambda \mid \mathbf{x})}{\partial \lambda} = 0\).
4. Solve the equation for \(\hat{\lambda}\).
1. Define the likelihood function \(L(\lambda \mid \mathbf{x})\).
According to the definition, the likelihood function is given by:
\[L(\lambda \mid \mathbf{x}) = \prod_{i=1}^{n} f(x_i \mid \lambda)\]
For the exponential distribution, \(f(x_i \mid \lambda) = \lambda e^{-\lambda x_i}\).
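Therefore, the likelihood function for the exponential distribution is given by:
\[L(\lambda \mid \mathbf{x}) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}\]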
2. Take the natural logarithm of the likelihood function to get the log-likelihood function \(\ln L(\lambda \mid \mathbf{x})\).
Taking the natural logarithm of the likelihood function gives the log-likelihood function:
\[\ln L(\lambda \mid \mathbf{x}) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i\]
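The simplification in this step can be checked numerically. The sketch below compares the log-likelihood computed as a sum of log-densities with the closed form \(n \ln \lambda - \lambda \sum x_i\). The inter-arrival times are made up for illustration and are not the chapter's dataset; both function names are hypothetical.

```python
import math

def log_likelihood_sum(lam, data):
    """Log-likelihood as the sum of log-densities ln(lam * exp(-lam * x_i))."""
    return sum(math.log(lam * math.exp(-lam * x)) for x in data)

def log_likelihood_closed(lam, data):
    """Simplified form: n * ln(lam) - lam * sum(x_i)."""
    return len(data) * math.log(lam) - lam * sum(data)

data = [1.2, 0.4, 3.1, 2.0, 0.9]  # made-up inter-arrival times in minutes

for lam in (0.2, 0.5, 1.0, 2.0):
    a = log_likelihood_sum(lam, data)
    b = log_likelihood_closed(lam, data)
    print(f"lambda={lam}: sum of logs = {a:.6f}, closed form = {b:.6f}")
```

The two columns should agree up to floating-point rounding for every value of \(\lambda\) tried.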
3. Differentiate the log-likelihood function and set the derivative to zero: \(\cfrac{\partial \ln L(\lambda \mid \mathbf{x})}{\partial \lambda} = 0\).
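Taking the derivative of the log-likelihood function with respect to \(\lambda\) and setting it to zero gives:
\[\cfrac{\partial \ln L(\lambda \mid \mathbf{x})}{\partial \lambda} = \cfrac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0\]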
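As a sanity check on the differentiation, the analytic derivative \(n/\lambda - \sum x_i\) can be compared with a central finite difference of the log-likelihood. This is a minimal sketch: the data are made up, and the helper names `score` and `numerical_derivative` as well as the step size `h` are assumptions chosen for illustration.

```python
import math

def log_likelihood(lam, data):
    """Exponential log-likelihood: n * ln(lam) - lam * sum(x_i)."""
    return len(data) * math.log(lam) - lam * sum(data)

def score(lam, data):
    """Analytic derivative of the log-likelihood with respect to lam."""
    return len(data) / lam - sum(data)

def numerical_derivative(f, lam, h=1e-6):
    """Central finite-difference approximation of f'(lam)."""
    return (f(lam + h) - f(lam - h)) / (2 * h)

data = [1.2, 0.4, 3.1, 2.0, 0.9]  # made-up inter-arrival times in minutes

for lam in (0.3, 0.6, 1.5):
    analytic = score(lam, data)
    numeric = numerical_derivative(lambda l: log_likelihood(l, data), lam)
    print(f"lambda={lam}: analytic = {analytic:.6f}, numeric = {numeric:.6f}")
```

The derivative is positive for small \(\lambda\) and negative for large \(\lambda\), so it crosses zero exactly once. Since the second derivative \(-n/\lambda^2\) is negative everywhere, that stationary point is a maximum.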
4. Solve the equation for \(\hat{\lambda}\).
We have the following equation to solve:
\(\cfrac{n}{\hat \lambda} - \sum_{i=1}^{n} x_i = 0\)
which results in the maximum likelihood estimate:
\[\hat{\lambda} = \cfrac{n}{\sum_{i=1}^{n} x_i} = \cfrac{1}{\bar{x}}\]
where \(\bar x = \cfrac{\sum_{i=1}^{n} x_i}{n}\) is the sample mean of the observed data.
This means that the maximum likelihood estimate of the rate parameter of the exponential distribution is the reciprocal of the sample mean. For the given data, the best-fitting parameter of the exponential distribution according to MLE is therefore \(\hat{\lambda} = 0.513\) per minute.
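The closed-form result can be cross-checked by maximizing the log-likelihood numerically. The sketch below uses made-up inter-arrival times (not the dataset above, so it does not reproduce the value 0.513) and a plain golden-section search, which is one possible choice of optimizer and relies on the log-likelihood having a single peak. The search interval `[1e-6, 10]` and the function names are assumptions.

```python
import math

def log_likelihood(lam, data):
    """Exponential log-likelihood: n * ln(lam) - lam * sum(x_i)."""
    return len(data) * math.log(lam) - lam * sum(data)

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a single-peaked function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2

data = [1.2, 0.4, 3.1, 2.0, 0.9]  # made-up inter-arrival times in minutes

lam_closed = len(data) / sum(data)  # reciprocal of the sample mean
lam_numeric = golden_section_max(lambda l: log_likelihood(l, data), 1e-6, 10.0)

print(f"closed form: {lam_closed:.6f}")
print(f"numerical:   {lam_numeric:.6f}")
```

Both values should agree to several decimal places. For distributions without a closed-form estimate, a numerical search of this kind is the usual fallback.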
A video with further examples
The following video explains MLE and works through three examples with different continuous distributions.
Attribution
This chapter was written by Patricia Mares Nasarre and Robert Lanzafame. Find out more here.