4.5. Autocovariance function#

Before we can look into the modelling of a stochastic process for instance using an Autoregressive (AR) model, we first need to introduce the autocovariance function (ACF) for a stationary time series, and describe the relationship between the ACF and the power spectral density (PSD).

As in Observation theory, the variance component is often determined based on the precision of an observable (at a given epoch), and the covariance components quantitatively indicate the statistical dependence between observables. In this case, dependence is inherently introduced by the physical process that produces the time series, and in fact our time series methods seek to (mathematically) account for this.

Autocovariance and autocorrelation#

Let us assume an arbitrary stationary (discrete) time series, \(S=[S_1,S_2,...,S_m]^T\), with mean \(\mathbb{E}(S_{i})=\mu\) and variance \(Var(S_{i})=\sigma^2\) for \(i=1,\ldots,m\). Remember that stationarity implies that the statistical properties of the time series do not depend on the time at which it is observed, i.e. expectation and variance are constant over time.

The formal (or: theoretical) autocovariance is defined as

\[ Cov(S_{t+\tau}, S_t) =\mathbb{E}(S_{t+\tau}S_t)-\mu^2 =c_{\tau} \]

We have that \(Cov(S_{t+\tau}, S_t) =Cov(S_t, S_{t-\tau})\).

The autocovariance depends on \(\tau\), which is referred to as lag; it is the difference in time between \(t+\tau\) and \(t\), and equally the difference between \(t\) and \(t-\tau\). In the sequel we use \(\tau\) both to indicate the time shift in unit of time (e.g. seconds) and the corresponding discrete time index, as at the start of this chapter it was for convenience assumed that \(\Delta t = 1\), and hence \(t_i = i\).

Exercise covariance

Show that the covariance can be written as:

\[Cov(S_{t+\tau}, S_t) = \mathbb{E}(S_{t+\tau}S_t)-\mu^2 =c_{\tau}\]
Exercise covariance

Prove that \(Cov(S_t, S_{t-\tau}) =Cov(S_{t+\tau}, S_t)\):

The formal autocorrelation is defined as

\[ r_{\tau} = \mathbb{E}(S_{t+\tau}S_t) \]

and \(r_{\tau} = r_{-\tau}\).

Note

When we have a zero-mean time series, \(\mu=0\), it follows that \(c_{\tau}=r_{\tau}\)

Empirical autocovariance#

The autocovariance function of a time series is not known beforehand, and hence needs to be estimated based on the actual observed values. The least-squares method or maximum likelihood method can be used to estimate this empirical (or sample) autocovariance function of a time series. Let us see how!

Least-squares estimation

For a given stationary time series \(S = [S_1,S_2,...,S_m]^T\), the least-squares estimator of the autocovariance function is given by

\[\ \hat{C}_{\tau} = \frac{1}{m-\tau}\sum_{i=1}^{m-\tau}(S_{i+\tau}-\mu)(S_{i}-\mu), \hspace{25px} \tau=0,1,...,m-1\]

and is an unbiased estimator \(E(\hat{C}_{\tau}) = c_\tau\).

The least-squares estimator of autocorrelation (also called empirical autocorrelation function) is then

\[ \hat{R}_{\tau}=\frac{1}{m-\tau}\sum_{i=1}^{m-\tau}S_{i+\tau} S_{i}, \hspace{25px} \tau=0,1,...,m-1 \]

Maximum likelihood estimation

The maximum likelihood estimator of autocovariance is given by

\[ \hat{C}_{\tau} = \frac{1}{m}\sum_{i=1}^{m-\tau}(S_{i+\tau}-\mu)(S_i-\mu), \hspace{25px} \tau=0,1,...,m-1 \]

Note that this is a biased estimator, \(\mathbb{E}(\hat{C}_{\tau})\neq c_{\tau}\).

Similarly, the maximum likelihood estimate of the autocorrelation follows as:

\[ \hat{R}_{\tau}=\frac{1}{m}\sum_{i=1}^{m-\tau}S_{i+\tau} S_{i}, \hspace{25px} \tau=0,1,...,m-1 \]

Note practically, for large \(m\), and not too large \(\tau\), there is little difference in using \(\frac{1}{m-\tau}\) and \(\frac{1}{m}\).

Note

Here we use capitals for \(\hat{C}_{\tau}\) and \(\hat{R}_{\tau}\) since estimators are always a function of the random observables \(S_t\).

Note

If mean \(\mu\) is not known, it can be replaced in practice by the empirical mean in the above expressions for the empirical autocovariance function.

Covariance matrix based on autocovariance#

The structure of a covariance matrix for a stationary time series is purely symmetric (mirrored in the diagonal) and it looks like

\[\begin{split} \Sigma_{S} = \begin{bmatrix} \sigma^2 & c_1 & c_2 & \dots & c_{m-1}\\ c_1 & \sigma^2 & c_1 & \ddots & \vdots \\ c_2 & c_1 & \sigma^2 & \ddots & c_2 \\ \vdots & \ddots & \ddots & \ddots & c_1\\ c_{m-1} & & c_2 & c_1 & \sigma^2\end{bmatrix}\end{split}\]

There are \(m\) (co)variance components - one variance component, \(\sigma^2 = c_0\), and \(m-1\) covariance components, \(c_i\) for \(i=1,\ldots,m-1\).

Normalized ACF#

The normalized autocovariance estimator can directly be obtained from the autocovariance estimator as

\[ \hat{\rho}_{\tau} = \frac{\hat{C}_{\tau}}{\hat{C_0}}, \hspace{20px}\tau = 0,...,m-1 \implies \hat{\rho}_0 = 1 \]

and \(|\hat{\rho}_{\tau}| \leq 1\) for all \(\tau\) (using the biased ACF \(\hat{C}_{\tau}\), which is the default implemented in Python)

Note

The estimated normalized autocovariance is the same as the time dependent Pearson correlation coefficient.

In literature \(\hat{\rho}_{\tau}\) is sometimes referred to as the autocorrelation (coefficient) function.

The variance of the normalized ACF can, for large \(m\), be approximated as

\[ \sigma_{\hat{\rho}_{\tau}}^2 = \frac{1}{m} (1 + 2 \sum_{i=1}^{\infty} \rho^2_{i}) \]

for processes with decaying autocorrelation function (Box et al., 2015) (Eq. (2.1.15)). The variance can be used, only for larger values of \(\tau\), as a check whether \(\rho_{\tau}\) is effectively zero beyond a certain lag. In practice one can substitute \(\hat{\rho}\) for \(\rho\).

If \(m\) is sufficiently large, \(\hat{\rho}_{\tau}\) is approximately normally distributed:

\[ \hat{\rho}_{\tau} \sim N(\rho_{\tau},\sigma^2_{\hat{\rho}_{\tau}}) \]

Though it is possible to compute \(\hat{\rho}_{\tau}\) for up to \(\tau=m-1\), it is recommended in practice to limit \(\tau\) for instance to at most 10 percent of \(m\).

Worked example#

Let us consider a time series of \(m=100\) observations, such that

\[ \hat{\rho}_1 = 0.4 \]

and \(\hat{\rho}_i \approx 0\) for \(i>1\).

We know that

\[ \hat{\rho}_1 \sim N(\rho_1,\sigma^2_{\hat{\rho}_1}) \]

with \(\rho_1\) unknown and

\[ \sigma^2_{\hat{\rho}_1}=\frac{1}{m}+\frac{2\hat{\rho}^2_1}{m}=0.0132\implies\sigma_{\hat{\rho}_1}=0.1149 \]

We will now apply a test whether the estimated autocorrelation is significant. The null hypothesis assumes there is no correlation:

  • \(\mathcal{H}_0\): \(\rho_1=0\)

  • \(\mathcal{H}_a\): \(\rho_1 \neq 0\)

Since we know the distribution of \(\hat{\rho}_1\), a suitable test statistic would be:

\[ T = \frac{\hat{\rho}_1}{\sigma_{\hat{\rho}_1}} \sim N(0,1) \]

where we would reject \(\mathcal{H}_0\) if \(|T|>k_{\alpha}\). With a false alarm rate of \(\alpha = 0.01\), we find that the critical value can be obtained from the table of the standard normal distribution. Note that we have a 2-sided critical region, hence we need to look up the value for \(0.5\alpha\).

In this example, we obtain:

\[ (T = 3.48 ) > (k_{\alpha}=2.58) \]

and hence the null hypothesis is rejected, implying that the autocorrelation at lag \(\tau = 1\) is significant.

Exercise

A zero-mean stationary noise process consists of \(m=5\) observations:

\[ s = \begin{bmatrix} 2 & 1 & 0 & -1 & -2 \end{bmatrix}^T \]

What is the least-squares estimate of the normalized ACF at \(\tau=1\); so compute \(\hat{\rho}_{1}\)?

Attribution

This chapter was written by Alireza Amiri-Simkooei, Christiaan Tiberius and Sandra Verhagen. Find out more here.