Forecasting example#
In this example we aim to predict a future value of a time series characterized by an intercept \(y_0\) and a rate \(r\). The observation equation is given by

$$
Y_t = y_0 + r\,t + \epsilon_t,
$$

where \(\epsilon_t\) is the (time-correlated) noise.
We will follow the steps from Chapter 4.8.
Step 0: Load data and plot#
In the following code we plot the time series. From the plot, we can clearly identify a linear trend. The observations are taken annually starting from 1900 to 2000. Our aim is to estimate the value of the time series in the year 2001. Time \(t\) is expressed in years.
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
y0 = 100 # intercept
r = 2 # slope
# Parameters for the AR(1) process
sigma = 11 # Standard deviation of the noise
n = 101 # Length of the time series
# Initialize the process
np.random.seed(42) # For reproducibility
noise_AR = np.zeros(n)
noise_AR[0] = np.random.normal(0, sigma) # Initial value
t = np.arange(1900, 2001, 1)
# Generate the AR(1) process and noise series
noise = np.random.normal(0, sigma, n) # Pre-generate the white-noise innovations
phi = 0.8
for lag in range(1, n):
noise_AR[lag] = phi * noise_AR[lag-1] + noise[lag] # Use pre-generated noise
# Add intercept and linear trend
data = y0 + r*t + noise_AR
# Plot the time series
plt.figure(figsize=(12, 3))
plt.plot(t, data)
plt.title("Time series")
plt.xlabel("Time (year)")
plt.ylabel("Y(t)")
plt.grid(True)
Step 1: Estimate the signal of interest#
To estimate the linear trend and offset we can use the unweighted Least Squares estimator since we don’t know the covariance matrix of the data. This is equivalent to fitting a linear model to the time series.
The observation equation is written as:

$$
Y_t = y_0 + r\,t + \epsilon_t, \qquad t = 1900, \dots, 2000.
$$

Therefore, the linear functional model for the \(101\) observables is expressed as:

$$
\underset{\mathrm{Y}}{\underbrace{\begin{bmatrix} Y_{1900} \\ \vdots \\ Y_{2000} \end{bmatrix}}} =
\underset{\mathrm{A}}{\underbrace{\begin{bmatrix} 1 & 1900 \\ \vdots & \vdots \\ 1 & 2000 \end{bmatrix}}}
\underset{\mathrm{X}}{\underbrace{\begin{bmatrix} y_0 \\ r \end{bmatrix}}} + \epsilon,
$$

and the least-squares solution of the unknown parameter vector \(\mathrm{X}\) is obtained as:

$$
\hat{\mathrm{X}} = (\mathrm{A}^T \mathrm{A})^{-1} \mathrm{A}^T \mathrm{Y}.
$$

The time series without the linear trend and offset is simply the residuals:

$$
\hat{\epsilon} = \mathrm{Y} - \mathrm{A}\hat{\mathrm{X}}.
$$
In the following code we estimate the linear trend and offset and remove them from the time series.
A = np.column_stack((np.ones_like(t), t))
xhat = np.linalg.solve(A.T @ A, A.T @ data)
yhat = A @ xhat
ehat = data - yhat
fig, axs = plt.subplots(2, 1, figsize=(12, 5))
# Plot data and fit
axs[0].plot(t, data, label='Data')
axs[0].plot(t, yhat, label='Fit', linestyle='--')
axs[0].set_xlabel('Time (year)')
axs[0].set_ylabel('Y(t)')
axs[0].set_title('Data and Fit')
axs[0].legend()
axs[0].grid()
# Plot residuals
axs[1].plot(t, ehat, label='Residuals', color='red')
axs[1].set_xlabel('Time (year)')
axs[1].set_ylabel(r'$\mathrm{\hat{\epsilon}}$')
axs[1].set_title('Residuals')
axs[1].legend()
axs[1].grid()
plt.tight_layout()
plt.show()
In the above plots the blue line represents the observed time series \(\mathrm{Y}\), the dashed orange line is the fitted line using the previously defined functional model \(\hat{\mathrm{Y}} = \mathrm{A}\hat{\mathrm{X}}\), and the red line represents the residuals \(\mathrm{\hat{\epsilon}}\). After detrending the time series, we can conclude that the residuals are stationary.
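As an optional quantitative check (not part of the original workflow, and assuming statsmodels is installed), an augmented Dickey-Fuller test can back up this visual impression: a small p-value supports rejecting a unit root, i.e. it supports stationarity of the detrended residuals.
# Optional stationarity check on the detrended residuals (assumes statsmodels)
from statsmodels.tsa.stattools import adfuller

adf_stat, p_value, *_ = adfuller(ehat)
print('ADF statistic:', round(adf_stat, 3))
print('p-value:', round(p_value, 3))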
To be able to predict future values, we need to estimate the noise process. For this reason, in the next step we analyse the ACF of the residuals to see whether the noise process is uncorrelated.
The next code plots the residuals and their ACF:
# Create the 2x1 subplot
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6), sharex=False)
# Plot residuals
ax1.plot(ehat, color='red', label='Residuals')
ax1.set_ylabel(r'$\mathrm{\hat{\epsilon}}$')
ax1.set_title("Residuals")
ax1.set_xlabel("Time")
ax1.legend()
ax1.grid(True)
# Plot ACF
lags = 20
plot_acf(ehat, ax=ax2, lags=lags)
ax2.set_title("ACF of the residuals")
ax2.set_xlim(-0.5, lags)
ax2.set_xlabel("Lag")
ax2.set_ylabel("ACF")
ax2.grid(True)
# Display the plot
plt.tight_layout()
plt.show()
From the ACF of the residuals, we observe that they are time-correlated, with notably high values at the second lag and beyond. In the next step we model the residuals using the AR(1) model, and estimate the coefficient \(\phi\).
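Before doing so, a quick sanity check (an addition to the original workflow): for an AR(1) process the coefficient \(\phi\) is approximately equal to the sample autocorrelation of the residuals at lag 1, so this single number already hints at the value we should expect from the least-squares fit in Step 2.
# Quick check: sample lag-1 autocorrelation of the residuals.
# For an AR(1) process this should be close to the coefficient phi
# estimated by least squares in the next step.
rho1 = np.corrcoef(ehat[:-1], ehat[1:])[0, 1]
print('Sample lag-1 autocorrelation:', round(rho1, 3))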
Step 2: Model the noise using the AR(1) model#
To model the noise, we use the stationary time series \(S_t = \hat{\epsilon}_t\), where \(\hat{\epsilon}=Y-A\hat{X}\) are the residuals obtained after detrending the time series. The AR(1) model reads

$$
S_t = \phi\, S_{t-1} + \epsilon_t,
$$

and \(\epsilon_t\) is the zero-mean white-noise term in the present step of the AR(1) recursion.
The model for estimating the AR(1) coefficient \(\phi\) reads

$$
\underset{S}{\underbrace{\begin{bmatrix} S_2 \\ \vdots \\ S_{101} \end{bmatrix}}} =
\underset{A}{\underbrace{\begin{bmatrix} S_1 \\ \vdots \\ S_{100} \end{bmatrix}}} \phi,
$$
and the least-squares solution of the unknown parameter \(\phi\) is obtained as:

$$
\hat{\phi} = (A^T A)^{-1} A^T S = \frac{\sum_{t=2}^{101} S_{t-1} S_t}{\sum_{t=2}^{101} S_{t-1}^2}.
$$
y = ehat[1:]
s1 = ehat[0:-1]
print(len(y))
print(len(s1))
# AR(1) model: regress the residuals on their one-lag-shifted values
X1 = s1.reshape(-1, 1)
phi_ar1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
e1 = y - X1 @ phi_ar1
# Print AR(1) residuals
fig, ax = plt.subplots(2,1,figsize=(12, 6))
ax[0].plot(e1)
ax[0].set_title('AR(1) Residuals')
ax[0].set_ylabel(r'$\mathrm{\hat{\epsilon}}$')
ax[0].grid(True)
# Print ACF
plot_acf(e1, lags=40, ax=ax[1])
ax[1].set_xlabel("Lag")
ax[1].set_ylabel("ACF")
ax[1].grid(True)
fig.tight_layout()
# print the AR(1) coefficient
print('AR(1) Coefficient:')
print('Phi_1 = ', round(phi_ar1[0], 4))
100
100
AR(1) Coefficient:
Phi_1 = 0.787
After fitting the AR(1) model, the ACF of the remaining residuals shows little to no correlation between lagged values.
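As an optional formal check (assuming statsmodels is installed), a Ljung-Box test on the AR(1) residuals quantifies this impression: large p-values indicate that no significant autocorrelation remains.
# Optional whiteness check on the AR(1) residuals (assumes statsmodels)
from statsmodels.stats.diagnostic import acorr_ljungbox

lb_test = acorr_ljungbox(e1, lags=[5, 10], return_df=True)
print(lb_test)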
Step 3: Predict the signal-of-interest for year 2001#
Based on the previously estimated parameters \(\mathrm{\hat{X}}\), we can predict the signal-of-interest for year 2001 \(\mathrm{\hat{Y}_{2001}}\) following the functional relation:

$$
\hat{Y}_{signal,2001} = \mathrm{A_p}\,\hat{\mathrm{X}} = \begin{bmatrix} 1 & 2001 \end{bmatrix} \hat{\mathrm{X}}.
$$
Ap = np.column_stack(([1], [2001]))
Y_signal_2001 = Ap @ xhat
print('Y_signal = ', Y_signal_2001)
Y_signal = [4098.68613485]
Step 4: Predict the noise for year 2001#
To predict future values of a time series, we must account not only for the signal-of-interest, but also for the noise component. We can predict the noise component at year 2001 using:

$$
\hat{\epsilon}_{2001} = \Sigma_{Y_pY}\, \Sigma_{YY}^{-1}\, \hat{\epsilon},
$$

where \(\Sigma_{Y_pY}\) is the covariance matrix between the future values \(Y_p\) and the observed values \(Y\), and \(\Sigma_{YY}\) is the covariance matrix of the observed values. Vector \(\hat{\epsilon}\) contains the residuals of the observed data points of the time series, and \(\hat{\epsilon}_{2001}\) is the (predicted) residual of the predicted data point.
In the case of an AR(1) process, using the recurrence of Step 2, with the trivial prediction of the zero-mean white-noise term \(\hat{\epsilon}_{t} = 0\) for \(t=2001\), this expression simplifies to:

$$
\hat{\epsilon}_{2001} = \phi_1\, \hat{\epsilon}_{2000},
$$
with \(\phi_1\) obtained from the normalized ACF of the residuals at lag \(\tau = 1\).
It can be shown that both expressions are equivalent for an AR(1) process.
e_2001 = phi_ar1 * ehat[-1]
print('e_2001 = ', e_2001)
e_2001 = [-14.47939059]
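As an optional numerical check (not part of the original workflow), we can verify the claimed equivalence. For a stationary AR(1) process the covariance between samples \(i\) and \(j\) is proportional to \(\phi^{|i-j|}\), so the cross-covariance vector of the prediction point equals \(\phi\) times the last row of \(\Sigma_{YY}\), and the covariance-based predictor collapses to \(\phi\,\hat{\epsilon}_{2000}\):
# Optional check: for an AR(1) process, Cov(S_i, S_j) is proportional to phi**|i-j|,
# so the covariance-based predictor reduces to phi * ehat[-1].
phi_hat = phi_ar1[0]
idx = np.arange(n)
Sigma_YY = phi_hat ** np.abs(idx[:, None] - idx[None, :])  # up to a common scale factor (it cancels)
Sigma_pY = phi_hat ** np.abs(n - idx)                      # prediction point is one step beyond the data
e_blup = Sigma_pY @ np.linalg.solve(Sigma_YY, ehat)
print('Covariance-based prediction:', round(float(e_blup), 4))
print('AR(1) recursion prediction :', round(float(e_2001[0]), 4))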
Step 5: Predict the time series for year 2001#
Finally, the time series for year 2001 is predicted using the predicted signal-of-interest (\(\hat{Y}_{signal} = A_p \hat{X}\)) in Step 3, and the predicted residual (\(\hat{\epsilon}_p\)) in Step 4:

$$
\hat{Y}_{2001} = \mathrm{A_p}\,\hat{\mathrm{X}} + \hat{\epsilon}_{2001}.
$$
Y_2001 = Y_signal_2001 + e_2001
# Plot the time series
plt.figure(figsize=(12, 3))
plt.plot(t, data, label="Observed time series")
plt.plot(2001, Y_2001, 'ro', label="Predicted value at year 2001") # red circle marker
plt.title("Time series")
plt.xlabel("Time (year)")
plt.ylabel("Y(t)")
plt.legend()
plt.grid(True)
print('Y(2001) = ',Y_2001)
Y(2001) = [4084.20674426]
In this example, we predicted a time series one step ahead by modelling the noise simply as an AR(1) process. One can also predict multiple time steps ahead, and one can adopt a more sophisticated model, such as an AR(p) with \(p>1\). While this example was limited to \(p = 1\), the underlying concepts generalize to any value of \(p\).
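As a small closing sketch (an addition, using the same estimates and the same AR(1) assumption), predicting \(h\) steps ahead only requires propagating the recursion, \(\hat{\epsilon}_{2000+h} = \phi^{h}\,\hat{\epsilon}_{2000}\), so the predicted noise decays towards zero and the forecast approaches the fitted trend line:
# Sketch: multi-step-ahead prediction under the AR(1) assumption.
# The predicted noise decays as phi**h with the horizon h.
h = np.arange(1, 6)                       # 1 to 5 years ahead
t_future = 2000 + h
A_future = np.column_stack((np.ones_like(t_future), t_future))
Y_future = A_future @ xhat + phi_ar1[0] ** h * ehat[-1]
for year, y_pred in zip(t_future, Y_future):
    print(f'Y({year}) = {y_pred:.2f}')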
Attribution
This chapter was written by Alireza Amiri-Simkooei, Christiaan Tiberius and Sandra Verhagen. Find out more here.