Notebook exercise: fitting different models
In this interactive notebook you will fit several models to a time series of height observations of a point on a glacier, to assess whether it is melting.
Click --> Live Code in the top right corner of this screen and then wait until all cells have been executed.
Learning objectives:
set up an observation model
apply least-squares estimation
assess the estimation results considering redundancy, squared norm of residuals, under- and overfitting
You have \(m = 12\) monthly measurements of the height of a point on a glacier. The measurements are obtained from a satellite laser altimeter.
Time [months]: \(t =[0, 1, 2, \ldots, 11]\)
Observed heights [meters]: \(y =[102.4, 98.2, 97.5, 97.9, 99.7, 100.7, 98.3, 94.2, 90.9, 86.1, 81.2, 76.9]\)
The variables `t`, `y`, and `m` are already defined, so you can directly use them in your code.
We will consider three different models, with the following observation equations:
Model 1: constant velocity, \(\mathbb{E}\left( Y_i \right) = x_0 + x_1 t_i\)
Model 2: 8th order polynomial, \(\mathbb{E}\left( Y_i \right) =x_0 + x_1 t_i + x_2 t_i^2 +\ldots+ x_8 t_i^8 = \sum_{p=0}^8 x_p t_i^p \)
Model 3: constant velocity + annual signal, \(\mathbb{E}\left( Y_i \right) = x_0 + x_1 t_i + x_2 \cos \Big(\frac{2 \pi t_i}{12} \Big)\)
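All three models are linear in the unknown parameters, so each can be written in the form \(\mathbb{E}(Y) = \mathrm{A}\mathrm{x}\). As an illustration (this matrix form is not spelled out in the exercise itself), stacking the observation equations of Model 1 for all \(m\) epochs gives:

$$
\mathbb{E}\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{pmatrix}
=
\begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
$$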
import micropip
await micropip.install("ipywidgets")
import numpy as np
import matplotlib.pyplot as plt
import operator
import ipywidgets as widgets
from IPython.display import display
%matplotlib inline
def check_answer(variable_name, expected, comparison = operator.eq):
output = widgets.Output()
button = widgets.Button(description="Check answer")
def _inner_check(button):
with output:
if comparison(globals()[variable_name], expected):
output.outputs = [{'name': 'stdout', 'text': 'Correct!',
'output_type': 'stream'}]
else:
output.outputs = [{'name': 'stdout', 'text': 'Incorrect!',
'output_type': 'stream'}]
button.on_click(_inner_check)
display(button, output)
# times of observation [months]
t = np.arange(12)
# observed heights [m]
y = [102.4, 98.2, 97.5, 97.9, 99.7, 100.7, 98.3, 94.2,
90.9, 86.1, 81.2, 76.9]
m = len(t)
A_1_ans = np.column_stack((np.ones(m), t))
A_2_ans = np.column_stack((A_1_ans, t**2, t**3, t**4,
t**5, t**6, t**7, t**8))
A_3_ans = np.column_stack((np.ones(m), t, np.cos(2*np.pi*t/12)))
def lsqe_ans(y,A):
'''Apply least-squares estimation
Input:
y : vector with observations
A : design matrix
'''
# estimated parameters
xhat = np.linalg.inv(A.T @ A) @ A.T @ y
# adjusted observations
yhat = A @ xhat
# residuals
ehat = y - yhat
# squared norm of residuals
eTe = ehat.T @ ehat
return xhat, yhat, ehat, eTe
xhat_1_ans, yhat_1_ans, ehat_1_ans, eTe_1_ans = lsqe_ans(y,A_1_ans)
xhat_2_ans, yhat_2_ans, ehat_2_ans, eTe_2_ans = lsqe_ans(y,A_2_ans)
xhat_3_ans, yhat_3_ans, ehat_3_ans, eTe_3_ans = lsqe_ans(y,A_3_ans)
With `column_stack` you can combine arrays (column vectors) to create a matrix. The design matrices for the first two models are already given. Add the matrix for the third model yourself, and check your answer below.
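As a minimal sketch of how `column_stack` works (with made-up vectors that are unrelated to the exercise data):
a = np.array([1.0, 1.0, 1.0])
b = np.array([0.0, 1.0, 2.0])
# each input array becomes one column of the resulting matrix
M = np.column_stack((a, b, b**2))
print(M)
# [[1. 0. 0.]
#  [1. 1. 1.]
#  [1. 2. 4.]]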
A_1 = np.column_stack((np.ones(m), t))
A_2 = np.column_stack((A_1, t**2, t**3, t**4, t**5, t**6, t**7, t**8))
A_3 = ?
check_answer("A_3",A_3_ans, np.array_equiv)
You can use the NumPy function `np.linalg.inv` to compute the inverse of a matrix. Recall that for a matrix product \(\mathrm{A}\cdot \mathrm{B} \) you can use `A @ B`, and that the transpose \(\mathrm{A}^T\) is obtained with `A.T`.
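As a quick illustration of these operations (a toy example with made-up numbers, not part of the exercise):
B = np.array([[2.0, 0.0],
              [1.0, 3.0]])
print(B.T)                   # transpose
print(np.linalg.inv(B))      # inverse
print(np.linalg.inv(B) @ B)  # product; should be (close to) the identity matrix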
First complete the function `lsqe`, and check whether it is correct below.
def lsqe(y,A):
'''Apply least-squares estimation
Input:
y : vector with observations
A : design matrix
'''
# estimated parameters
xhat = ?
# adjusted observations
yhat = A @ xhat
# residuals
ehat = ?
# squared norm of residuals
eTe = ehat.T @ ehat
return xhat, yhat, ehat, eTe
xhat_1, yhat_1, ehat_1, eTe_1 = lsqe(y,A_1)
xhat_2, yhat_2, ehat_2, eTe_2 = lsqe(y,A_2)
xhat_3, yhat_3, ehat_3, eTe_3 = lsqe(y,A_3)
print(f'Redundancy of linear trend model: '
f'{m - np.shape(A_1)[1]}')
print(f'Redundancy of 8th order polynomial model: '
f'{m - np.shape(A_2)[1]}')
print(f'Redundancy of linear trend + annual signal model: '
f'{m - np.shape(A_3)[1]}')
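For reference, the redundancy is the number of observations minus the number of unknown parameters, so with \(m=12\) the values printed above should be \(12-2=10\), \(12-9=3\) and \(12-3=9\) for models 1, 2 and 3, respectively.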
Check whether `xhat_1` is correct (if not, you need to correct your `lsqe` function).
check_answer("xhat_1", xhat_1_ans, np.array_equiv)
Check whether `eTe_1` is correct (if not, you need to correct your `lsqe` function).
check_answer("eTe_1", eTe_1_ans, np.array_equiv)
Solution
In case your code did not work, here’s the correct solution:
xhat = np.linalg.inv(A.T @ A) @ A.T @ y
ehat = y-yhat
Running the next cell will plot the observations and the three fitted models in one figure, and the residuals for each model in another figure. Note that the code of the function `plot_results` is hidden.
def plot_results():
print(f'Squared norm of residuals model 1: {eTe_1_ans:.3f}')
print(f'Squared norm of residuals model 2: {eTe_2_ans:.3f}')
print(f'Squared norm of residuals model 3: {eTe_3_ans:.3f}')
fig, ax = plt.subplots(1, 2, figsize = (16, 6))
# left side plot : observations and fitted models
ax[0].plot(t, y, 'kx', label='observations')
ax[0].plot(t, yhat_1_ans, color='r', label='model 1')
ax[0].plot(t, yhat_2_ans, color='c', label='model 2')
ax[0].plot(t, yhat_3_ans, color='b', label='model 3')
ax[0].set_ylabel('height [meters]')
ax[0].set_xlabel('time [months]')
ax[0].set_title('Observations and fitted models')
ax[0].set_xlim(-0.2, (m-1)+0.2)
ax[0].legend()
# right side plot : residuals
ax[1].plot(t, ehat_1_ans, '+r', label='model 1')
ax[1].plot(t, ehat_2_ans, '+c', label='model 2')
ax[1].plot(t, ehat_3_ans, '+b', label='model 3')
ax[1].set_ylabel('height [meters]')
ax[1].set_xlabel('time [months]')
ax[1].set_title('Residuals')
ax[1].set_xlim(-0.2, 11.2)
ax[1].legend()
plt.show()
plot_results()
One of the models is underfitting. Try to reason what is meant by this and which model this applies to.
Solution
Model 1 obviously does not capture the annual signal, so it does not accurately describe the relation between the observations and the height change. This is referred to as underfitting.
One of the models is overfitting. Try to reason what is meant by this and which model this applies to.
Solution
Model 2 is overfitting: it results in very small residuals since we include many parameters, but what is the physical interpretation? Including even higher orders would result in an even better fit (ultimately a perfect fit, see below). A big problem is that this model will most likely not be able to predict the height change in the future. This is illustrated in the figure below, where we predict the height for the coming 2 months and see that the polynomial model predicts a very unrealistic increase in height.
What would happen if you were to fit an 11th order polynomial?
Solution
This would result in a perfect fit: with 12 parameters and 12 observations, \(\mathrm{A}\) becomes a full-rank square matrix, such that \(\hat{\mathrm{x}} = \mathrm{A}^{-1} \mathrm{y}\) and the residuals become 0.
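A quick numerical check of this (a sketch that reuses the `t` and `y` arrays defined above; `np.vander` is just a convenient way to build the polynomial design matrix):
# design matrix of an 11th order polynomial: columns 1, t, t^2, ..., t^11
A_11 = np.vander(t, 12, increasing=True)
# square, full-rank system, so it can be solved directly
xhat_11 = np.linalg.solve(A_11, y)
# residuals should be zero up to floating-point round-off
# (the Vandermonde matrix is badly conditioned, so expect tiny nonzero values)
print(np.max(np.abs(y - A_11 @ xhat_11)))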
Do the residuals (see plot) behave as expected for each of the models?
Solution
No, you would expect them to fluctuate randomly around zero, but especially for model 1 you can see a clear residual signal, which is an indication that the model is too simplistic.
Discuss why the squared norm of residuals is quite different for the 3 models.
Solution
The squared norm for model 1 is very large, due to the annual signal that is not accounted for. The squared norm for model 2 is very small, so it results in a very good fit (due to the high order of the polynomial). Model 3 results in a somewhat larger squared norm than model 2; this is due to the additional parameters in model 2, which allow it to follow the actual height variation over time more closely.
How could we quantitatively assess whether model 3 is a good choice? (This is a topic we will discuss later, but try to think for yourself how it could be done.)
Solution
You would somehow have to compare the size of the residuals with the precision as described by \(\Sigma_{\epsilon}\).
def plot_prediction():
# create array with times
t_pred = np.arange(14)
m_pred = len(t_pred)
# create the A-matrices for the given times
A_pred_1 = np.column_stack((np.ones(m_pred), t_pred))
A_pred_2 = np.column_stack((A_pred_1, t_pred**2, t_pred**3, t_pred**4, t_pred**5, t_pred**6, t_pred**7, t_pred**8))
A_pred_3 = np.column_stack((np.ones(m_pred), t_pred, np.cos(2*np.pi*t_pred/12)))
# calculate predicted heights
model_1 = A_pred_1 @ xhat_1
model_2 = A_pred_2 @ xhat_2
model_3 = A_pred_3 @ xhat_3
    # plot the predicted heights based on the three models
plt.figure()
plt.plot(t_pred, model_1, color='r', label='model 1')
plt.plot(t_pred, model_2, color='c', label='model 2')
plt.plot(t_pred, model_3, color='b', label='model 3')
plt.xlim(-0.2, (m_pred-1)+0.2)
plt.xlabel('time [months]')
plt.ylabel('predicted height [meters]')
plt.legend(loc='best')
plt.show()
Running the next cell will plot the fitted models including predictions for the next two months. Note that the code of the function `plot_prediction` is hidden.
plot_prediction()
Attribution
This chapter was written by Sandra Verhagen. Find out more here.