Notebook exercise: fitting different models
In this interactive notebook you will fit several models to a time series of height observations of a point on a glacier, to assess whether it is melting.
Click --> Live Code in the top right corner of this screen and then wait until all cells have been executed.
Learning objectives:
set up an observation model
apply least-squares estimation
assess the estimation results considering redundancy, squared norm of residuals, under- and overfitting
You have \(m = 12\) monthly measurements of the height of a point on a glacier. The measurements are obtained from a satellite laser altimeter.
Time [months]: \(t =[0, 1, 2, \ldots, 11]\)
Observed heights [meters]: \(y =[102.4, 98.2, 97.5, 97.9, 99.7, 100.7, 98.3, 94.2, 90.9, 86.1, 81.2, 76.9]\)
The variables `t`, `y`, and `m` are already defined, so you can directly use them in your code.
We will consider three different models, with the following observation equations:
Model 1: constant velocity, \(\mathbb{E}\left( Y_i \right) = x_0 + x_1 t_i\)
Model 2: 8th order polynomial, \(\mathbb{E}\left( Y_i \right) =x_0 + x_1 t_i + x_2 t_i^2 +\ldots+ x_8 t_i^8 = \sum_{p=0}^8 x_p t_i^p \)
Model 3: constant velocity + annual signal, \(\mathbb{E}\left( Y_i \right) = x_0 + x_1 t_i + x_2 \cos \Big(\frac{2 \pi t_i}{12} \Big)\)
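All three models are linear in the unknown parameters, so each can be written in the form \(\mathbb{E}(Y) = \mathrm{A}\mathrm{x}\). As an illustration (this matrix form is not spelled out in the exercise itself), stacking the observation equations of Model 1 for all \(m\) epochs gives:

$$
\mathbb{E}\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{pmatrix}
=
\begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
$$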
import micropip
await micropip.install("ipywidgets")
import numpy as np
import matplotlib.pyplot as plt
import operator
import ipywidgets as widgets
from IPython.display import display
%matplotlib inline
def check_answer(variable_name, expected, comparison = operator.eq):
output = widgets.Output()
button = widgets.Button(description="Check answer")
def _inner_check(button):
with output:
if comparison(globals()[variable_name], expected):
output.outputs = [{'name': 'stdout', 'text': 'Correct!',
'output_type': 'stream'}]
else:
output.outputs = [{'name': 'stdout', 'text': 'Incorrect!',
'output_type': 'stream'}]
button.on_click(_inner_check)
display(button, output)
# times of observation [months]
t = np.arange(12)
# observed heights [m]
y = [102.4, 98.2, 97.5, 97.9, 99.7, 100.7, 98.3, 94.2,
90.9, 86.1, 81.2, 76.9]
m = len(t)
A_1_ans = np.column_stack((np.ones(m), t))
A_2_ans = np.column_stack((A_1_ans, t**2, t**3, t**4,
t**5, t**6, t**7, t**8))
A_3_ans = np.column_stack((np.ones(m), t, np.cos(2*np.pi*t/12)))
def lsqe_ans(y,A):
'''Apply least-squares estimation
Input:
y : vector with observations
A : design matrix
'''
# estimated parameters
xhat = np.linalg.inv(A.T @ A) @ A.T @ y
# adjusted observations
yhat = A @ xhat
# residuals
ehat = y - yhat
# squared norm of residuals
eTe = ehat.T @ ehat
return xhat, yhat, ehat, eTe
xhat_1_ans, yhat_1_ans, ehat_1_ans, eTe_1_ans = lsqe_ans(y,A_1_ans)
xhat_2_ans, yhat_2_ans, ehat_2_ans, eTe_2_ans = lsqe_ans(y,A_2_ans)
xhat_3_ans, yhat_3_ans, ehat_3_ans, eTe_3_ans = lsqe_ans(y,A_3_ans)
With `column_stack` you can combine arrays (column vectors) to create a matrix. The design matrices for the first two models are already given. Add the matrix for the third model yourself, and check your answer below.
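As a minimal sketch of how `column_stack` works (with made-up vectors that are unrelated to the exercise data):
a = np.array([1.0, 1.0, 1.0])
b = np.array([0.0, 1.0, 2.0])
# each input array becomes one column of the resulting matrix
M = np.column_stack((a, b, b**2))
print(M)
# [[1. 0. 0.]
#  [1. 1. 1.]
#  [1. 2. 4.]]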
A_1 = np.column_stack((np.ones(m), t))
A_2 = np.column_stack((A_1, t**2, t**3, t**4, t**5, t**6, t**7, t**8))
A_3 = ?
check_answer("A_3",A_3_ans, np.array_equiv)
You can use the NumPy function `np.linalg.inv` to compute the inverse of a matrix. Recall that for a matrix product \(\mathrm{A}\cdot \mathrm{B} \) you can use `A @ B`, and that the transpose \(\mathrm{A}^T\) is obtained with `A.T`.
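As a quick illustration of these operations (a toy example with made-up numbers, not part of the exercise):
B = np.array([[2.0, 0.0],
              [1.0, 3.0]])
print(B.T)                   # transpose
print(np.linalg.inv(B))      # inverse
print(np.linalg.inv(B) @ B)  # product; should be (close to) the identity matrix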
First complete the function `lsqe`, and check whether it is correct below.
def lsqe(y,A):
'''Apply least-squares estimation
Input:
y : vector with observations
A : design matrix
'''
# estimated parameters
xhat = ?
# adjusted observations
yhat = A @ xhat
# residuals
ehat = ?
# squared norm of residuals
eTe = ehat.T @ ehat
return xhat, yhat, ehat, eTe
xhat_1, yhat_1, ehat_1, eTe_1 = lsqe(y,A_1)
xhat_2, yhat_2, ehat_2, eTe_2 = lsqe(y,A_2)
xhat_3, yhat_3, ehat_3, eTe_3 = lsqe(y,A_3)
print(f'Redundancy of linear trend model: '
f'{m - np.shape(A_1)[1]}')
print(f'Redundancy of 8th order polynomial model: '
f'{m - np.shape(A_2)[1]}')
print(f'Redundancy of linear trend + annual signal model: '
f'{m - np.shape(A_3)[1]}')
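For reference, the redundancy is the number of observations minus the number of unknown parameters, so with \(m=12\) the values printed above should be \(12-2=10\), \(12-9=3\) and \(12-3=9\) for models 1, 2 and 3, respectively.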
Check whether `xhat_1` is correct (if not, you need to correct your `lsqe` function).
check_answer("xhat_1", xhat_1_ans, np.array_equiv)
Check whether `eTe_1` is correct (if not, you need to correct your `lsqe` function).
check_answer("eTe_1", eTe_1_ans, np.array_equiv)
Solution
In case your code did not work, here’s the correct solution:
xhat = np.linalg.inv(A.T @ A) @ A.T @ y
ehat = y-yhat
Running the next cell will plot the observations and the three fitted models in one figure, and the residuals for each model in another figure. Note that the code of the function `plot_results` is hidden.
def plot_results():
print(f'Squared norm of residuals model 1: {eTe_1_ans:.3f}')
print(f'Squared norm of residuals model 2: {eTe_2_ans:.3f}')
print(f'Squared norm of residuals model 3: {eTe_3_ans:.3f}')
fig, ax = plt.subplots(1, 2, figsize = (16, 6))
# left side plot : observations and fitted models
ax[0].plot(t, y, 'kx', label='observations')
ax[0].plot(t, yhat_1_ans, color='r', label='model 1')
ax[0].plot(t, yhat_2_ans, color='c', label='model 2')
ax[0].plot(t, yhat_3_ans, color='b', label='model 3')
ax[0].set_ylabel('height [meters]')
ax[0].set_xlabel('time [months]')
ax[0].set_title('Observations and fitted models')
ax[0].set_xlim(-0.2, (m-1)+0.2)
ax[0].legend()
# right side plot : residuals
ax[1].plot(t, ehat_1_ans, '+r', label='model 1')
ax[1].plot(t, ehat_2_ans, '+c', label='model 2')
ax[1].plot(t, ehat_3_ans, '+b', label='model 3')
ax[1].set_ylabel('height [meters]')
ax[1].set_xlabel('time [months]')
ax[1].set_title('Residuals')
ax[1].set_xlim(-0.2, 11.2)
ax[1].legend()
plt.show()
plot_results()
One of the models is underfitting. Try to reason what is meant by this and which model this applies to.
Solution
Model 1 obviously does not capture the annual signal, so it does not accurately describe the relation between the observations and the height change. This is referred to as underfitting.
One of the models is overfitting. Try to reason what is meant by this and which model this applies to.
Solution
Model 2 is overfitting: it results in very small residuals since we include many parameters, but what is the physical interpretation? Including even higher orders would result in an even better fit (ultimately a perfect fit, see below). A big problem is that this model will most likely not be able to predict the height change in the future. This is illustrated in the figure below, where we predict the height for the coming 2 months and see that the polynomial model predicts a very unrealistic increase in height.
What would happen if you were to fit an 11th order polynomial?
Solution
This would result in a perfect fit: with 12 parameters and 12 observations, \(\mathrm{A}\) becomes a full-rank square matrix, such that \(\hat{\mathrm{x}} = \mathrm{A}^{-1} \mathrm{y}\) and the residuals become 0.
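A quick numerical check of this (a sketch that reuses the `t` and `y` arrays defined above; `np.vander` is just a convenient way to build the polynomial design matrix):
# design matrix of an 11th order polynomial: columns 1, t, t^2, ..., t^11
A_11 = np.vander(t, 12, increasing=True)
# square, full-rank system, so it can be solved directly
xhat_11 = np.linalg.solve(A_11, y)
# residuals should be zero up to floating-point round-off
# (the Vandermonde matrix is badly conditioned, so expect tiny nonzero values)
print(np.max(np.abs(y - A_11 @ xhat_11)))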
Do the residuals (see plot) behave as expected for each of the models?
Solution
No, you would expect them to fluctuate randomly around zero, but especially for model 1 you can see a clear residual signal, which is an indication that the model is too simplistic.
Discuss why the squared norm of residuals is quite different for the 3 models.
Solution
The squared norm for model 1 is very large, due to the annual signal that is not accounted for. The squared norm for model 2 is very small, so it results in a very good fit (due to the high order of the polynomial). Model 3 results in a somewhat larger squared norm than model 2; this is due to the additional parameters in model 2, which allow it to follow the actual height variation over time more closely.
How could we quantitatively assess whether model 3 is a good choice? (This is a topic we will discuss later, but try to think for yourself how it could be done.)
Solution
You would somehow have to compare the size of the residuals with the precision as described by \(\Sigma_{\epsilon}\).
def plot_prediction():
# create array with times
t_pred = np.arange(14)
m_pred = len(t_pred)
# create the A-matrices for the given times
A_pred_1 = np.column_stack((np.ones(m_pred), t_pred))
A_pred_2 = np.column_stack((A_pred_1, t_pred**2, t_pred**3, t_pred**4, t_pred**5, t_pred**6, t_pred**7, t_pred**8))
A_pred_3 = np.column_stack((np.ones(m_pred), t_pred, np.cos(2*np.pi*t_pred/12)))
# calculate predicted heights
model_1 = A_pred_1 @ xhat_1
model_2 = A_pred_2 @ xhat_2
model_3 = A_pred_3 @ xhat_3
    # plot the predicted heights based on the three models
plt.figure()
plt.plot(t_pred, model_1, color='r', label='model 1')
plt.plot(t_pred, model_2, color='c', label='model 2')
plt.plot(t_pred, model_3, color='b', label='model 3')
plt.xlim(-0.2, (m_pred-1)+0.2)
plt.xlabel('time [months]')
plt.ylabel('predicted height [meters]')
plt.legend(loc='best')
plt.show()
Running the next cell will plot the fitted models including predictions for the next two months. Note that the code of the function `plot_prediction` is hidden.
plot_prediction()
Attribution
This chapter was written by Sandra Verhagen. Find out more here.