Notebook exercise: which melting model is better?#

In this interactive notebook two models are fitted to a time series of height observations of a point on a glacier.

It is the same problem as we used in the notebook in the chapter on Weighted least-squares.

You will apply a statistical test to decide between two different models to describe the ice melt as function of time.

Click –> Live Code on the top right corner of this screen and then wait until all cells are executed.

You have 12 monthly measurements of the height of a point on a glacier. The measurements are obtained from a satellite laser altimeter.

  • Time [months]: t \(=[0, 1, 2, \ldots, 11]\)

  • Observed heights [meters]: y \(=[102.4, 98.2, 97.5, 97.9, 99.7, 100.7, 98.3, 94.2, 90.9, 86.1, 81.2, 76.9]\)

We will consider three different models, with the following observation equations:

  • Model 1: constant velocity, \(\mathbb{E}\left( Y_i \right) = x_0 + x_1 t_i\)

  • Model 2: constant velocity + annual signal, \(\mathbb{E}\left( Y_i \right) = x_0 + x_1 t_i + x_2 cos \Big(\frac{2 \pi t_i}{12} \Big)\)

The observation are independent and have a precision of 1 m.

The first part of the code is hidden in the interactive version in the book.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.stats.distributions import chi2
%matplotlib inline

import micropip
await micropip.install("ipywidgets")
import ipywidgets as widgets
from IPython.display import display
import operator

def check_answer(variable_name, expected, comparison = operator.eq):
    output = widgets.Output()
    button = widgets.Button(description="Check answer")
    def _inner_check(button):
        with output:
            if comparison(globals()[variable_name], expected):
                output.outputs = [{'name': 'stdout', 'text': 'Correct!', 'output_type': 'stream'}]
            else:
                output.outputs = [{'name': 'stdout', 'text': 'Incorrect!', 'output_type': 'stream'}]
    button.on_click(_inner_check)
    display(button, output)
t = np.arange(12)
y = np.array([102.4, 98.2, 97.5, 97.9, 99.7, 100.7, 
               98.3, 94.2, 90.9, 86.1, 81.2, 76.9])
m = len(t)

alpha = 0.025

A_1 = np.column_stack((np.ones(m), t))
A_2 = np.column_stack((np.ones(m), t, np.cos(2*np.pi*t/12)))

xhat_1 = np.linalg.inv(A_1.T @ A_1) @ A_1.T @ y
xhat_2 = np.linalg.inv(A_2.T @ A_2) @ A_2.T @ y
yhat_1 = A_1 @ xhat_1
yhat_2 = A_2 @ xhat_2
ehat_1 = y - A_1 @ xhat_1
ehat_2 = y - A_2 @ xhat_2

eTe_1 = ehat_1.T @ ehat_1
eTe_2 = ehat_2.T @ ehat_2

Plot with observations and the fitted models#

The squared norm of residuals is equal to \(\hat{\epsilon}^T \Sigma_Y^{-1} \hat{\epsilon}\).

print(f'Squared norm of residuals for model 1: {eTe_1:.3f}')
print(f'Squared norm of residuals for model 2: {eTe_2:.3f}')

plt.figure()
plt.plot(t, y, 'kx', label='observations')
plt.plot(t, yhat_1, color='r', label='model 1')
plt.plot(t, yhat_2, color='b', label='model 2')
plt.xlim(-0.2, (m-1)+0.2)
plt.xlabel('time [months]')
plt.ylabel('height [meters]')
plt.legend(loc='best')
plt.show()

Apply the generalized likelihood ratio test#

The null hypothesis is that the linear trend model is applicable; the alternative hypothesis is that the linear trend + annual signal model is correct. Implement the test to decide whether the null hypothesis is accepted, or rejected in favor of the alternative hypothesis. Use a false alarm probability of 0.025.

Generalized likelihood ratio test:#

Accept alternative hypothesis and reject null hypothesis if: \( T_q = \hat{\epsilon}^T\Sigma_Y^{-1}\hat{\epsilon}-\hat{\epsilon}_a^T\Sigma_Y^{-1}\hat{\epsilon}_a > k_{\alpha}\)

First calculate the threshold value k \(=k_{\alpha}\). Recall that \(T_q \sim \chi^2(q,0)\), where \(q\) is the number of extra parameters in the alternative hypothesis. You can use chi2.ppf from SciPy.

Then calculate the test statistic value Tq. The squared norms of residuals of model 1 and 2 have already been calculated above (code is hidden) and have variable names eTe_1 and eTe_2. You will also need to add the ‘test’ in the if statement.

alpha = 0.025
k = ?
check_answer("k",chi2.ppf(1 - alpha, 1), np.array_equiv)
Tq = ?

if ?:
    print(f'Test statistic = {Tq:.2f}\n')
    print(f'Threshold value = {k:.2f}\n')
    print('Null hypothesis rejected, Alternative hypothesis accepted')
else:
    print(f'Test statistic = {Tq:.2f}\n')
    print(f'Threshold value = {k:.2f}\n')
    print('Null hypothesis accepted')    
check_answer("Tq",eTe_1 - eTe_2, np.array_equiv)

As we could have guessed from the figure, model 2 (linear trend + annual signal) provides a much better fit. In many practical situations it may be less obvious whether a model fits significantly better than the null hypothesis, and the general likelihood test provides a statistical test to assess this.

Challenge:#

Try to come up with another model with at most 4 parameters, and then:

  1. set-up the \(\mathrm{A}\)-matrix,

  2. apply BLUE,

  3. compute the sum of squared residuals \(\hat{\epsilon}^T \Sigma_Y^{-1} \hat{\epsilon}\) for that model

  4. apply the GLRT with model 2 as null hypothesis, and your model as alternative hypothesis.

Attribution

This chapter was written by Sandra Verhagen. Find out more here.