Workshop 2: Is it Melting?¶


CEGM1000 MUDE: Week 1.3. Wednesday, Sep 18, 2024.

In this notebook you will fit a model to a time series of height observations of a point on a glacier, to assess whether it is melting.

Learning objectives:

  • apply least-squares (LS) and best linear unbiased (BLU) estimation
  • evaluate the precision of the estimated parameters
  • discuss the differences between least-squares and best linear unbiased estimation

You have 12 monthly measurements of the height of a point on a glacier. The measurements are obtained from a satellite laser altimeter. The observations are given in the code below.

We will fit a model assuming a linear trend (constant velocity) to account for the melting, plus an annual signal due to the seasonal impact on ice. Formulating a generalized form of this model is part of your task below. Note, however, that the periodic signal can be implemented in many ways. For this assignment you can assume that the periodic term is implemented as follows:

$$ \cos(2 \pi t / 12 + \phi) $$

where $\phi$ is the phase angle. For this workshop we assume $\phi = 0$ and leave it out of the model formulation so as to keep the model linear (i.e., $\phi$ does not appear in the A-matrix).

The precision (1 $\sigma$) of the first six observations is 0.7 meters; the last six observations have a precision of 2 meters due to a change in the settings of the instrument. All observations are mutually independent.
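As a sketch of how such a stochastic model translates into code (using hypothetical precisions of 0.5 m and 1.0 m, not the values above): mutually independent observations have zero covariances, so the covariance matrix is diagonal with the variances $\sigma_i^2$ on the diagonal.

```python
import numpy as np

# Hypothetical example: 3 observations with sigma = 0.5 m,
# then 2 observations with sigma = 1.0 m, all independent.
sigmas = np.concatenate((0.5 * np.ones(3), 1.0 * np.ones(2)))

# Independence -> zero covariances -> diagonal matrix of variances
Sigma = np.diag(sigmas**2)

print(Sigma)
```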

In [2]:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt


# this will print all float arrays with 3 decimal places
float_formatter = "{:.3f}".format
np.set_printoptions(formatter={'float_kind':float_formatter})

Part 1: Observation model¶

First we will construct the observation model, based on the information provided: the observation times, the observed heights, and the number of observations.

$\textbf{Task 1.1}:$

Complete the definition of the observation time array times and define the other variables such that the print statements correctly describe the design matrix A.

In [ ]:
y = np.array([59.82, 57.20, 59.09, 59.49, 59.68, 59.34, 60.95, 59.34, 55.38, 54.33, 48.71, 48.47])

times = YOUR_CODE_HERE
number_of_observations = YOUR_CODE_HERE
number_of_parameters = YOUR_CODE_HERE
print(f'Dimensions of the design matrix A:')
print(f'  {YOUR_CODE_HERE} rows')
print(f'  {YOUR_CODE_HERE} columns')

Next, we need to assemble the design matrix in Python. Rather than enter each value manually, we have a few tricks to make this easier. Here is an example Python cell that illustrates how one-dimensional arrays (i.e., columns or rows) can be assembled into a matrix in two ways:

  1. As diagonal elements, and
  2. As parallel elements (columns, in this case)

Note also the use of np.ones() to quickly make a row or column of any size with the same values in each element.

In [ ]:
column_1 = np.array([33333, 0, 0])
column_2 = 99999*np.ones(3)

example_1 = np.diagflat([column_1, column_2])
example_2 = np.column_stack((column_1, column_2))

print(example_1, '\n\n', example_2)

See the documentation on np.diagflat and np.column_stack if you would like to understand more about these NumPy methods.
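As an illustration of the column-stacking approach (using a quadratic polynomial as a hypothetical example, not the glacier model), each model parameter contributes one column, evaluated at every observation time:

```python
import numpy as np

t = np.arange(5)  # hypothetical observation times

# Design matrix for a quadratic polynomial y = x0 + x1*t + x2*t^2:
# one column per parameter, one row per observation.
A_quad = np.column_stack((np.ones(len(t)), t, t**2))

print(A_quad.shape)  # (5, 3): 5 observations, 3 parameters
```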

$\textbf{Task 1.2}:$

Complete the $\mathrm{A}$-matrix (design matrix) and covariance matrix $\Sigma_Y$ (stochastic model) as a linear trend with an annual signal. The assert statement is used to confirm that the dimensions of your design matrix are correct (an error will occur if not).

In [16]:
A = YOUR_CODE_HERE
Sigma_Y = YOUR_CODE_HERE

assert A.shape == (number_of_observations, number_of_parameters)

Part 2: Least-squares and best linear unbiased (BLU) estimation¶

The BLU estimator is a linear and unbiased estimator, which provides the best precision, where:

  • linear estimation: $\hat{X}$ is a linear function of the observables $Y$,
  • unbiased estimation: $\mathbb{E}(\hat{X})=\mathrm{x}$,
  • best precision: sum of variances, $\sum_i \sigma_{\hat{x}_i}^2$, is minimized.

The solution $\hat{X}$ is obtained as: $$ \hat{X} = \left(\mathrm{A}^T\Sigma_Y^{-1} \mathrm{A} \right)^{-1} \mathrm{A}^T\Sigma_Y^{-1}Y $$ Note that here we are looking at the estimator, which is random, since it is expressed as a function of the observable vector $Y$. Once we have a realization $y$ of $Y$, the estimate $\hat{\mathrm{x}}$ can be computed.

It can be shown that the covariance matrix of $\hat{X}$ is given by: $$ \Sigma_{\hat{X}} = \left(\mathrm{A}^T\Sigma_Y^{-1} \mathrm{A} \right)^{-1} $$
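Translated into NumPy, the two expressions above look as follows. This is a minimal sketch on a made-up 3-observation, 2-parameter line-fit system, not the glacier data:

```python
import numpy as np

# Hypothetical toy system: fit a line to 3 observations
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.1, 1.0, 2.1])
Sigma_Y = np.diag([0.5**2, 0.5**2, 1.0**2])

inv_Sigma_Y = np.linalg.inv(Sigma_Y)

# Covariance of the BLU estimator: (A^T Sigma_Y^-1 A)^-1
Sigma_xhat = np.linalg.inv(A.T @ inv_Sigma_Y @ A)

# BLU estimate: (A^T Sigma_Y^-1 A)^-1 A^T Sigma_Y^-1 y
xhat = Sigma_xhat @ A.T @ inv_Sigma_Y @ y

print(xhat)
```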

$\textbf{Task 2.1}$

Apply least-squares and best linear unbiased estimation to estimate $\mathrm{x}$. The code cell below outlines how you should compute the inverse of $\Sigma_Y$. Compute the least-squares estimates first, then the best linear unbiased estimates.

Remember that the @ operator (matmul operator) can be used to easily carry out matrix multiplication for Numpy arrays. The transpose method of an array, my_array.T, is also useful.

In [ ]:
inv_Sigma_Y = np.linalg.inv(Sigma_Y)

xhat_LS = YOUR_CODE_HERE
xhat_BLU = YOUR_CODE_HERE

print('LS estimates in [m], [m/month], [m], resp.:\t', xhat_LS)
print('BLU estimates in [m], [m/month], [m], resp.:\t', xhat_BLU)

Task 2.2: The covariance matrix of the least-squares estimator can also be obtained by applying the covariance propagation law. Calculate the covariance matrix and the vector of standard deviations for both the least-squares and BLU estimates. What is the precision of the estimated parameters? The diagonal of a matrix can be extracted with np.diagonal.

Hint: define an intermediate variable first to collect some of the repetitive matrix terms.
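The hint refers to the linear-map form of least squares: $\hat{X}_{LS} = \mathrm{L}^T Y$ with $\mathrm{L}^T = (\mathrm{A}^T\mathrm{A})^{-1}\mathrm{A}^T$, so the covariance propagation law gives $\Sigma_{\hat{X}} = \mathrm{L}^T \Sigma_Y \mathrm{L}$. A sketch on a hypothetical toy system (not the glacier data):

```python
import numpy as np

# Hypothetical system: 3 observations, 2 parameters
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
Sigma_Y = np.diag([0.5**2, 0.5**2, 1.0**2])

# Least squares is a linear map: xhat = LT @ y with LT = (A^T A)^-1 A^T
LT = np.linalg.inv(A.T @ A) @ A.T

# Covariance propagation law for a linear map
Sigma_xhat_LS = LT @ Sigma_Y @ LT.T

# Standard deviations are the square roots of the diagonal
std_xhat_LS = np.sqrt(np.diagonal(Sigma_xhat_LS))
print(std_xhat_LS)
```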

In [ ]:
LT = YOUR_CODE_HERE
Sigma_xhat_LS = LT @ YOUR_CODE_HERE
std_xhat_LS = YOUR_CODE_HERE

Sigma_xhat_BLU = YOUR_CODE_HERE
std_xhat_BLU = YOUR_CODE_HERE

print(f'Precision of LS  estimates in [m], [m/month], [m], resp.:', std_xhat_LS)
print(f'Precision of BLU estimates in [m], [m/month], [m], resp.:', std_xhat_BLU)

Task 2.3: We are mostly interested in the melting rate. Discuss the estimated melting rate with respect to its precision.

Write your answer here.

Part 3: Residuals and plots of the observations and fitted models¶

Run the code below (no changes needed) to calculate the weighted squared norm of residuals with both estimation methods, and create plots of the observations and fitted models.

In [ ]:
eTe_LS = (y - A @ xhat_LS).T @ inv_Sigma_Y @ (y - A @ xhat_LS)
eTe_BLU = (y - A @ xhat_BLU).T @ inv_Sigma_Y @ (y - A @ xhat_BLU)

print(f'Weighted squared norm of residuals with LS  estimation: {eTe_LS:.3f}')
print(f'Weighted squared norm of residuals with BLU estimation: {eTe_BLU:.3f}')

plt.figure()
plt.rc('font', size=14)
plt.plot(times, y, 'kx', label='observations')
plt.plot(times, A @ xhat_LS, color='r', label='LS')
plt.plot(times, A @ xhat_BLU, color='b', label='BLU')
plt.xlim(-0.2, (number_of_observations - 1) + 0.2)
plt.xlabel('time [months]')
plt.ylabel('height [meters]')
plt.legend(loc='best');

Task 3.1:

  • Explain the difference between the fitted models.
  • Can we see from this figure which model fits better (without information about the stochastic model)?

Write your answer here.

Part 4: Confidence bounds¶

In the code below you need to calculate the confidence bounds; e.g., CI_yhat_BLU is a vector with the values $k\cdot\sigma_{\hat{y}_i}$ for BLU estimation. This will then be used to plot the confidence intervals: $$ \hat{y}_i \pm k\cdot\sigma_{\hat{y}_i} $$
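Recall that the adjusted observations are a linear function of the estimator, $\hat{Y} = \mathrm{A}\hat{X}$, so their covariance matrix follows from the propagation law: $\Sigma_{\hat{Y}} = \mathrm{A}\Sigma_{\hat{X}}\mathrm{A}^T$. A sketch on a hypothetical system (the matrix values below are made up, not the glacier results):

```python
import numpy as np

# Hypothetical system: 3 observations, 2 parameters
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
# Made-up (positive definite) covariance matrix of the estimator
Sigma_xhat = np.array([[0.09, -0.03],
                       [-0.03, 0.04]])

# Propagation law: yhat = A @ xhat  =>  Sigma_Yhat = A Sigma_xhat A^T
Sigma_Yhat = A @ Sigma_xhat @ A.T

# sigma of each adjusted observation
sigma_yhat = np.sqrt(np.diagonal(Sigma_Yhat))
print(sigma_yhat)
```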

Recall that $k$ can be calculated from $P(Z < k) = 1-\frac{1}{2}\alpha$.
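For example, for a 98% confidence level ($\alpha = 0.02$) the factor $k$ follows from the inverse standard-normal CDF:

```python
from scipy.stats import norm

alpha = 0.02  # 98% confidence level
k = norm.ppf(1 - alpha / 2)
print(f'{k:.3f}')  # approximately 2.326
```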

$\textbf{Task 4.1}$

Complete the code below to calculate the 98% confidence intervals of both the observations $y$ and the adjusted observations $\hat{y}$.

Use norm.ppf to compute $k_{98}$. Also try different percentages.

In [33]:
yhat_LS = YOUR_CODE_HERE
Sigma_Yhat_LS = YOUR_CODE_HERE
yhat_BLU = YOUR_CODE_HERE
Sigma_Yhat_BLU = YOUR_CODE_HERE

alpha = YOUR_CODE_HERE
k98 = YOUR_CODE_HERE

CI_y = YOUR_CODE_HERE
CI_yhat_LS = YOUR_CODE_HERE
CI_yhat_BLU = YOUR_CODE_HERE

You can directly run the code below to create the plots.

In [ ]:
plt.figure(figsize = (10,4))
plt.rc('font', size=14)
plt.subplot(121)
plt.plot(times, y, 'kx', label='observations')
plt.plot(times, yhat_LS, color='r', label='LS')
plt.plot(times, yhat_LS + CI_yhat_LS, 'r:', label=f'{100*(1-alpha):.1f}% conf.')
plt.plot(times, yhat_LS - CI_yhat_LS, 'r:')
plt.xlabel('time [months]')
plt.ylabel('height [meters]')
plt.legend(loc='best')
plt.subplot(122)
plt.plot(times, y, 'kx', label='observations')
plt.errorbar(times, y, yerr = CI_y, fmt='', capsize=5, linestyle='', color='blue', alpha=0.6)
plt.plot(times, yhat_BLU, color='b', label='BLU')
plt.plot(times, yhat_BLU + CI_yhat_BLU, 'b:', label=f'{100*(1-alpha):.1f}% conf.')
plt.plot(times, yhat_BLU - CI_yhat_BLU, 'b:')
plt.xlim(-0.2, (number_of_observations - 1) + 0.2)
plt.xlabel('time [months]')
plt.legend(loc='best');

Task 4.2: Discuss the shape of the confidence bounds. Do you think the model (linear trend + annual signal) is a good choice?

Write your answer here.

Task 4.3: What is the BLU-estimated melting rate and the amplitude of the annual signal and their 98% confidence interval? Hint: extract the estimated values and standard deviations from xhat_BLU and std_xhat_BLU, respectively.

In [ ]:
rate = YOUR_CODE_HERE
CI_rate = YOUR_CODE_HERE

amplitude = YOUR_CODE_HERE
CI_amplitude = YOUR_CODE_HERE

print(f'The melting rate is {rate:.3f} ± {CI_rate:.3f} m/month (98% confidence level)')
print(f'The amplitude of the annual signal is {amplitude:.3f} ± {CI_amplitude:.3f} m (98% confidence level)')

Task 4.4: Can we conclude the glacier is melting due to climate change?

Write your answer here.

End of notebook.


By MUDE Team © 2024 TU Delft. CC BY 4.0. doi: 10.5281/zenodo.16782515.