Group Assignment 1.8: Fit (and test) me baby, one more time#

Data Focus: GNSS Observations#

In the previous assignment, you collected and analyzed height data from two different techniques: Global Navigation Satellite Systems (GNSS) and Interferometric Synthetic-aperture radar (InSAR). For this continuation assignment, we will focus exclusively on the GNSS height observations. This involves using the time series data provided in the file gnss_observations.csv, which lists observed vertical displacements (in meters). You will not be using the InSAR data in this week’s assignment.

Stochastic Model of Observations

Consistent with the previous assignment, the GNSS observations are characterized by the following stochastic properties:

  • They are normally distributed and independent.

  • They have a known standard deviation \(\sigma_\textrm{GNSS} = 15\) mm

Assignment Theme: Non-Linear Functional Models#

The core focus of this week’s work is the estimation of parameters using non-linear functional models. You will explore the techniques required to solve these models, including estimating the parameters, assessing the precision of the estimates, and conducting rigorous statistical evaluations using methods like the overall model test (OMT) and the generalized likelihood ratio test (GLRT).

import numpy as np
import scipy as sc
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats.distributions import chi2
from scipy.stats import norm
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual

np.set_printoptions(precision=3)
import os
from urllib.request import urlretrieve

def findfile(fname):
    if not os.path.isfile(fname):
        print(f"Downloading {fname}...")
        urlretrieve('http://files.mude.citg.tudelft.nl/'+fname, fname)

findfile('gnss_observations.csv')

Part 1: Preparing the data and code from last week’s assignment#

As mentioned, this assignment focuses only on the GNSS observations. The data is loaded below; make sure to add your own code where needed (you should have done this already in last week’s assignment). The BLUE function (BLUE) and the function for plotting the residuals (plot_residual) are copied from last week’s assignment.

gnss = pd.read_csv('./gnss_observations.csv')
gnss_dates = pd.to_datetime(gnss['Dates'])
gnss_doy = (gnss['Day of Year']).to_numpy()
gnss_obs = (gnss['Observations[m]']).to_numpy()

m_gnss = len(gnss_obs)
y_gnss = gnss_obs
std_gnss = 0.015 #m
Sigma_Y_gnss = np.identity(len(gnss_dates))*std_gnss**2
def BLUE(A, y, Sigma_Y):
    """Calculate the Best Linear Unbiased Estimator
    
    Write a docstring here (an explanation of your function).
    
    Function to calculate the Best Linear Unbiased Estimator
    
    Input:
        A = A matrix (mxn)
        y = vector with observations (mx1)
        Sigma_Y = variance-covariance matrix of the observations (mxm)
    
    Output:
        xhat = vector with the estimates (nx1)
        Sigma_Xhat = variance-covariance matrix of the unknown parameters (nxn)
    """
    
    Sigma_Xhat = np.linalg.inv(A.T @ np.linalg.inv(Sigma_Y) @ A)
    xhat = Sigma_Xhat @ A.T @ np.linalg.inv(Sigma_Y) @ y
    
    return xhat, Sigma_Xhat  
def plot_residual(date, y_obs, yhat, data_type, A,
                  Sigma_Xhat, Sigma_Y):

    ehat = y_obs - yhat

    # Compute the vc matrix for \hat{y}
    Sigma_Yhat = A @ Sigma_Xhat @ A.T
    std_y = np.sqrt(Sigma_Yhat.diagonal())

    # Compute the vc matrix for \hat{e}
    Sigma_ehat = Sigma_Y - Sigma_Yhat
    std_ehat = np.sqrt(Sigma_ehat.diagonal())
    
    # Compute residuals normalized by standard deviation
    ehat_normalized = ehat / std_ehat

    # Show the 99% confidence interval
    k99 = norm.ppf(1 - 0.5*0.01)
    confidence_interval_y = k99*std_y
    confidence_interval_res = k99*std_ehat

    # Plot original data and fitted model
    plt.figure(figsize = (15,5))
    plt.plot(date, y_obs, 'k+',  label = 'Observations')
    plt.plot(date, yhat,  label = 'Fitted model')
    plt.fill_between(date, (yhat - confidence_interval_y), 
                     (yhat + confidence_interval_y), facecolor='orange',
                     alpha=0.4, label = '99% Confidence Region')
    plt.legend()
    plt.ylabel(data_type + ' Displacement [m]')
    plt.xlabel('Time')
    plt.title(data_type + ' Observations and Fitted Model')


    # Plot time series of the residual
    plt.figure(figsize = (15,5))
    plt.plot(date, ehat, 'o', markeredgecolor='black', label='Residual')
    plt.plot(date,-confidence_interval_res, '--', color='orange', 
             label = '99% Confidence Region')
    plt.plot(date,confidence_interval_res, '--', color='orange')
    plt.legend()
    plt.ylabel(data_type + ' residual [m]')
    plt.xlabel('Time')
    plt.title(data_type + ' Residuals')

    # Plot histogram and normal distribution
    plt.figure(figsize = (12,5))
    plt.subplot(121)
    plt.hist(ehat_normalized, bins = 40, density=True,  edgecolor='black')
    x = np.linspace(np.min(ehat_normalized), np.max(ehat_normalized), num=100);
    plt.plot(x, norm.pdf(x, loc=0.0, scale = 1),
             linewidth=4.0)
    plt.title(data_type + ' Normalized Residuals Histogram')
    plt.xlabel('residual [-]')
    plt.ylabel('density')
    
    # Q-Q plot
    plt.subplot(122)
    sc.stats.probplot(ehat_normalized, dist="norm", plot=plt)
    plt.title(data_type + ' Normalized Residuals Q-Q Plot')
    plt.xlabel('Theoretical Quantiles')
    plt.ylabel('Sample Quantiles') 
    
    print (f'The mean value of the {data_type} residuals is {np.mean(ehat):.3f} m')
    print (f'The standard deviation of the {data_type} residuals is {np.std(ehat):.3f} m')

    return ehat

Part 2: Set-up non-linear functional model#

In the model we fitted so far, we only considered a linear velocity and a seasonal variation in groundwater level. However, in reality the response to groundwater injection is likely to be more complicated than just a linear uplift. Therefore you will now consider a slightly more sophisticated, non-linear model to account for the observed uplift.

Please note that this is a very simplified model that does not necessarily rely on physics.

The model under null hypothesis \(H_0\) is defined as:

\[ d = d_0 + v \ (1-\exp\left(\frac{-t}{a}\right)) + A\sin(\frac{2\pi t}{365} - \phi), \]

where \(d\) is the displacement, \(t\) is the time in days, and \(\phi =\frac{\pi}{2}\).

The model now has 4 unknowns:

  1. \(d_0\), the initial displacement at \(t_0\)

  2. \(v\), which can be seen as the response of the soil layers due to groundwater injection.

  3. \(a\), a scaling parameter which represents the memory of the system

  4. \(A\), amplitude of the seasonal effect

It can be seen that the model is non-linear. We will therefore use non-linear least-squares to solve for the unknown parameters.
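
A brief sketch of the general Gauss-Newton approach used here (the notation may differ slightly from the textbook): with \(\mathrm{x} = [d_0, v, a, A]^T\), the non-linear model \(\mathrm{y} = f(\mathrm{x}) + \epsilon\) is linearized around the current estimate and BLUE is applied repeatedly to the linearized system,

\[ \Delta \mathrm{y}_{[i]} = \mathrm{y} - f(\hat{\mathrm{x}}_{[i]}), \qquad \mathrm{J}_{[i]} = \left.\frac{\partial f}{\partial \mathrm{x}}\right|_{\hat{\mathrm{x}}_{[i]}}, \qquad \Delta \hat{\mathrm{x}}_{[i]} = \left(\mathrm{J}_{[i]}^T \Sigma_Y^{-1} \mathrm{J}_{[i]}\right)^{-1} \mathrm{J}_{[i]}^T \Sigma_Y^{-1} \Delta \mathrm{y}_{[i]}, \qquad \hat{\mathrm{x}}_{[i+1]} = \hat{\mathrm{x}}_{[i]} + \Delta \hat{\mathrm{x}}_{[i]}, \]

where the Jacobian \(\mathrm{J}\) is an \(m \times 4\) matrix whose columns contain the partial derivatives of the model with respect to each of the four unknowns, evaluated at the current estimates.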

Task 2.1:

Choose initial values for the model parameters. Use the code and Markdown cells below to justify your decision. We suggest two possible approaches: a) use the forward model and make a plot to see if you can get it in the right order of magnitude, or b) make an inference about what the values might be using knowledge about each term in the model.

Note: it may be useful at this point to define a function for your forward model to check the values. You will be able to re-use it in later Tasks as well.

def forward_model(time, d_i, v_i, a_i, A_i):
    """Compute the displacements based on our initial guess.
    """

    y_comp = d_i + v_i*(1-np.exp(-time/a_i)) + A_i * np.sin(2 * np.pi * time / 365 - np.pi/2)

    return y_comp
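
Before moving to the interactive plot, a quick static check of the forward model can help you find the right order of magnitude. The following is a minimal sketch with hypothetical parameter values (replace them with your own guesses from Task 2.1):

# Static check of the forward model (hypothetical values; adjust to your own guess)
d0_guess = 10.0   # initial displacement [m] (hypothetical)
v_guess = 0.2     # asymptotic uplift [m] (hypothetical)
a_guess = 50.0    # memory of the system [days] (hypothetical)
A_guess = 0.05    # seasonal amplitude [m] (hypothetical)

y_guess = forward_model(gnss_doy, d0_guess, v_guess, a_guess, A_guess)

plt.figure(figsize=(15, 5))
plt.plot(gnss_dates, gnss_obs, 'k+', label='Observations')
plt.plot(gnss_dates, y_guess, 'r-', label='Forward model (initial guess)')
plt.xlabel('Time')
plt.ylabel('Displacement [m]')
plt.legend();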

Note: In the code below we will create an interactive plot. Note that these do not work/render online. You have to download the notebook and run it locally to see the interactivity.

# Your initial guess
v_i = ### YOUR CODE HERE ###
a_i = ### YOUR CODE HERE ###
do_i = ### YOUR CODE HERE ###
A_i = ### YOUR CODE HERE ###


v_slider = widgets.FloatSlider(value=v_i, min=0.0, max=1.0, step=0.01, description='v_i [m]:')
a_slider = widgets.FloatSlider(value=a_i, min=0.0, max=150.0, step=1.0, description='a_i [days]:')
do_slider = widgets.FloatSlider(value=do_i, min=9.8, max=10.2, step=0.01, description='d_o [m]:')
A_slider = widgets.FloatSlider(value=A_i, min=-0.3, max=0.5, step=0.01, description='A_i [m]:')

def update_plot(v_i, a_i, do_i, A_i):
    y_init = forward_model(gnss_doy, do_i, v_i, a_i, A_i)
    res = y_init - gnss_obs
    res2 = np.sum(res**2)
    plt.figure(figsize=(15,5))
    plt.plot(gnss_dates, y_init, 'r-', label='Initial guess')
    plt.plot(gnss_dates, gnss_obs, 'co', mec='black', label='Observations')
    plt.title('GNSS Observations and Initial Guess, Residual Sum of Squares: {:.4f} $m^2$'.format(res2))
    plt.xlabel('Dates')
    plt.ylabel('Displacement [m]')
    plt.ylim(np.min(gnss_obs)-3*np.std(gnss_obs), np.max(gnss_obs)+3*np.std(gnss_obs))
    plt.grid()
    plt.legend()
    plt.show()
    
interact(update_plot, v_i=v_slider, a_i=a_slider, do_i=do_slider, A_i=A_slider);

Task 2.2:

Set up the Jacobian matrix for the non-linear least-squares. Using the initial values from Task 2.1, print the first 5 rows of the Jacobian matrix to confirm that your function works (i.e., confirm that the code runs and that the initial values give acceptable results).

Please check the print statement from the cell below to see if you’ve set up your Jacobian correctly!

def jacobian(time, do_i, v_i, a_i, A_i):
    """Create the Jacobian matrix.
    
    The columns represent the linearized system of equations.
    """    
    
    J_c1 = ### YOUR CODE HERE ###
    J_c2 = ### YOUR CODE HERE ###
    J_c3 = ### YOUR CODE HERE ###
    J_c4 = ### YOUR CODE HERE ###

    J = np.column_stack((J_c1, J_c2, J_c3, J_c4))
    
    return J
# Check Jacobian function
time_check = np.array([0, 100, 200])

do_i_check = 1.0
v_i_check = 2.0
a_i_check = 50.0
A_i_check = 0.5

J_check = jacobian(time_check, do_i_check, v_i_check, a_i_check, A_i_check)

J_expected = np.array([
    [1.0, 0.0, -0.0, -1.0],
    [1.0, 0.865, -0.011, 0.15],
    [1.0, 0.982, -0.003, 0.955]
])

if np.allclose(J_check, J_expected, atol=1e-3):
    print("✅ Jacobian matches the expected output!")
else:
    print("❌ Jacobian does NOT match the expected output.")
    print("Computed J:\n", J_check)
    print("Expected J:\n", J_expected)
# Print first 5 rows of the Jacobian matrix

Task 2.3:

What is the redundancy of the model with GNSS?

Part 3: Set-up Gauss-Newton iteration algorithm#

Task 3.1:

Set up a Gauss-Newton iteration algorithm (complete the code below). Choose the criterion to stop the iteration, and explain below how and why you define this stopping criterion.
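
Common choices for the stopping criterion (a sketch of options, not a prescription) are a tolerance on the norm of the parameter update or on the change in the weighted squared norm of the residuals between iterations, combined with a maximum number of iterations as a safeguard:

\[ \|\Delta \hat{\mathrm{x}}_{[i]}\| < \delta \qquad \text{or} \qquad \left| \hat{\epsilon}_{[i]}^T \Sigma_Y^{-1} \hat{\epsilon}_{[i]} - \hat{\epsilon}_{[i-1]}^T \Sigma_Y^{-1} \hat{\epsilon}_{[i-1]} \right| < \delta, \]

for some small tolerance \(\delta\).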

def nlsq_iteration(x0, day, y_obs, Sigma_Y):
    """Use Gauss-Newton iteration to find non-linear parameters."""

    xnrm = 1000 # initialize stop criteria

    xhat_i = np.zeros((50, 4))
    xhat_i[0,:] = x0

    do_i = xhat_i[0,0]
    v_i = xhat_i[0,1]
    a_i = xhat_i[0,2]
    A_i = xhat_i[0,3]   

    iter_ind = 0

    while xnrm >= 1e-12 and iter_ind < 49:

        # computed deformation yi based on 'estimates' 
        y_i = forward_model(day, do_i, v_i, a_i, A_i)
        
        dy = ### YOUR CODE HERE ###
        J = ### YOUR CODE HERE ###

        d_xhat, Sigma_Xhat = ### YOUR CODE HERE ###
        # Hint: re-use your function BLUE, above

        xhat_i[iter_ind+1,:] = xhat_i[iter_ind,:] + d_xhat.T

        do_i  = xhat_i[iter_ind+1,0]
        v_i  = xhat_i[iter_ind+1,1]
        a_i  = xhat_i[iter_ind+1,2]
        A_i  = xhat_i[iter_ind+1,3]
        
        xnrm = ### YOUR CODE HERE ###

        # Update the iteration number
        iter_ind += 1

        if iter_ind==49:
            print("Number of iterations too large, check initial values.")

    xhat = xhat_i[iter_ind,:]
    xhat_i = xhat_i[0:iter_ind+1, :]
    return xhat, Sigma_Xhat, xhat_i, iter_ind, J

Task 3.2:

Apply the Gauss-Newton iteration to your model (run the code you completed above). For each unknown parameter, plot your estimates versus the iteration number (horizontal axis: iteration number; vertical axis: parameter estimate).

Choose an appropriate number of iterations to clearly demonstrate the convergence behavior of your estimates.

x0 = [do_i, v_i, a_i, A_i]

[xhat_gnss, Sigma_Xhat_gnss, xhat_i_gnss, niter_gnss, J_final_gnss] = nlsq_iteration(### YOUR CODE HERE ###)

print('\n GNSS Results for each iteration (#Iterations =', niter_gnss, ')')
print(xhat_i_gnss)

Note: In the code below we will create an interactive plot. Note that these do not work/render online. You have to download the notebook and run it locally to see the interactivity.

iteration_slider = widgets.IntSlider(value=0, min=0, max=niter_gnss, step=1, description='Iteration:')

def plot_fit_iteration(iteration, xhat):
    yhat = forward_model(gnss_doy, xhat[iteration,0], xhat[iteration,1], xhat[iteration,2], xhat[iteration,3])
    plt.figure(figsize=(15,3))
    plt.plot(gnss_dates, gnss_obs, 'co', mec='black', label='Observations')
    plt.plot(gnss_dates, yhat, 'r-', label='Model')
    plt.title('GNSS Observations and Model Fit')
    plt.xlabel('Dates')
    plt.ylabel('Displacement [m]')
    plt.ylim(np.min(gnss_obs)-3*np.std(gnss_obs), np.max(gnss_obs)+3*np.std(gnss_obs))
    plt.grid()
    plt.legend()
    plt.show()

def plot_parameters_iteration(iteration, xhat):    
    plt.figure(figsize=(15,3))
    plt.subplot(1, 3, 1)
    plt.plot(xhat[:, 0], linewidth=4)
    plt.plot(iteration, xhat[iteration, 0], 'ro', markersize=12)
    plt.title('Estimated offset')
    plt.ylabel('Offset [m]')
    plt.xlabel('Number of iterations [-]')
    plt.grid()
    plt.subplot(1, 3, 2)
    plt.plot(xhat[:, 1], linewidth=4)
    plt.plot(iteration, xhat[iteration, 1], 'ro', markersize=12)
    plt.title('Estimated v value')
    plt.ylabel('Estimated v value [m]')
    plt.xlabel('Number of iterations [-]')
    plt.grid()
    plt.subplot(1, 3, 3)
    plt.plot(xhat[:, 2], linewidth=4)
    plt.plot(iteration, xhat[iteration, 2], 'ro', markersize=12)
    plt.title('Estimated $a$ value')
    plt.ylabel('a value [days]')
    plt.xlabel('Number of iterations [-]')
    plt.grid()
interact(plot_fit_iteration, iteration=iteration_slider, xhat=fixed(xhat_i_gnss));
interact(plot_parameters_iteration, iteration=iteration_slider, xhat=fixed(xhat_i_gnss));

Task 3.3

  • Does your iteration converge? If not, find out why and provide an explanation.

  • After how many iterations does it converge?

Part 4: Assess the Precision of the Estimates#

Task 4.1:

What is the quality of the final estimates?

Provide the full covariance matrix of your estimates, and give an interpretation of the numbers in the covariance matrix.
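
One way to make the covariance matrix easier to interpret (a minimal sketch, assuming the output Sigma_Xhat_gnss from Part 3) is to extract the standard deviations of the estimates and the corresponding correlation matrix:

# Standard deviations of the estimates (square roots of the diagonal of the covariance matrix)
std_xhat = np.sqrt(np.diag(Sigma_Xhat_gnss))
print('Standard deviations of [d0, v, a, A]:', std_xhat)

# Correlation matrix: covariances normalized by the standard deviations
corr_xhat = Sigma_Xhat_gnss / np.outer(std_xhat, std_xhat)
print('Correlation matrix:\n', corr_xhat)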

# ### YOUR CODE HERE ###

Task 4.2:

Compute the modeled displacements (\(\hat{\mathrm{y}}\)), and corresponding residuals (\(\hat{\mathrm{\epsilon}}\)). Visualize the results in two graphs and add the confidence bounds (\(t\)-versus-displacement and \(t\)-versus-residuals).

Also create a histogram of the residuals, overlay the corresponding normal distribution (which you can estimate from the histogram), and report the mean and standard deviation of the residuals.

yhat_gnss_nonlinear = forward_model(### YOUR CODE HERE ###)
ehat_gnss_nonlinear = plot_residual(### YOUR CODE HERE ###)

Task 4.3:

Answer the following questions:

  1. Do you see any systematic effect?
  2. What is your interpretation of any discrepancy between the observations and the model?
  3. What is the mean value of the residuals, and what does this value tell you?
  4. What is the empirical standard deviation of the residuals? Is it as expected?

Part 5: Model Testing#

You would now like to test the validity of your model using the overall model test, with a false alarm probability of 5% (i.e., a 95% confidence level). Is the null hypothesis (default model is correct) accepted?

Task 5.1:

Conduct the Overall Model Test (OMT) to evaluate the statistical consistency of your non-linear model. Calculate the test statistic (\(T_{\text{detection}}\)) and determine the critical value (\(T_{\text{critical}}\)) at a 95% confidence level, using the model redundancy (\(r\)) as the degrees of freedom. Report the calculated test statistic and state whether you accept or reject the null hypothesis (\(H_0\)).
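
As a reminder of the general form (a sketch; see the course material for the exact convention): under \(H_0\), the weighted squared norm of the residuals follows a central \(\chi^2\) distribution with \(r\) degrees of freedom,

\[ T = \hat{\epsilon}^T \Sigma_Y^{-1} \hat{\epsilon} \sim \chi^2(r, 0) \quad \text{under } H_0, \]

and \(H_0\) is rejected if \(T\) exceeds the critical value \(\chi^2_{1-\alpha}(r)\), where \(\alpha\) is the false alarm probability.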

# Dimensions of design matrix
m, n = ### YOUR CODE HERE ###

# Compute redundancy
r = ### YOUR CODE HERE ###

# OMT test statistics
t_detection = ### YOUR CODE HERE ###

# OMT threshold
T_detection = ### YOUR CODE HERE ###

print(f'\nTest statistic is {t_detection:.2f}, threshold is {T_detection:.2f}.') 

if ### YOUR CODE HERE ###:
    print('We accept the null hypothesis H0')
else:
    print('We reject the null hypothesis H0')

Model under \(H_a\)#

As shown in the residual plot of the non-linear model fit, a shift may happen around 2024-07, which may explain the rejection of the null hypothesis. To test whether this is true and, if so, to identify the epoch at which the shift happens, you need to execute the Generalized Likelihood Ratio Test (GLRT). The following model under the alternative hypothesis \(H_a\) needs to be considered. Assuming the shift happens at the \(k\)-th epoch, the observation equations under \(H_a\) can be written as

\[\begin{split} d = d_0 + v \ (1-\exp\left(\frac{-t}{a}\right)) + A\sin(\frac{2\pi t}{365} - \phi),\quad t<k \\ d = d_0 + v \ (1-\exp\left(\frac{-t}{a}\right)) + A\sin(\frac{2\pi t}{365} - \phi) + s,\quad t\geq k, \end{split}\]

where \(s\) is the size of the shift.

According to the observation equation under \(H_a\), please update your functions to compute the forward model, Jacobian, and non-linear iteration.

Task 5.2:

Implement the updated forward model and construct the new Jacobian matrix \((\mathbf{J})\), which now has five columns, required for this system. Then, set up the Gauss-Newton iteration algorithm to solve for the five unknown parameters.

def forward_model_with_shift(time, do_i, v_i, a_i, A_i, k, shift):
    """Compute the displacements for the model with a shift at epoch k."""

    y_comp = do_i + v_i*(1-np.exp(-time/a_i)) + A_i * np.sin(2 * np.pi * time / 365 - np.pi/2)
    y_shift = np.zeros(len(time))
    y_shift[time>=k] = shift
    y_comp = y_comp + y_shift

    return y_comp


def jacobian_with_shift(time, do_i, v_i, a_i, A_i, k, shift):
    
    """Create the Jacobian matrix.
    
    The columns represent the linearized system of equations.
    """    
    J_c1 = ### YOUR CODE HERE ###
    J_c2 = ### YOUR CODE HERE ###
    J_c3 = ### YOUR CODE HERE ###
    J_c4 = ### YOUR CODE HERE ###
    J_c5 = ### YOUR CODE HERE ###
    J_c5[time>=k] = ### YOUR CODE HERE ###

    J = np.column_stack((J_c1, J_c2, J_c3, J_c4, J_c5))

    return J


def nlsq_iteration_with_shift(x0, day, y_obs, Sigma_Y, k):
    """Use Gauss-Newton iteration to find non-linear parameters."""

    xnrm = 1000 # initialize stop criteria

    xhat_i = np.zeros((50, 5))
    xhat_i[0,:] = x0

    do_i = xhat_i[0,0]
    v_i = xhat_i[0,1]
    a_i = xhat_i[0,2]
    A_i = xhat_i[0,3]
    shift_i = xhat_i[0,4]

    iter_ind = 0

    while xnrm >= 1e-12 and iter_ind < 49:

        # computed deformation yi based on 'estimates' 
        y_i = forward_model_with_shift(day, do_i, v_i, a_i, A_i, k, shift_i)
        
        dy = ### YOUR CODE HERE ###
        J = ### YOUR CODE HERE ###

        d_xhat, Sigma_Xhat = ### YOUR CODE HERE ###
        # Hint: re-use your function BLUE, above

        xhat_i[iter_ind+1,:] = xhat_i[iter_ind,:] + d_xhat.T

        do_i  = xhat_i[iter_ind+1,0]
        v_i  = xhat_i[iter_ind+1,1]
        a_i  = xhat_i[iter_ind+1,2]
        A_i  = xhat_i[iter_ind+1,3]
        shift_i = xhat_i[iter_ind+1,4]

        xnrm = ### YOUR CODE HERE ###

        # Update the iteration number
        iter_ind += 1

        if iter_ind==49:
            print("Number of iterations too large, check initial values.")

    xhat = xhat_i[iter_ind,:]
    xhat_i = xhat_i[0:iter_ind+1, :]
    return xhat, Sigma_Xhat, xhat_i, iter_ind, J

Part 6: Model identification with GLRT#

You know that the shift happens at a day in the range \(195\) to \(204\), which means you have 10 alternative hypotheses with \(195 \leq k \leq 204\). Compute the test statistic of the Generalized Likelihood Ratio Test (GLRT) for each alternative hypothesis. Then find the day of the shift by identifying the most likely alternative hypothesis.
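
For each candidate \(k\), the GLRT test statistic compares the weighted squared norms of the residuals under \(H_0\) and \(H_a\) (this mirrors the computation in the code cell below):

\[ T_{\text{GLRT}}(k) = \hat{\epsilon}_0^T \Sigma_Y^{-1} \hat{\epsilon}_0 - \hat{\epsilon}_a(k)^T \Sigma_Y^{-1} \hat{\epsilon}_a(k), \]

and the most likely epoch of the shift is the \(k\) that maximizes \(T_{\text{GLRT}}(k)\). For a single, fixed \(k\), this test statistic follows a \(\chi^2(1)\) distribution under \(H_0\), since \(H_a\) has one extra parameter (\(s\)).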

Task 6.1:

Utilize the Generalized Likelihood Ratio Test (GLRT) to estimate the most likely day the shift occurred, knowing \(k\) is between day 195 and 204. For each possible day \(k\), run the Gauss-Newton algorithm and compute the GLRT test statistic, which measures the improvement over the no-shift model. Finally, identify the day \(\mathbf{k}_{\text{hat}}\) corresponding to the maximum test statistic and report this estimated day of shift.

x0 = [do_i, v_i, a_i, A_i, 0]

T_GLRT = np.zeros(10)
for k in range(195,205):

    [xhat_k, Sigma_Xhat_k, xhat_i_k, niter_k, J_final_k] = nlsq_iteration_with_shift(### YOUR CODE HERE ###)

    yhat_gnss_k = ### YOUR CODE HERE ###

    ehat_gnss_k = ### YOUR CODE HERE ###
    
    t_GLRT = t_detection - ehat_gnss_k.T @ np.linalg.inv(Sigma_Y_gnss) @ ehat_gnss_k
    T_GLRT[k-195] = t_GLRT
    
i_GLRT = np.argmax(T_GLRT)
k_hat = i_GLRT + 195

print('\nGLRT Test Statistics for each assumed change point k:')
print(T_GLRT)
print(f'\nThe estimated time of the shift is day {k_hat}, with test statistic {T_GLRT[i_GLRT]:.2f}.')

Task 6.2:

Plot the fitted model and the residuals of the selected alternative hypothesis.

[xhat_k, Sigma_Xhat_k, xhat_i_k, niter_k, J_final_k] = nlsq_iteration_with_shift(### YOUR CODE HERE ###)

yhat_gnss_k = forward_model_with_shift(### YOUR CODE HERE ###)
ehat_gnss_k = plot_residual(gnss_dates, y_gnss, yhat_gnss_k,
                             'GNSS', J_final_k, Sigma_Xhat_k,
                             Sigma_Y_gnss)

Note: In the code below we will create an interactive plot. Note that these do not work/render online. You have to download the notebook and run it locally to see the interactivity.

iteration_slider = widgets.IntSlider(value=0, min=0, max=niter_k, step=1, description='Iteration:')

def plot_fit_iteration_2(iteration, xhat):
    yhat = forward_model_with_shift(gnss_doy, xhat[iteration,0], xhat[iteration,1], xhat[iteration,2], xhat[iteration,3], k_hat, xhat[iteration,4])
    plt.figure(figsize=(15,3))
    plt.plot(gnss_dates, gnss_obs, 'co', mec='black', label='Observations')
    plt.plot(gnss_dates, yhat, 'r-', label='Model')
    plt.title('GNSS Observations and Model Fit')
    plt.xlabel('Dates')
    plt.ylabel('Displacement [m]')
    plt.ylim(np.min(gnss_obs)-3*np.std(gnss_obs), np.max(gnss_obs)+3*np.std(gnss_obs))
    plt.grid()
    plt.legend()
    plt.show()

def plot_parameters_iteration_2(iteration, xhat):    
    plt.figure(figsize=(15,3))
    plt.subplot(1, 3, 1)
    plt.plot(xhat[:, 0], linewidth=4)
    plt.plot(iteration, xhat[iteration, 0], 'ro', markersize=12)
    plt.title('Estimated offset')
    plt.ylabel('Offset [m]')
    plt.xlabel('Number of iterations [-]')
    plt.grid()
    plt.subplot(1, 3, 2)
    plt.plot(xhat[:, 1], linewidth=4)
    plt.plot(iteration, xhat[iteration, 1], 'ro', markersize=12)
    plt.title('Estimated v value')
    plt.ylabel('Estimated v value [m]')
    plt.xlabel('Number of iterations [-]')
    plt.grid()
    plt.subplot(1, 3, 3)
    plt.plot(xhat[:, 2], linewidth=4)
    plt.plot(iteration, xhat[iteration, 2], 'ro', markersize=12)
    plt.title('Estimated $a$ value')
    plt.ylabel('a value [days]')
    plt.xlabel('Number of iterations [-]')
    plt.grid()
interact(plot_fit_iteration_2, iteration=iteration_slider, xhat=fixed(xhat_i_k));
interact(plot_parameters_iteration_2, iteration=iteration_slider, xhat=fixed(xhat_i_k));

Task 6.3:

Rerun the Overall Model Test (OMT) using the parameters estimated with the optimal shift day \(\mathbf{k}_{\text{hat}}\) identified above. Report the new test statistic value. Based on the comparison with the critical value, state whether you accept or reject the null hypothesis (\(H_0\)) and briefly explain what this result indicates about the final model’s validity.

# Dimensions of design matrix
m,n = ### YOUR CODE HERE ###

# Compute redundancy
r = ### YOUR CODE HERE ###

# OMT test statistics
t_detection = ### YOUR CODE HERE ###

# OMT threshold
T_detection = ### YOUR CODE HERE ###

print(f'\nTest statistic is {t_detection:.2f}, threshold is {T_detection:.2f}.') 
if ### YOUR CODE HERE ###:
    print('We accept the null hypothesis H0')
else:
    print('We reject the null hypothesis H0')

Task 6.4:

Compare the final, selected model (with shift \(s\) at day \(\mathbf{k}_{\text{hat}}\)) against the initial model under \(H_0\) (the model without shift) in terms of overall model fit and residual statistics. Explain in detail why it is essential to rely on the statistical tests (like the \(\chi^2\) and GLRT) rather than solely on visual inspection, especially when the simpler \(H_0\) model might appear “good enough” when simply plotted against the observations.

Part 7: Strategies to improve?#

Task 7.1:

In order to get a better fit to the data (smaller residuals) for this case study, which of the following strategies could help? (elaborate on your answer)

  1. better observations?
  2. a more complicated geophysical model?
  3. better initial values?
  4. more observations?

By Chengyu Yin, Lina Hagenah and Sandra Verhagen, Delft University of Technology. CC BY 4.0, more info on the Credits page of Workbook.