Notebook: factors influencing precision#

In this notebook we will compute and plot confidence intervals for a constant velocity model.

Click –> Live Code on the top right corner of this screen and then wait until all cells are executed. You will only have to run the code, and optionally may want to change some variables.

Learning objectives:

  • evaluate the influence of changing some parameters: number of observations, standard deviation and sampling interval.

  • discuss why the changes occur.

  • discuss and understand the importance of correctly assessing the confidence intervals.

The code cells to import packages, with the functions used to apply BLUE and calculate the confidence intervals, and the plot_all function are hidden. Download the notebook to see the complete notebook.

# Import packages
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
%matplotlib inline

import micropip
await micropip.install("ipywidgets")
import ipywidgets as widgets
from IPython.display import display
import operator

Observation model#

The code cell below sets up our problem in three steps:

  1. Set the observation model using the time when the observations were taken (epoch) to create our \(\mathrm{A}\)-matrix

  2. Artificially create our random observations, y, with error e

  3. Create the \(\mathrm{A}\)-matrix of our predictions

The simulated set of observations uses the random.seed method to ensure that the realizations in the random sample are always the sam. This is to ensure that everyone gets the same result; it is still a random sample.

t = np.vstack([1,3,3.5,5,5.5,7,9,10,10.5,
               11.5,13,14,16,16.5,18,19,
               20,22.5,24,25])
m = len(t)
A = np.hstack((np.ones((m,1)),t))
sigma = 2

np.random.seed(1613353294)
e = np.random.normal(0, sigma, m)
xo = [3,0.3]
y = A @ xo + e

m_pred = 30
t_pred = np.arange(1,m_pred+1)
A_pred = np.column_stack((np.ones(m_pred), t_pred))
def BLUE(A, y, Sigma_Y):
    """ 
    Function to calculate the Best Linear Unbiased Estimator
    Output: 
    xhat        estimated parameters
    Sigma_Xhat  covariance matrix of estimated parameters
    yhat        adjusted observations
    """
    inv_Sigma_Y = np.linalg.inv(Sigma_Y)
    Sigma_Xhat = np.linalg.inv(A.T @ inv_Sigma_Y @ A)
    xhat = Sigma_Xhat @ A.T @ inv_Sigma_Y @ y
    yhat = A @ xhat
    return xhat, Sigma_Xhat, yhat
def BLUE_predict(A_pred, xhat, Sigma_Xhat):
    """ 
    Function to calculate the fitted model (yhat) incl. predictions
    Output: 
    yhat_pred        predicted observations
    Sigma_Yhat_pred  covariance matrix of predicted observations
    """
    yhat_pred = A_pred @ xhat
    Sigma_Yhat_pred = A_pred @ Sigma_Xhat @ A_pred.T
    return yhat_pred, Sigma_Yhat_pred
def conf_interval(yhat, Sigma_Yhat, conf_level):
    """ 
    Function to calculate confidence interval of fitted model
    conf_level is the confidence level as percentage (e.g., 95)
    Output: 
    CI_yhat  confidence bound of yhat 
    k        CI_yhat[i] = k * sigma_yhat[i]
    """
    alpha = 1 - conf_level/100
    k = norm.ppf(1-0.5*alpha)
    CI_yhat = k * np.sqrt(np.diagonal(Sigma_Yhat))
    return CI_yhat, k
def plot_all(t, y, A, t_pred, A_pred, sigma, conf_level_1, conf_level_2):
    """ 
    Function to create plot with observations, fitted model 
    and confidence bounds (for 2 different confidence levels)
    """
    m = len(t)
    Sigma_Y = sigma**2*np.eye(m) 
    xhat, Sigma_Xhat, yhat = BLUE(A, y, Sigma_Y)
    yhat_pred, Sigma_Yhat_pred = BLUE_predict(A_pred, xhat, Sigma_Xhat)
    CI_yhat_1, k_1 = conf_interval(yhat_pred, Sigma_Yhat_pred, conf_level_1)
    CI_yhat_2, k_2 = conf_interval(yhat_pred, Sigma_Yhat_pred, conf_level_2)
    
    # create plot with observations, error bars and confidence intervals
    plt.figure(figsize = (10,6))
    plt.xlabel('t [s]')
    plt.ylabel('x(t) [m]')

    plt.plot(t, y, 'k*',
             label=f'observations with {conf_level_1:.1f}% conf.')
    plt.title('$m$='+str(m)+' and $\sigma$='+str(sigma)+' [m]')

    # plot observations with errorbars   
    plt.errorbar(t, y, yerr = k_1*sigma,
                 fmt='', capsize=5, linestyle='')

    # plot model and predicted observations
    plt.plot(t, yhat, 'go')

    # plot 95% confidence intervals
    plt.plot(t_pred, yhat_pred + CI_yhat_1, 'r',
             label=f'{conf_level_1:.1f}% conf.')
    plt.plot(t_pred, yhat_pred - CI_yhat_1, 'r')

    # plot 99% confidence intervals
    plt.plot(t_pred, yhat_pred, 'g')
    plt.plot(t_pred, yhat_pred + CI_yhat_2, 'r:',
             label=f'{conf_level_2:.1f}% conf.')
    plt.plot(t_pred, yhat_pred - CI_yhat_2, 'r:')

    plt.ylim(-2, 20)
    plt.legend()

Plot observations and confidence intervals (95% and 99%) with all observations#

You can also try different confidence levels.

# default model
plot_all(t, y, A, t_pred, A_pred, sigma, 95, 99)

Explain the shape of the confidence bounds and the diffence between 95% and 99% confidence interval.

What if we reduce the number of observations (only the central 10)#

You can also try to select a different (e.g., even shorter) time range by selecting a subset of the observations.

t_2 = t[5:15]
y_2 = y[5:15]
A_2 = np.hstack((np.ones((len(t_2),1)),t_2))

plot_all(t_2, y_2, A_2, t_pred, A_pred, sigma, 95, 99)

Compare with the first figure and discuss the differences.

What if we thought we had better precision (factor 2 better)#

Note: in reality the precision of the observations is \(\sigma=2\)m, so in fact this will show the confidence intervals based on an incorrect stochastic model.

We can investigate different precision by changing the standard deviation.

sigma_wrong = 1
plot_all(t, y, A, t_pred, A_pred, sigma_wrong , 95, 99)

Compare with the first figure.

  • The fitted line is the same, why is it not affected?

  • Discuss the differences in confidence intervals.

What if we have a different sampling interval?#

t_3 = np.vstack([7,8,8.5,9,9.5,10,10.5,11,11.5,12,13,
                 13.5,14.5,15,15.5,16,16.5,17.5,18.5,19])
A_3 = np.hstack((np.ones((len(t_3),1)),t_3))  

plot_all(t_3, y, A_3, t_pred, A_pred, sigma, 95, 99)

Compare with the first figure.

  • Discuss the differences. Especially compare the width of the confidence intervals at the centre and at the end of the time interval.

  • Is it better or not to have observations with smaller sampling interval?

Attribution

This chapter was written by Sandra Verhagen. Find out more here.