Notebook: factors influencing precision

Notebook: factors influencing precision#

In this notebook we will compute and plot confidence intervals for a constant velocity model.

Click –> Live Code on the top right corner of this screen and then wait until all cells are executed. You will only have to run the code, and optionally may want to change some variables.

Learning objectives:

evaluate the influence of changing some parameters: number of observations, standard deviation and sampling interval.
discuss why the changes occur.
discuss and understand the importance of correctly assessing the confidence intervals.

The code cells to import packages, with the functions used to apply BLUE and calculate the confidence intervals, and the plot_all function are hidden. Download the notebook to see the complete notebook.

# Import packages
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
%matplotlib inline

import micropip
await micropip.install("ipywidgets")
import ipywidgets as widgets
from IPython.display import display
import operator

Observation model#

The code cell below sets up our problem in three steps:

Set the observation model using the time when the observations were taken (epoch) to create our \(\mathrm{A}\)-matrix
Artificially create our random observations, y, with error e
Create the \(\mathrm{A}\)-matrix of our predictions

The simulated set of observations uses the random.seed method to ensure that the realizations in the random sample are always the sam. This is to ensure that everyone gets the same result; it is still a random sample.

t = np.vstack([1,3,3.5,5,5.5,7,9,10,10.5,
               11.5,13,14,16,16.5,18,19,
               20,22.5,24,25])
m = len(t)
A = np.hstack((np.ones((m,1)),t))
sigma = 2

np.random.seed(1613353294)
e = np.random.normal(0, sigma, m)
xo = [3,0.3]
y = A @ xo + e

m_pred = 30
t_pred = np.arange(1,m_pred+1)
A_pred = np.column_stack((np.ones(m_pred), t_pred))

def BLUE(A, y, Sigma_Y):
    """ 
    Function to calculate the Best Linear Unbiased Estimator
    Output: 
    xhat        estimated parameters
    Sigma_Xhat  covariance matrix of estimated parameters
    yhat        adjusted observations
    """
    inv_Sigma_Y = np.linalg.inv(Sigma_Y)
    Sigma_Xhat = np.linalg.inv(A.T @ inv_Sigma_Y @ A)
    xhat = Sigma_Xhat @ A.T @ inv_Sigma_Y @ y
    yhat = A @ xhat
    return xhat, Sigma_Xhat, yhat

def BLUE_predict(A_pred, xhat, Sigma_Xhat):
    """ 
    Function to calculate the fitted model (yhat) incl. predictions
    Output: 
    yhat_pred        predicted observations
    Sigma_Yhat_pred  covariance matrix of predicted observations
    """
    yhat_pred = A_pred @ xhat
    Sigma_Yhat_pred = A_pred @ Sigma_Xhat @ A_pred.T
    return yhat_pred, Sigma_Yhat_pred

def conf_interval(yhat, Sigma_Yhat, conf_level):
    """ 
    Function to calculate confidence interval of fitted model
    conf_level is the confidence level as percentage (e.g., 95)
    Output: 
    CI_yhat  confidence bound of yhat 
    k        CI_yhat[i] = k * sigma_yhat[i]
    """
    alpha = 1 - conf_level/100
    k = norm.ppf(1-0.5*alpha)
    CI_yhat = k * np.sqrt(np.diagonal(Sigma_Yhat))
    return CI_yhat, k

def plot_all(t, y, A, t_pred, A_pred, sigma, conf_level_1, conf_level_2):
    """ 
    Function to create plot with observations, fitted model 
    and confidence bounds (for 2 different confidence levels)
    """
    m = len(t)
    Sigma_Y = sigma**2*np.eye(m) 
    xhat, Sigma_Xhat, yhat = BLUE(A, y, Sigma_Y)
    yhat_pred, Sigma_Yhat_pred = BLUE_predict(A_pred, xhat, Sigma_Xhat)
    CI_yhat_1, k_1 = conf_interval(yhat_pred, Sigma_Yhat_pred, conf_level_1)
    CI_yhat_2, k_2 = conf_interval(yhat_pred, Sigma_Yhat_pred, conf_level_2)
    
    # create plot with observations, error bars and confidence intervals
    plt.figure(figsize = (10,6))
    plt.xlabel('t [s]')
    plt.ylabel('x(t) [m]')

    plt.plot(t, y, 'k*',
             label=f'observations with {conf_level_1:.1f}% conf.')
    plt.title('$m$='+str(m)+' and $\sigma$='+str(sigma)+' [m]')

    # plot observations with errorbars   
    plt.errorbar(t, y, yerr = k_1*sigma,
                 fmt='', capsize=5, linestyle='')

    # plot model and predicted observations
    plt.plot(t, yhat, 'go')

    # plot 95% confidence intervals
    plt.plot(t_pred, yhat_pred + CI_yhat_1, 'r',
             label=f'{conf_level_1:.1f}% conf.')
    plt.plot(t_pred, yhat_pred - CI_yhat_1, 'r')

    # plot 99% confidence intervals
    plt.plot(t_pred, yhat_pred, 'g')
    plt.plot(t_pred, yhat_pred + CI_yhat_2, 'r:',
             label=f'{conf_level_2:.1f}% conf.')
    plt.plot(t_pred, yhat_pred - CI_yhat_2, 'r:')

    plt.ylim(-2, 20)
    plt.legend()

Plot observations and confidence intervals (95% and 99%) with all observations#

You can also try different confidence levels.

# default model
plot_all(t, y, A, t_pred, A_pred, sigma, 95, 99)

Explain the shape of the confidence bounds and the diffence between 95% and 99% confidence interval.

Discussion

The 99% confidence interval is based on a larger probability that the observation error should be in the interval, so the interval will be wider.
Uncertainty in fitted line is due to uncertainty in estimated parameters: uncertainty in intercept manifests itself as a constant offset above and below the fitted line, the uncertainty in the slope implies uncertainty in the angle, which causes the widening towards the start and end.

What if we reduce the number of observations (only the central 10)#

You can also try to select a different (e.g., even shorter) time range by selecting a subset of the observations.

t_2 = t[5:15]
y_2 = y[5:15]
A_2 = np.hstack((np.ones((len(t_2),1)),t_2))

plot_all(t_2, y_2, A_2, t_pred, A_pred, sigma, 95, 99)

Compare with the first figure and discuss the differences.

Discussion

Since we have a different set of observations, the estimates will be different.
With fewer observations, the precision will be worse, hence the wider confidence intervals.
Since all observations are in the centre, there is much more uncertainty in the predicted values outside the range of observation times.

What if we thought we had better precision (factor 2 better)#

Note: in reality the precision of the observations is \(\sigma=2\)m, so in fact this will show the confidence intervals based on an incorrect stochastic model.

We can investigate different precision by changing the standard deviation.

sigma_wrong = 1
plot_all(t, y, A, t_pred, A_pred, sigma_wrong , 95, 99)

Compare with the first figure.

The fitted line is the same, why is it not affected?
Discuss the differences in confidence intervals.

Discussion

Only if all observations are independent and have the same variance, the BLU estimate will be identical to the ordinary least-squares estimate (variance cancels). However, the precision will not be the same!
With (assumed) higher precision, the confidence intervals will be tighter (errors expected to be smaller).

Note again: in this case the assumed precision is too optimistic, which is visible since the fitted model does not go through many of the blue error bars.

What if we have a different sampling interval?#

t_3 = np.vstack([7,8,8.5,9,9.5,10,10.5,11,11.5,12,13,
                 13.5,14.5,15,15.5,16,16.5,17.5,18.5,19])
A_3 = np.hstack((np.ones((len(t_3),1)),t_3))  

plot_all(t_3, y, A_3, t_pred, A_pred, sigma, 95, 99)

Compare with the first figure.

Discuss the differences. Especially compare the width of the confidence intervals at the centre and at the end of the time interval.
Is it better or not to have observations with smaller sampling interval?

Discussion

The number of observations is the same. The confidence interval in centre is tighter due to the smaller sampling interval, since here the the model is better constrained by the surrounding observations. However, outside the observation interval the uncertainty increases more rapidly.
Observations on a larger range of observation times will especially contribute to less uncertainty in the slope estimation, so would be favorable, especially if we want to make predicitions outside the observation interval.

Attribution

This chapter was written by Sandra Verhagen. Find out more here.

Notebook: factors influencing precision

Contents

Notebook: factors influencing precision#

Observation model#

Plot observations and confidence intervals (95% and 99%) with all observations#

What if we reduce the number of observations (only the central 10)#

What if we thought we had better precision (factor 2 better)#

What if we have a different sampling interval?#