Resit 23/24 Q1

CEGM1000 MUDE

Exercise 1: Programming

A. You are discretizing a single-valued differential equation $f(x,t)$ in time and pre-allocating a list or array to store the results at each time step. Which of the following commands is appropriate?

  • np.linspace(0, time_max, n_point)
  • np.discretize(0, time_max, n_point)
  • range(0, time_max)
  • range(0, time_max, n_point)

Model answer

  • np.linspace(0, time_max, n_point)
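As a minimal sketch of the pre-allocation described in the question, the snippet below builds the time vector and a matching results array. The values of `time_max` and `n_point` and the name `result` are hypothetical and chosen only for illustration.

```python
import numpy as np

time_max = 10.0   # hypothetical end time
n_point = 11      # hypothetical number of time points

# n_point evenly spaced times from 0 to time_max, both endpoints included
t = np.linspace(0, time_max, n_point)

# Pre-allocated array with one slot per time step
result = np.zeros(n_point)

print(t.shape, result.shape)
```

Unlike `range`, `np.linspace` accepts a non-integer end point and takes the number of points directly, which is what makes it suitable here.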

B. Which of the following methods is the best method to provide information on the rows and columns of a Python object?

  • sort()
  • shape()
  • size()
  • dimensions()
  • rows_and_columns()

Model answer

  • shape()
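A short sketch, assuming the "Python object" is a NumPy array (the array `a` is hypothetical). Note that for NumPy arrays `shape` is an attribute accessed without parentheses, while `np.shape` is the equivalent function.

```python
import numpy as np

a = np.zeros((3, 4))   # hypothetical array with 3 rows and 4 columns

rows, cols = a.shape   # attribute form, no parentheses
print(rows, cols)
print(np.shape(a))     # function form returns the same tuple
print(a.size)          # total number of elements, not rows and columns
```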

C. Sketch the file structure of a typical weekly MUDE repository that includes at least three key file types or directories. Include a short explanation and description of each item listed. Your answers should demonstrate how files in the MUDE repository were generally organized and used.

Note: if you are from the 2022-23 (MUDE year 1) you do not need to answer this question (write your year in the space provided).

Model answer

A good answer would have listed several file types or directories and clearly explained what they were used for. For example, a *.ipynb file was used to conduct analysis and execute code, or a *.md file was used to answer project questions.

A bad answer would have listed directories and files in an arbitrary way and provided a vague set of words that may or may not loosely correlate to specific things that happened to have occurred during the semester, but do not form any sort of coherent explanation about our activities.


Exercise 2: Estimation and uncertainty propagation

The height of a point is observed 5 times at monthly intervals; all observables are independent (see figure). It is assumed that the point is subsiding at a constant rate $v$.

A. In the figure below, sketch the estimated linear trend line and confidence intervals assuming:

  • the initial height $x_0$ at $t_1=0$ is unknown and estimated with best linear unbiased estimation; and
  • the subsidence rate is known to be $v=-0.2$ m/month.

Note that no calculations are required for this question. You may assume an arbitrary confidence level, and importance should be placed on the shape and relation to trend line.

Model answer

image.png


Assume again that the initial height is unknown, but now the subsidence rate is estimated with an independent set of observations; the precision of the estimated rate is $\sigma_{\hat{V}}=0.1$ m/month. The 'corrected' observables are thus given by: $$Y_i = Y_{i,o}-\hat{V}\cdot t_i$$ The $Y_{i,o}$ are independent and have a precision of 0.1 m.

B. Specify the stochastic model with $Y=[Y_1, \dots, Y_5]^T$ as the observables.

Hint: first write the new observables vector as:

$ Y=T\begin{bmatrix}Y_{1,o}\\ Y_{2,o} \\ \vdots\\ Y_{5,o} \\ \hat{V}\\\end{bmatrix}$

where you have to define $T$.

Model answer

$\Sigma_{Y} = T \Sigma_{Y_o} T^T = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 & -2 \\ 0 & 0 & 0 & 1 & 0 & -3 \\ 0 & 0 & 0 & 0 & 1 & -4 \\ \end{bmatrix} 0.1^2 I_6 \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 & -2 \\ 0 & 0 & 0 & 1 & 0 & -3 \\ 0 & 0 & 0 & 0 & 1 & -4 \\ \end{bmatrix}^T$

$\Sigma_{Y_o}$ is the covariance matrix of the vector with original observables and estimated rate, which all have a precision of 0.1.
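The matrix product in the model answer can be evaluated numerically. The sketch below assumes observation times $t_i = 0, 1, \dots, 4$ months, which follows from $t_1=0$ and the monthly interval; the variable names are illustrative.

```python
import numpy as np

t = np.arange(5)   # observation times 0..4 months
sigma = 0.1        # precision of Y_{i,o} (m) and of V-hat (m/month)

# T maps [Y_{1,o}, ..., Y_{5,o}, V-hat] to the corrected observables
T = np.hstack([np.eye(5), -t.reshape(-1, 1)])

Sigma_Yo = sigma**2 * np.eye(6)
Sigma_Y = T @ Sigma_Yo @ T.T

print(Sigma_Y)
```

The off-diagonal elements of `Sigma_Y` are non-zero for $i, j \ge 2$: the corrected observables are correlated because they all share the same estimated rate $\hat{V}$.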


Exercise 3: Observation theory

The concentration of particulate matter (PM10) is observed 6 times in a row: 3 times with device A and 3 times with device B. The observations are all assumed to have the same standard deviation of 1 $\mu$g/m$^3$ and to be independent. Furthermore, it is assumed that the actual (true) concentration of PM10 did not change during the time that the 6 observations were made.

The observations from device A are: 26, 27 and 23 $\mu$g/m$^3$, respectively. The observations from device B are: 30, 29 and 27 $\mu$g/m$^3$, respectively.

It needs to be tested whether or not there is a constant offset between the observations of the 2 instruments. The null-hypothesis $\mathcal{H}_0$ is that there is no offset and the PM10 is constant during the time of the observations. The alternative hypothesis $\mathcal{H}_a$ reads that PM10 is constant during the time of the observations but that there is a constant offset $\nabla$ between the observations of both devices.

A. Specify the functional and stochastic model under the two hypotheses.

Model answer

$\mathcal{H}_0: \mathbb{E}(\begin{bmatrix}Y_1\\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \\ \end{bmatrix})=\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ \end{bmatrix} \begin{bmatrix} x \end{bmatrix}$

$\mathcal{H}_a : \mathbb{E}(\begin{bmatrix}Y_1\\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \\ \end{bmatrix})=\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ \end{bmatrix} \begin{bmatrix} x \\ \nabla \end{bmatrix}$

For both hypotheses the stochastic model is $\Sigma_Y = I_6$.


B. Apply the appropriate test to check the validity of $\mathcal{H}_0$ with false alarm probability $\alpha=0.005$. Explain whether or not $\mathcal{H}_0$ is accepted.

Model answer

$\hat{x} = (A^T \cdot I \cdot A)^{-1} A^T \cdot I \cdot y = \frac{1}{6} \sum_{i=1}^6 y_i = 27$

$\hat{\epsilon} = y - A \hat{x} = \begin{bmatrix} -1 & 0 & -4 & 3 & 2 & 0 \end{bmatrix}^T$

$T_5 = \hat{\epsilon}^T \cdot I \cdot \hat{\epsilon} = 30$

$K_{\alpha}=16.75$ (look up in table for $\chi^2$-distribution with 5 degrees of freedom).

The null-hypothesis is rejected since $T_5>K_{\alpha}$.
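The arithmetic above can be reproduced with a few lines of plain Python. This is only a sketch of the model answer's computation; the critical value is taken from the text rather than computed, and the variable names are illustrative.

```python
y = [26, 27, 23, 30, 29, 27]   # device A, then device B
sigma = 1.0                    # assumed standard deviation

x_hat = sum(y) / len(y)                        # BLUE under H0: the mean
residuals = [yi - x_hat for yi in y]
T = sum(e**2 for e in residuals) / sigma**2    # test statistic
K_alpha = 16.75                                # critical value from the text

reject_H0 = T > K_alpha
print(x_hat, residuals, T, reject_H0)
```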


C. Assume now that the standard deviation we used was too optimistic: both devices in fact have a precision of 3 $\mu$g/m$^3$. Show/explain how this affects the decision in the previous question.

Model answer

Here the value of the test statistic was computed as $T_5 = \hat{\epsilon}^T \cdot I \cdot \hat{\epsilon}$, whereas it should have been $T_5 = \hat{\epsilon}^T \cdot \frac{1}{3^2} I \cdot \hat{\epsilon}$ (note that the estimated parameters are not affected by the different standard deviation). Hence, the test statistic should have been divided by 9, resulting in a value which would have been smaller than the critical value and thus in acceptance of $\mathcal{H}_0$.
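Worked out with the numbers from the previous question:

$T_5 = \frac{30}{3^2} \approx 3.33 < K_{\alpha} = 16.75$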


Exercise 4: Numerical modelling

A. For an arbitrary function $f(x)$ that is continuously differentiable, derive the Forward Euler approximation, beginning with a Taylor series expansion. Express your answer in a series of steps with a short description (1 sentence max) of each step to go along with the derivation. Include a sketch that illustrates how the Forward Euler expression approximates the arbitrary function $f(x)$; label the figure appropriately.

Model answer

Note: the notation here can vary from your answer, as we used several ways of representing the increment during the year (e.g., $h$ versus $\Delta x$).

Given the Taylor series expanded about a point $a$, namely:

$f(x) = f(a)+f'(a)(x-a)+\frac{f''(a)}{2!}(x-a)^2+\mathcal{O}\left((x-a)^3\right)$.

Let $\Delta x = (x-a)$ be an incremental step in the $x$ direction, and relabel the expansion point $a$ as an arbitrary point $x$, so that the series is evaluated at $x+\Delta x$. We can equivalently restate the Taylor series as

$f(x+\Delta x)=f(x)+f'(x)\Delta x + f''(x)\frac{\Delta x ^2}{2!} + ...$

Truncate the series after the first-derivative term to find

$f(x+\Delta x) \thickapprox f(x) + f'(x) \Delta x$

Rearranging to approximate the first derivative (and dropping the $\thickapprox$ symbol), we find

$f'(x) = \frac{f(x+\Delta x)-f(x)}{\Delta x}$

In practice, we use this approximation of the slope at a point to extrapolate to the next (forward) point:

$f(x_{n+1})=f(x_n)+f'(x_n)\Delta x$

This determines the next point by using the slope at $x_n$ to extrapolate the function over a step $\Delta x$.

Graphically, forward Euler estimates the slope at point $x$ and uses that slope to estimate the following point ($x + \Delta x$).

image.png
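The update $f(x_{n+1})=f(x_n)+f'(x_n)\Delta x$ can be sketched in a few lines of Python. The function name `forward_euler` and the test problem $f'(x) = -f(x)$ with $f(0)=1$ are assumptions chosen for illustration; they are not part of the exam question.

```python
import math

def forward_euler(dfdx, f0, x0, dx, n_steps):
    """March f(x_{n+1}) = f(x_n) + f'(x_n) * dx for n_steps steps."""
    xs, fs = [x0], [f0]
    for _ in range(n_steps):
        fs.append(fs[-1] + dfdx(xs[-1], fs[-1]) * dx)
        xs.append(xs[-1] + dx)
    return xs, fs

# Hypothetical test case: f'(x) = -f(x), f(0) = 1, exact solution exp(-x)
xs, fs = forward_euler(lambda x, f: -f, f0=1.0, x0=0.0, dx=0.1, n_steps=10)
error = abs(fs[-1] - math.exp(-xs[-1]))
print(xs[-1], fs[-1], error)
```

The remaining error comes from the truncated higher-order terms of the Taylor series and shrinks as `dx` is reduced.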


The next questions are completely unrelated to the previous question.

A colleague has mentioned that they are working on processing data from a sensor that appears to behave like a damped harmonic oscillator. They decided to model the sensor output using an explicit method. They mention that there is some instability, but don't know how to fix it. Consider the sketch provided by your colleague and then answer the following questions. The figure shows a numerical analysis ("Line 1") compared with the sensor output $y(t)$ ("Line 2").

instability_sketch.png

B. Describe what you would suggest to your colleague to help with their model, and justify your answer briefly (2 sentences max.).

Model answer

The colleague uses an explicit scheme; switching to an implicit scheme such as backward Euler is generally much more stable in these cases. Decreasing the time step also generally improves stability, and the sketch clearly shows large time steps, which lead to a poor approximation of the function.
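The difference between the two schemes can be illustrated on a simpler problem than the oscillator. The sketch below assumes the scalar test equation $y' = -\lambda y$ with hypothetical values for $\lambda$ and the time step; it is not the colleague's model.

```python
lam = 10.0    # hypothetical decay rate in y' = -lam * y
dt = 0.25     # deliberately large step: lam * dt = 2.5 > 2
n_steps = 20

y_explicit, y_implicit = 1.0, 1.0
for _ in range(n_steps):
    y_explicit = y_explicit * (1 - lam * dt)   # forward (explicit) Euler
    y_implicit = y_implicit / (1 + lam * dt)   # backward (implicit) Euler

print(abs(y_explicit), abs(y_implicit))
```

For this test equation forward Euler is stable only when $|1 - \lambda \Delta t| \le 1$, so the explicit solution grows while the implicit one decays like the true solution. Reducing `dt` below $2/\lambda$ also stabilizes the explicit scheme.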


C. Which of the following should you evaluate to gain additional understanding about stability of the numerical approach?

  • Initial conditions of the system
  • Eigenvalue of the system
  • Time-step size
  • Order of the polynomial being modelled

Model answer

  • Time-step size

D. Recalling that the general expression for a damped oscillator is of the form

$$m\frac{d^2y(t)}{dt^2}+c\frac{dy(t)}{dt}+ky(t)=0$$

which method(s) might help to numerically model the signal? (you may select more than one, or none)

  • Implicit (backward) Euler method
  • Riemann method
  • Runge-Kutta method
  • Trapezoidal rule
  • Gauss integration method
  • Simpson's rule

Model answer

  • Implicit (backward) Euler method
  • Runge-Kutta method
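Both selected methods are time-stepping schemes for first-order equations, so the second-order equation is typically rewritten as a first-order system before applying them. One common choice, with the auxiliary variable $v$ introduced here only for illustration, is:

$$\frac{d}{dt}\begin{bmatrix} y \\ v \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -k/m & -c/m \end{bmatrix}\begin{bmatrix} y \\ v \end{bmatrix}, \qquad v = \frac{dy}{dt}$$

The remaining options are rules for numerical integration of a known function rather than schemes for advancing a differential equation in time.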

Exercise 5: Probability & Reliability

A climate scientist is performing a study about rainfall and the ice content in the clouds. $X$ is the daily rainfall in a city in mm. After several years of measurements, it has been discovered that the cumulative distribution function of $X$ is given by an exponential distribution with parameter $\lambda=0.05$.

A. What is $P[X \leq 20 \hspace{0.1cm}\text{mm}]$? Round to two decimal figures.

Model answer

$P[X \leq 20 \hspace{0.1cm}\text{mm}] = 1 - e^{- \lambda x} = 1 - e^{-0.05 \times 20}=0.63$.


B. The researcher wants to design for the value of $X$ which is exceeded with a probability of 0.01. What is the design value? Round to two decimal figures.

Model answer

$P[X \ge x] = 0.01 \to P[X \leq x] = 1 - 0.01 =0.99$

$0.99=1 - e^{- \lambda x} \to x=92.10$ mm.
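Both results can be checked with the exponential CDF and its inverse. This is a minimal sketch using only the standard library; the function names `cdf` and `inverse_cdf` are illustrative.

```python
import math

lam = 0.05   # parameter of the exponential distribution (1/mm)

def cdf(x):
    """P[X <= x] for the exponential distribution."""
    return 1 - math.exp(-lam * x)

def inverse_cdf(p):
    """Value x such that P[X <= x] = p."""
    return -math.log(1 - p) / lam

p_20 = cdf(20)
x_design = inverse_cdf(1 - 0.01)
print(round(p_20, 2), round(x_design, 2))
```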


The researcher also wants to investigate the content of ice in the clouds (kg/m$^2$), denoted here by $Y$. After a field campaign, the following statistics of $Y$ are calculated.

Mean 55.5
Standard deviation 11.2
Minimum value 3.1
P25% 50.7
P50% 57.8
P75% 63.2
Maximum value 76.1

C. Which of the following distributions would be the best fit to the variable $Y$ based on the previous statistics:

  • Gaussian
  • Left-tailed Gumbel
  • Uniform

Model answer

  • Left-tailed Gumbel

D. Justify your choice with at least one reason.

Model answer

The statistics show a left tail: the distance between the minimum value and P25 is much larger than the distance between P25 and P50. Neither the Uniform nor the Gaussian distribution has an asymmetric tail.
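The asymmetry can be made explicit by comparing distances between the reported statistics. The sketch below only uses the values from the table; the dictionary layout and variable names are illustrative.

```python
stats = {"min": 3.1, "P25": 50.7, "P50": 57.8, "P75": 63.2, "max": 76.1}
mean = 55.5

lower_tail = stats["P25"] - stats["min"]   # distance from minimum to P25
upper_tail = stats["max"] - stats["P75"]   # distance from P75 to maximum

print(round(lower_tail, 1), round(upper_tail, 1))
print(mean < stats["P50"])   # mean below median also suggests a left tail
```

The lower tail is several times longer than the upper tail and the mean lies below the median, both of which point to a left-skewed distribution.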


Now the researcher wants to dive into the relationship between $X$ and $Y$.

E. A high positive correlation between both random variables is observed. Provide an explanation of what that means in 1 or 2 sentences. Give a numeric range for positive correlation.

Model answer

If a high positive correlation exists between $X$ and $Y$, it means that high values of $X$ are strongly associated with high values of $Y$, and the other way around. That is, if I observe a high value of $X$, there is a very high chance that I will also observe a high value of $Y$. The same holds for low values. Positive correlation ranges from $0$ to $+1$.


F. The researcher models the joint probability of $X$ and $Y$ using a bivariate Gaussian distribution. Which probability is equivalent to the multivariate Gaussian cumulative distribution function $F_{X,Y}(X=x, Y=y)$?

  • $P[X \ge x \hspace{0.1cm}\textbf{AND}\hspace{0.1cm} Y \ge y]$
  • $P[X \ge x \hspace{0.1cm}\textbf{OR}\hspace{0.1cm} Y \ge y]$
  • $P[X \leq x \hspace{0.1cm}\textbf{AND}\hspace{0.1cm} Y \leq y]$
  • $P[X \leq x \hspace{0.1cm}\textbf{OR}\hspace{0.1cm} Y \leq y]$

Model answer

  • $P[X \leq x \hspace{0.1cm}\textbf{AND}\hspace{0.1cm} Y \leq y]$

The following plots show the possible contour plots of the bivariate PDF of $X$ and $Y$.

Use them to answer the following two questions.

image.png

G. In which plots are $X$ and $Y$ independent? (you may select more than one, or none)

  • Plot a
  • Plot b
  • Plot c
  • Plot d

Model answer

  • Plot a
  • Plot d

H. In which plots do $X$ and $Y$ follow a multivariate Gaussian distribution? (you may select more than one, or none)

  • Plot a
  • Plot b
  • Plot c
  • Plot d

Model answer

  • Plot a
  • Plot b
  • Plot d