Exam 23/24 Q1


CEGM1000 MUDE

Note about the printed exam (see PDF).

There were two mistakes:

  1. Question 5, equation of the CDF of $X$: the second piece should extend to $x=2$ instead of 1
  2. Question 5h, multiple choice option C: "0.1.64" should read "1.64"

These mistakes were not announced during the exam because there was no projector or board to write them on, and announcing and describing the changes would have been more disruptive to the exam process. In addition, the mistakes did not have a significant impact on the computations.

Students taking this exam who started their MSc program in 2022 should have completed the entire exam except question 1a.

Exercise 1: Programming

Consider the following file structure schematic, which describes the files on your computer:

├── MUDE
    ├── Project_76
        ├── Difficult_Assignment.ipynb
        ├── auxiliary_files
            ├── data.csv
            ├── figure.png
            ├── functions.py

in which |-- Project_76 is a Git repository.

One of your group members just sent you a message on WhatsApp asking you to check their most recent commit to the file:

|-- Difficult_Assignment.ipynb

which they just committed to GitLab.

A. Which of the following would NOT be a way of incorporating the changes into the local repository on your computer?

  • Clone the repository
  • Pull from remote
  • Visit the file on gitlab.tudelft.nl
  • Pull from remote, then review the commits
  • Send the file over WhatsApp then save it in your local repository

Model answer:

  • Visit the file on gitlab.tudelft.nl

You run the cells in

|-- Difficult_Assignment.ipynb

and get the following error:

[Figure: error message returned when running the notebook cells]

B. What is the most likely cause of the error shown?

  • The filepath to functions.py is incorrect
  • The function find_squared_error() is undefined
  • The object a is a list
  • The variable data.mean() is undefined

Model answer:

  • The object a is a list

C. Describe in one sentence how you would fix this error. You may include a short snippet of hand-written code if you think it is necessary (you will not be graded on whether the syntax is 100% correct).

Model answer:

3 points if one of the following answers is given:

  • Change data.mean() to np.array(data).mean()
  • Change data.mean() to np.mean(data)
  • Replace input data with array, e.g., print(find_squared_error(np.array(a)))
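To see why the model answer points at the list, here is a minimal sketch. The body of find_squared_error() in functions.py is not shown in the exam, so the version below is a hypothetical stand-in that calls data.mean(); the input values are also made up for illustration.

```python
import numpy as np


def find_squared_error(data):
    # Hypothetical stand-in for the helper in functions.py:
    # sum of squared deviations from the mean.
    mean = data.mean()  # raises AttributeError when data is a plain list
    return sum((x - mean) ** 2 for x in data)


def find_squared_error_fixed(data):
    # Fix from the model answer: np.mean() accepts lists and arrays alike
    mean = np.mean(data)
    return sum((x - mean) ** 2 for x in data)


a = [1.0, 2.0, 3.0]  # hypothetical input, a plain Python list

try:
    find_squared_error(a)
except AttributeError as err:
    print("AttributeError:", err)  # 'list' object has no attribute 'mean'

# Alternative fix: convert the input to a NumPy array before the call
print(find_squared_error(np.array(a)))  # 2.0
print(find_squared_error_fixed(a))      # 2.0
```

Either fix works for this stand-in; which one is preferable in the real assignment depends on how functions.py is used elsewhere.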

You execute a cell with the Python code:

import awesome

in a Jupyter notebook whose working directory is part of an unknown file structure. So, this Jupyter notebook is NOT part of the file structure of the previous questions. The following error is returned:

[Figure: error message returned by import awesome]

D. Which of the following is NOT a plausible explanation for the shown error?

  • You have not installed the package with pip or conda yet
  • The package is only available via conda, not pip
  • You activated the wrong conda environment
  • The file awesome.py does not exist in your working directory

Model answer:

  • The package is only available via conda, not pip
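The three plausible explanations can be checked from inside the notebook. The sketch below is one possible diagnostic, assuming the module name awesome from the question; it only uses the Python standard library (sys.executable, importlib.util.find_spec, os.path.exists), and the helper name diagnose_import is hypothetical.

```python
import importlib.util
import os
import sys


def diagnose_import(name):
    # Which interpreter (and therefore which conda environment) the kernel runs
    print("Interpreter:", sys.executable)
    # find_spec returns None when a top-level module cannot be found on sys.path
    spec = importlib.util.find_spec(name)
    print(f"Module '{name}' found:", spec is not None)
    # A local module would be a file such as awesome.py in the working directory
    local_file = os.path.join(os.getcwd(), f"{name}.py")
    print(f"{name}.py in working directory:", os.path.exists(local_file))
    return spec is not None


diagnose_import("awesome")
```

If the interpreter path does not belong to the expected environment, the wrong environment is active; if the module is not found in the right environment, it has not been installed there (or awesome.py is missing). Whether the package was installed via conda or pip does not affect this lookup.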

Exercise 2: Uncertainty propagation

A single beam echo sounder is used to measure the depth in a harbor. The principle is based on transmitting a sonar pulse and measuring the two-way travel time. The depth is then obtained by multiplying half of the travel time by the propagation speed. The propagation speed $C$ of sound in water depends on the temperature $T$ and salt concentration $S$ as:

$ C = 1449.2 + 4.6T - 0.055T^2 + 0.0029T^3 + 1.34(S-35) $

$T$, $S$ and therefore $C$ are random variables, due to uncertainty in the temperature and salt concentration.

We are interested in the precision of the propagation speed, where it is known that:

$\mu_T = 15\,^\circ\text{C}, \ \sigma_T = 2\,^\circ\text{C}$

$\mu_S = 0 \text{ kg/m}^3, \ \sigma_S = 0.5 \text{ kg/m}^3$

$Cov(T,S)=0$

A. Approximate the standard deviation of the propagation speed (give your answer to 2 decimal places). Show how you arrived at your answer.

Model answer:

$C = q(T,S)$

$\sigma_C^2\approx \left( \frac{\partial q}{\partial T}\right)^2 \sigma^2_T + \left( \frac{\partial q}{\partial S}\right)^2 \sigma^2_S $

$\phantom{\sigma_C^2}= (4.6-0.11\mu_T+0.0087\mu_T^2)^2 \sigma^2_T + (1.34)^2\sigma^2_S$

$\phantom{\sigma_C^2}= (4.91)^2 \cdot 2^2 + (1.34)^2 \cdot 0.5^2$

$\sigma_C \approx 9.84 \text{ m/s}$
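The same first-order (linearised) computation can be reproduced in a few lines of Python. This is a sketch that simply re-evaluates the expressions above with the standard library, with the partial derivatives evaluated at the mean values.

```python
import math

mu_T, sigma_T = 15.0, 2.0  # temperature, °C
sigma_S = 0.5              # salt concentration, kg/m^3
# Cov(T, S) = 0, so there is no cross term

# Partial derivatives of C = q(T, S), evaluated at the mean
dq_dT = 4.6 - 2 * 0.055 * mu_T + 3 * 0.0029 * mu_T**2
dq_dS = 1.34

var_C = dq_dT**2 * sigma_T**2 + dq_dS**2 * sigma_S**2
sigma_C = math.sqrt(var_C)
print(f"dq/dT = {dq_dT:.4f}, sigma_C = {sigma_C:.2f} m/s")
```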


B. In order to reduce $\sigma_C$, would it have more impact to reduce $\sigma_T$ or $\sigma_S$? Explain your answer.

Model answer:

$\sigma_T$: from the result of question A, you can see that $\sigma^2_T$ is multiplied by a much larger value ($4.91^2 \approx 24.1$) than $\sigma^2_S$ ($1.34^2 \approx 1.80$).

If, due to an error in your calculation in question A, this was the other way around, the correct answer would be $\sigma_S$.
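A quick way to check this answer is to halve each standard deviation in turn and recompute the first-order approximation of $\sigma_C$. The sketch below does that; the factor 0.5 is an arbitrary illustrative choice, not part of the exam.

```python
import math

mu_T, sigma_T, sigma_S = 15.0, 2.0, 0.5
dq_dT = 4.6 - 0.11 * mu_T + 0.0087 * mu_T**2  # ≈ 4.91
dq_dS = 1.34


def sigma_C(s_T, s_S):
    # First-order approximation with Cov(T, S) = 0
    return math.sqrt(dq_dT**2 * s_T**2 + dq_dS**2 * s_S**2)


base = sigma_C(sigma_T, sigma_S)
halved_T = sigma_C(0.5 * sigma_T, sigma_S)
halved_S = sigma_C(sigma_T, 0.5 * sigma_S)
print(f"baseline {base:.2f}, halve sigma_T {halved_T:.2f}, halve sigma_S {halved_S:.2f} m/s")
```

Halving $\sigma_T$ roughly halves $\sigma_C$ (about 4.95 m/s), while halving $\sigma_S$ barely changes it (about 9.82 m/s).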

Assume $T$ and $S$ are normally distributed. We are interested in the distribution of $C$. We apply Monte Carlo simulations to obtain a large number of sample values for $T$ and $S$ and compute the corresponding sample values of the propagation speed $C$. Below you see the resulting histogram (left) and the QQ-plot (right), where in both cases the normal distribution is used as the model distribution.

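[Figure: histogram (left) and QQ-plot (right) of the simulated values of $C$, with the normal distribution as model distribution]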


C. Based on the figures, do you conclude that the normal distribution can be used as the model distribution for the propagation speed? Explain your answer.

Model answer:

  • The distribution is slightly skewed; according to the QQ-plot it is not perfectly normal.
  • If we are interested in the 'average' behaviour, the fit is quite good.
  • The fit is not so good in the tails (in reality there are more data points in the right tail).

D. The standard deviation of the simulated sample values of $C$ (from question C) is equal to 9.93 m/s. Give one reason why it is not the same as the answer you (should have) obtained in question A.

Model answer:

In question A, a first-order Taylor approximation is used, which ignores the higher-order terms. The simulation will not give an exact result either, but with a large number of samples it should be a very good approximation.
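The Monte Carlo experiment described above can be sketched with NumPy as follows. The seed and sample size are arbitrary choices, so the resulting standard deviation will be close to, but not necessarily identical to, the 9.93 m/s quoted in the question; the sample skewness is included to illustrate the right skew visible in the QQ-plot.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility
n = 1_000_000                        # number of Monte Carlo samples

T = rng.normal(loc=15.0, scale=2.0, size=n)  # temperature, °C
S = rng.normal(loc=0.0, scale=0.5, size=n)   # salt concentration, kg/m^3, independent of T

# Propagation speed for every sample pair (no linearisation)
C = 1449.2 + 4.6 * T - 0.055 * T**2 + 0.0029 * T**3 + 1.34 * (S - 35)

std_C = C.std(ddof=1)
skew_C = np.mean((C - C.mean()) ** 3) / C.std() ** 3  # > 0 means right-skewed

print(f"std of C: {std_C:.2f} m/s (first-order approximation: 9.84 m/s)")
print(f"sample skewness of C: {skew_C:.3f}")
```

Because $C$ is a cubic function of $T$, the higher-order terms neglected in question A add a little variance, which is why the simulated standard deviation comes out slightly above 9.84 m/s.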


Exercise 3: Observation Theory

Will the tunnel deform?

An engineer wants to test how a perfectly circular tunnel segment (see Figure) will deform when a uniform load is applied from above for a period of time.

The null hypothesis is that the segment will not deform, such that each cross-section remains circular over the full length of the segment.

The alternative hypothesis is that the segment deforms uniformly over its full length, such that each cross-section has the same elliptical shape.

After applying the load, the width $W_i [mm]$ and height $H_i [mm]$ of the cross-sections are measured at $l_i=0,1,…,10$ meters, such that we have 22 observations in total.

[Figure: circular tunnel segment under a uniform load applied from above]


A. What is the functional model for the null hypothesis? What is the unknown parameter?

Model answer:

$ \mathbb{E}(\begin{bmatrix}W_1\\ H_1 \\ W_2\\ H_2 \\ \vdots\\ W_{11}\\ H_{11} \end{bmatrix})=\begin{bmatrix}1\\ 1 \\ 1\\ 1 \\ \vdots\\ 1\\ 1 \end{bmatrix} d $

The unknown $d$ is the diameter, with units $mm$.


B. What is the functional model for the alternative hypothesis?

Model answer:

$\mathbb{E}(\begin{bmatrix}W_1\\ H_1 \\ W_2\\ H_2 \\ \vdots\\ W_{11}\\ H_{11} \end{bmatrix})=\begin{bmatrix}1 & 0 \\ 0&1 \\ 1 & 0 \\ 0&1 \\ \vdots\\ 1 & 0 \\ 0&1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} $

The unknown $a$ and $b$ are the width and height of the tunnel, with units $mm$.
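Both functional models can be written down and estimated numerically. The sketch below builds the two design matrices and applies best linear unbiased estimation (weighted least squares with weight matrix $\Sigma_Y^{-1}$). The observations are hypothetical and noise-free, and $\Sigma_Y = I$ (independent measurements with $\sigma = 1$ mm) is an assumption; neither is given in the exam.

```python
import numpy as np

m = 11                           # cross-sections at l = 0, 1, ..., 10 m
A0 = np.ones((2 * m, 1))         # H0: every W_i and H_i has expectation d
Aa = np.tile(np.eye(2), (m, 1))  # Ha: rows alternate [1, 0] for W_i and [0, 1] for H_i


def blue(A, y, Sigma_Y):
    # Best linear unbiased estimate and weighted squared residual norm
    weight = np.linalg.inv(Sigma_Y)
    x_hat = np.linalg.solve(A.T @ weight @ A, A.T @ weight @ y)
    e_hat = y - A @ x_hat
    return x_hat, float(e_hat @ weight @ e_hat)


# Hypothetical, noise-free observations in mm: width 5001, height 4999 everywhere
y = Aa @ np.array([5001.0, 4999.0])
Sigma_Y = np.eye(2 * m)          # assumed: independent, sigma = 1 mm

d_hat, T0 = blue(A0, y, Sigma_Y)
ab_hat, Ta = blue(Aa, y, Sigma_Y)
print(f"H0: d_hat = {d_hat[0]:.1f} mm, weighted residual norm = {T0:.1f}")
print(f"Ha: a_hat = {ab_hat[0]:.1f} mm, b_hat = {ab_hat[1]:.1f} mm, weighted residual norm = {Ta:.1f}")
print(f"GLRT statistic T_q = {T0 - Ta:.1f}")
```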


C. After applying best linear unbiased estimation, we obtain $\hat \epsilon^T \Sigma_Y^{-1} \hat \epsilon = 31.5$ with the null-hypothesis, and $\hat \epsilon_a^T \Sigma_Y^{-1} \hat \epsilon_a = 29.6$ with the alternative hypothesis. Apply an appropriate test to decide between the null and alternative hypotheses, using a false alarm rate of 0.025. Show all your steps and explain what your decision will be based on the test outcome.

Model answer:

Generalized Likelihood Ratio Test, with $q = 2 - 1 = 1$ (one extra parameter under the alternative hypothesis): $T_{q=1} = 31.5 - 29.6 = 1.9$

$k_{\alpha} = 5.0239$ (from the table of the $\chi^2$-distribution with $q=1$ and $\alpha = 0.025$)

$(T_q = 1.9) < (k_{\alpha} = 5.0239)$: the null hypothesis is accepted.
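The critical value does not have to come from a table; scipy.stats.chi2.ppf returns the same quantile. A minimal sketch of the test decision, using the two weighted squared residual norms given in the question:

```python
from scipy.stats import chi2

T_q = 31.5 - 29.6  # GLRT statistic: difference of weighted squared residual norms
q = 2 - 1          # number of extra parameters under the alternative hypothesis
alpha = 0.025      # false alarm rate

k_alpha = chi2.ppf(1 - alpha, df=q)  # right-tail critical value, ≈ 5.0239
reject_H0 = T_q > k_alpha

print(f"T_q = {T_q:.1f}, k_alpha = {k_alpha:.4f}, reject H0: {reject_H0}")
```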


Height change over time

In another experiment, the goal is to investigate at which rate the height of the tunnel segment is changing over time once the load is applied. The rate is assumed to be constant.

Therefore, the change in height $H_i$ [mm] with respect to the known initial height is measured at $t = 1, 2, 3, 4$ months after the load is applied. The functional model is thus given by:

$\mathbb{E}(\begin{bmatrix}H_1\\ H_2 \\ H_3\\ H_4 \\ \end{bmatrix})=\begin{bmatrix}1 \\ 2 \\ 3 \\ 4 \\ \end{bmatrix} v $

with $v$ the unknown rate at which the height is changing in $mm/month$.

The first 2 measurements have a standard deviation of $\sigma$, the last 2 measurements have a standard deviation of $0.5\sigma$ due to a change in instrument. All measurements are independent.

It is required that the 96% confidence interval of $\hat v$ is $\hat v \pm 0.2$ mm/month.

D. What should $\sigma$ be to realize this? Round your answer to two decimal places.

Model answer:

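$\sigma^2_{\hat{v}} = (\mathrm{A}^T \Sigma_Y^{-1}\mathrm{A})^{-1} = \left(\frac{1^2 + 2^2}{\sigma^2} + \frac{3^2 + 4^2}{0.25\,\sigma^2}\right)^{-1} = \sigma^2/105$

$\alpha = 0.04$; in the table of the standard normal distribution, look up the value for $0.5\alpha = 0.02$: $k \approx 2.054$

$CI = 0.2 = k \cdot \sigma_{\hat{v}} = 2.054 \cdot \sigma /\sqrt{105}$

$\sigma = 0.2 \cdot \sqrt{105} / 2.054 \approx 1.00 \text{ mm}$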
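The required $\sigma$ can also be computed directly. This sketch works with variances expressed in units of $\sigma^2$ (so the factor 0.25 represents the instrument with standard deviation $0.5\sigma$) and takes $k$ from scipy.stats.norm.ppf instead of a table.

```python
import numpy as np
from scipy.stats import norm

A = np.array([[1.0], [2.0], [3.0], [4.0]])  # design matrix, t = 1..4 months
rel_var = np.array([1.0, 1.0, 0.25, 0.25])  # variances in units of sigma^2

# A^T Sigma_Y^{-1} A = factor / sigma^2, so var(v_hat) = sigma^2 / factor
factor = (A.T @ np.diag(1.0 / rel_var) @ A)[0, 0]  # 105

alpha = 0.04
k = norm.ppf(1 - alpha / 2)  # two-sided 96% interval, ≈ 2.054

half_width = 0.2                              # required half-width, mm/month
sigma_req = half_width * np.sqrt(factor) / k  # from 0.2 = k * sigma / sqrt(105)

print(f"factor = {factor:.0f}, k = {k:.3f}, sigma = {sigma_req:.2f} mm")
```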


Exercise 4: Numerical modelling

Given the differential equation:

$\frac{df(x)}{dx}=g(f(x))$

with

$g(f(x))=-f(x) \cdot \frac{\cos(\pi f(x))}{3}$

In this assignment you'll apply numerical methods to solve a linearised version of this differential equation.


A. Find the Taylor series expansion of $g(f(x))$ as a function of $f(x)$ about the point $f(x)=4$. Give the expansion up to and including the first order, and up to and including the second order. Calculate all derivatives and simplify your expression.

Model answer:

$g\left( {f\left( x \right)} \right) \approx {\left. {g\left( {f\left( x \right)} \right)} \right|_{f\left( x \right) = 4}} + {\left. {{{dg\left( {f\left( x \right)} \right)} \over {df\left( x \right)}}} \right|_{f\left( x \right) = 4}}\left( {f\left( x \right) - 4} \right) + {1 \over 2}{\left. {{{{d^2}g\left( {f\left( x \right)} \right)} \over {d{{\left( {f\left( x \right)} \right)}^2}}}} \right|_{f\left( x \right) = 4}}{\left( {f\left( x \right) - 4} \right)^2}$

$g\left( {f\left( x \right)} \right) \approx {\left. { - f\left( x \right) \cdot {{\cos \left( {\pi \cdot f\left( x \right)} \right)} \over 3}} \right|_{f\left( x \right) = 4}} + {\left. {\left( { - {{\cos \left( {\pi f\left( x \right)} \right)} \over 3} + f\left( x \right) \cdot {{\sin \left( {\pi f\left( x \right)} \right)} \over 3} \cdot \pi } \right)} \right|_{f\left( x \right) = 4}}\left( {f\left( x \right) - 4} \right) + {\left. {{1 \over 2}\left( { {{\sin \left( {\pi f\left( x \right)} \right)} \over 3} \cdot \pi + {{\sin \left( {\pi f\left( x \right)} \right)} \over 3}\pi + f\left( x \right) \cdot {{\cos \left( {\pi f\left( x \right)} \right)} \over 3} \cdot {\pi ^2}} \right)} \right|_{f\left( x \right) = 4}}{\left( {f\left( x \right) - 4} \right)^2}$

$g\left( {f\left( x \right)} \right) \approx - 4 \cdot {{\cos \left( {\pi \cdot 4} \right)} \over 3} + \left( { - {{\cos \left( {\pi \cdot 4} \right)} \over 3} + 4 \cdot {{\sin \left( {\pi \cdot 4} \right)} \over 3} \cdot \pi } \right)\left( {f\left( x \right) - 4} \right) + {1 \over 2}\left( {4 \cdot {{\cos \left( {\pi \cdot 4} \right)} \over 3} \cdot {\pi ^2}} \right){\left( {f\left( x \right) - 4} \right)^2}$

$g\left( {f\left( x \right)} \right) \approx - 4 \cdot {1 \over 3} + \left( { - {1 \over 3} + 4 \cdot {0 \over 3} \cdot \pi } \right)\left( {f\left( x \right) - 4} \right) + {1 \over 2}\left( {4 \cdot {1 \over 3} \cdot {\pi ^2}} \right){\left( {f\left( x \right) - 4} \right)^2}$

$g\left( {f\left( x \right)} \right) \approx - {{f\left( x \right)} \over 3} + {{2{\pi ^2}} \over 3}{\left( {f\left( x \right) - 4} \right)^2}$

Up to the first order: $- \cfrac{ f\left( x \right)}{3}$

Up to the second order: $-\cfrac{ f\left( x \right)}{3} + \cfrac{2 \pi^{2} \left( f\left( x \right) - 4\right)^{2}}{3}$,

which is equivalent to $\cfrac{2 \pi^{2} \left(f\left( x \right)\right)^{2}}{3} - \cfrac{ f\left( x \right)}{3} - \cfrac{16 \pi^{2} f\left( x \right)}{3} + \cfrac{32 \pi^{2}}{3}$
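The derivatives at $f(x)=4$ can be sanity-checked numerically. The sketch below uses central finite differences (step $h = 10^{-4}$, an arbitrary choice) and compares $g$ with its first- and second-order expansions at a nearby point, $f = 4.05$, also chosen only for illustration.

```python
import math


def g(f):
    # Right-hand side of the ODE as a function of f
    return -f * math.cos(math.pi * f) / 3


f0, h = 4.0, 1e-4
g1 = (g(f0 + h) - g(f0 - h)) / (2 * h)           # expected -1/3
g2 = (g(f0 + h) - 2 * g(f0) + g(f0 - h)) / h**2  # expected 4*pi^2/3


def g_first(f):
    return -f / 3


def g_second(f):
    return -f / 3 + 2 * math.pi**2 / 3 * (f - 4) ** 2


print(f"g'(4)  ≈ {g1:.6f}  (exact {-1 / 3:.6f})")
print(f"g''(4) ≈ {g2:.4f}  (exact {4 * math.pi**2 / 3:.4f})")

f_test = 4.05
print(f"g({f_test}) = {g(f_test):.5f}, first order {g_first(f_test):.5f}, "
      f"second order {g_second(f_test):.5f}")
```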


B. Discretise the differential equation using the Taylor series up to and including the first order, and apply the Forward Euler method.

Model answer:

$\cfrac{{f_{n + 1}} - {f_n}}{\Delta x} = -\cfrac{{f_n}}{3}$

${f_{n + 1}} = {f_n}\left( {1 - \cfrac{\Delta x}{3}} \right)$
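The Forward Euler update can be written as a short loop. The sketch below assumes, for illustration only, an initial value $f_0 = 4$ (the expansion point) and step $\Delta x = 0.5$; the exam does not specify these. The exact solution of the linearised equation, $f(x) = f_0 e^{-x/3}$, is printed for comparison.

```python
import math


def forward_euler(f0, dx, n_steps):
    # Repeatedly apply f_{n+1} = f_n * (1 - dx/3)
    f = [f0]
    for _ in range(n_steps):
        f.append(f[-1] * (1 - dx / 3))
    return f


f0, dx, n = 4.0, 0.5, 20  # hypothetical initial value, step size, number of steps
f_num = forward_euler(f0, dx, n)
f_exact = f0 * math.exp(-n * dx / 3)  # solution of df/dx = -f/3 at x = n*dx

print(f"x = {n * dx}: Forward Euler {f_num[-1]:.4f}, exact {f_exact:.4f}")
```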


C. Assess the stability of using Forward Euler for discretisation. Is your solution conditionally or unconditionally stable? Include the stability assessment and the criterion of the stability for this case.

Model answer:

The Forward Euler scheme is conditionally stable. Repeated application of the Forward Euler scheme gives:

${f_{n + 1}} = {f_n}\left( {1 - \cfrac{\Delta x}{3}} \right)$

${f_{n + 1}} = {f_{n - 1}}\left( {1 -\cfrac{\Delta x}{3}} \right)\left( {1 - \cfrac{\Delta x}{3}} \right) = {f_{n - 1}}{\left( {1 - \cfrac{\Delta x}{3}} \right)^2}$

${f_{n + 1}} = {f_0}{\left( {1 - \cfrac{\Delta x}{3}} \right)^{n + 1}}$

Stability is satisfied if: $\left| 1 - \cfrac{\Delta x}{3} \right| < 1$, i.e. $-1 < 1 - \cfrac{\Delta x}{3} < 1$

$0 < \Delta x < 6$
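The criterion can be illustrated by evaluating the amplification factor $1 - \Delta x/3$ for a few step sizes and applying it repeatedly. The step sizes and the initial value $f_0 = 4$ below are arbitrary illustrative choices.

```python
def amplification(dx):
    # Forward Euler amplification factor for df/dx = -f/3
    return 1 - dx / 3


f0, n_steps = 4.0, 10
for dx in (1.0, 3.0, 5.0, 6.0, 7.0):
    G = amplification(dx)
    f_n = f0 * G**n_steps
    status = "stable" if abs(G) < 1 else "not stable"
    print(f"dx = {dx}: |G| = {abs(G):.3f}, f_10 = {f_n:9.3f} ({status})")
```

At $\Delta x = 6$ the factor is exactly $-1$, so the solution oscillates without decaying; beyond that it grows.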


D. Which other methods for numerical derivatives could you apply (name at least 3 more methods), and what would be the effect on stability and accuracy? Explain without making any calculations.

Model answer:

Other methods include:

  • Backward Euler
  • Central Difference
  • Second-derivative approximation
  • Mid-Point Quadrature RK2
  • Implicit Mid-Point RK2 (Gauss-Legendre)
  • 4th order explicit Runge-Kutta
  • Crank-Nicolson

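In general, implicit methods (e.g., Backward Euler, Crank-Nicolson, implicit mid-point) are stable for any step size in this linear decay problem, whereas explicit methods (e.g., Forward Euler, explicit RK4) remain conditionally stable. Higher-order methods (e.g., RK2, RK4, Crank-Nicolson) reach higher accuracy for the same step size.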
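To make the stability and accuracy statements concrete, the sketch below compares the one-step amplification factors of Forward Euler, Backward Euler and Crank-Nicolson for the linearised equation $df/dx = -f/3$. The initial value $f_0 = 4$, the domain length $L = 18$ and the step sizes are hypothetical choices for illustration.

```python
import math


def step_factor(method, dx):
    # One-step amplification factor G, so that f_{n+1} = G * f_n for df/dx = -f/3
    if method == "forward_euler":
        return 1 - dx / 3
    if method == "backward_euler":
        return 1 / (1 + dx / 3)
    if method == "crank_nicolson":
        return (1 - dx / 6) / (1 + dx / 6)
    raise ValueError(f"unknown method: {method}")


def solve(method, f0, dx, n_steps):
    return f0 * step_factor(method, dx) ** n_steps


f0, L = 4.0, 18.0
exact = f0 * math.exp(-L / 3)
methods = ("forward_euler", "backward_euler", "crank_nicolson")

for dx in (1.0, 9.0):
    n_steps = round(L / dx)
    for m in methods:
        G = step_factor(m, dx)
        f_end = solve(m, f0, dx, n_steps)
        print(f"dx = {dx}, {m}: G = {G:+.3f}, f(L) = {f_end:.3e}, "
              f"error = {abs(f_end - exact):.3e}")
```

With $\Delta x = 9$ the Forward Euler factor is $-2$, so errors grow, while both implicit schemes still decay; at $\Delta x = 1$ the second-order Crank-Nicolson result is markedly closer to the exact value than either first-order Euler scheme.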


Exercise 5: Probability and reliability

$X$ and $Y$ are two (unit-less) quantities that have been obtained from field measurements in order to investigate certain properties. The cumulative distribution function of $X$ is given by:

$$F_X(x)=\begin{cases}0, \ x<0\\\frac{x}{2}, \ 0 \leq x \leq 2 \\ 1, \ x>2 \end{cases}$$

A. What is $P[X \leq 0.5]$?

Model answer:

$P[X \leq 0.5]=F_X(0.5)=0.5/2=0.25$


B. The engineer wants to design for the value of $X$ which is exceeded with a probability of $0.05$. What is the design value?

Model answer:

$P[X>x]=0.05 \to P[X \leq x]=1-0.05=0.95$

$0.95=x/2 \quad \rightarrow \quad x=1.90$
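With the CDF as intended (linear from 0 to 2, see the note on the printed exam at the top), $X$ is uniformly distributed on $[0, 2]$, so both answers can be checked with scipy.stats.uniform, whose loc and scale parameters describe the interval $[loc, loc + scale]$.

```python
from scipy.stats import uniform

X = uniform(loc=0, scale=2)  # F_X(x) = x/2 for 0 <= x <= 2

p_a = X.cdf(0.5)        # question A: P[X <= 0.5]
x_design = X.ppf(0.95)  # question B: value exceeded with probability 0.05

print(f"P[X <= 0.5] = {p_a:.2f}, design value = {x_design:.2f}")
```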


The distribution $F_Y(y)$ is unknown. However, the following statistics could be calculated from the observations in the field:

[Table: summary statistics of $Y$ computed from the field observations]

C. Which of the following distributions would be the best fit to the variable $Y$ based on the previous statistics:

  • Gaussian
  • Uniform
  • Lognormal

Model answer:

Based on the differences between the percentiles in the table, the data are right-skewed. Moreover, they are bounded at 0. Thus, we need a distribution that is bounded at 0 (the Gaussian does not fulfil this) and has a right tail (only the Lognormal fulfils this criterion).

  • Lognormal

D. Justify your previous choice.

Model answer:

Lognormal, because the variable is bounded at zero and shows positive skewness. If uniform is selected and it is mentioned that the variable is bounded at zero.


E. The engineer decides to fit a Gumbel distribution to the observations using the method of moments. Compute the distribution parameters using the given formulas. Round the results to one decimal place. (Hint: you may want to refer to the formula sheet)

Model answer:

Using the equations for the expectation and variance of the Gumbel distribution:

$Var(Y) = \frac{\pi^2}{6}\beta^2 \to \beta = \sqrt{\frac{6 \, Var(Y)}{\pi^2}} = \sqrt{\frac{6 \cdot 1.28^2}{\pi^2}} \approx 1.0$

$E(Y) = \alpha + \gamma \beta \to \alpha = E(Y) - \gamma \beta = 0.88 - 0.577 \cdot 1.0 \approx 0.3$

where $\gamma \approx 0.577$ is the Euler-Mascheroni constant.


If you have not computed the parameters for the Gumbel distribution in the previous question, use $\alpha=0.3$ and $\beta=1$ in the subsequent questions.

F. The engineer wants now to consider both $X$ and $Y$ in the design. Assume that $X$ and $Y$ are independent. What is $P[X\leq 0.5,Y\leq 1]$? Round your answer to two decimal places.

Model answer:

Assuming independence: $P[X \leq 0.5, Y \leq 1] = P[X \leq 0.5] \, P[Y \leq 1]$

Using the formula sheet for the Gumbel distribution:

$z = \frac{y - \alpha}{\beta} = \frac{1 - 0.3}{1} = 0.7$

$P[Y \leq 1] = e^{-e^{-z}} = e^{-e^{-0.7}} = 0.61$

Going back to the previous expression assuming independence:

$P[X \leq 0.5, Y \leq 1] = P[X \leq 0.5] \, P[Y \leq 1] = 0.25 \cdot 0.61 = 0.15$


You finally managed to get some paired observations of $X$ and $Y$ and plot them in the figure below.

[Figure: scatter plot of the 30 paired observations of $X$ and $Y$]

G. Using the figure, what is $P[X> 0.5 \hspace{1mm} OR \hspace{1mm} Y > 1]$? The number of observations is 30.

Model answer:

$P[X>0.5 \ OR \ Y>1]=22/30=0.73$

or, dividing by $n+1 = 31$:

$P[X>0.5 \ OR \ Y>1]=22/31=0.71$


Finally, the engineer decides to model the multivariate uncertainty of $X$ and $Y$ using a multivariate normal distribution, where $X$ and $Y$ are not independent. The engineer defines it in Python as:

joint_distr = scipy.stats.multivariate_normal(mu, sigma)

H. What would be suitable values of mu and sigma in the code above? You don't need to compute each value but assess whether they are suitable.

  • A. mu = 0.88, sigma = 1.64
  • B. mu = 1, sigma = 1.28
  • C. mu = [1,0.88], sigma = [[0.33,0],[0,1.64]]
  • D. mu = [1,0.88], sigma = [[0.33,0.55],[0.55,1.64]]
  • E. mu = [1,0.88], sigma = [[0.33,0.55],[0,1.64]]

Model answer:

To define the multivariate distribution, we need a vector of means (mu) and the covariance matrix (sigma). The covariance matrix contains the variances of the variables on the diagonal and the covariances off the diagonal.

Thus, options A and B are not feasible, since they give single values.

Option C corresponds to the independent case, since the covariance is 0; there is little reason to use a multivariate distribution if there is no dependence. Moreover, the question states that the variables are not independent.

Option E has an error in the covariance matrix: $Cov(X,Y)$ differs from $Cov(Y,X)$, which is not possible, since a covariance matrix must be symmetric.

  • mu = [1,0.88], sigma = [[0.33,0.55],[0.55,1.64]]
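The reasoning above can be checked numerically: a valid covariance matrix must be symmetric and positive definite. The helper is_valid_covariance below is a hypothetical sketch using NumPy; the matrices are options C, D and E from the question, and mu = [1, 0.88] as in those options.

```python
import numpy as np
from scipy.stats import multivariate_normal


def is_valid_covariance(S):
    # Symmetric and positive definite (all eigenvalues > 0)
    S = np.asarray(S, dtype=float)
    if not np.allclose(S, S.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(S) > 0))


options = {
    "C": [[0.33, 0.0], [0.0, 1.64]],
    "D": [[0.33, 0.55], [0.55, 1.64]],
    "E": [[0.33, 0.55], [0.0, 1.64]],
}
for label, S in options.items():
    valid = is_valid_covariance(S)
    S = np.asarray(S)
    rho = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1]) if valid else float("nan")
    print(f"Option {label}: valid covariance = {valid}, correlation = {rho:.2f}")

mu = [1, 0.88]
joint_distr = multivariate_normal(mu, options["D"])
```

Option C passes the check but has zero correlation, i.e. it describes independent variables, which contradicts the question; only option D is both valid and dependent. Its diagonal entries match the variances of $X$ (uniform on $[0, 2]$, variance $1/3 \approx 0.33$) and $Y$ ($1.28^2 \approx 1.64$).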