Resit 23/24 Q2¶


CEGM1000 MUDE

Exercise 1: Programming¶

A. Which of the following are correct statements about pandas (the Python library)?

  • It is useful for processing time series data
  • It is useful for computing discrete Fourier transforms (DFT)
  • It uses a dictionary-like syntax
  • It has marsupial properties
  • It can integrate well with numpy

Model answer:

  • It is useful for processing time series data
  • It uses a dictionary-like syntax
  • It can integrate well with numpy

B. A colleague approaches you with a Python problem, telling you that they have two packages that are not compatible. Which of the following would be a reasonable approach to address this?

  • Reinstall Anaconda
  • Restart your computer
  • Delete your changes and re-clone your Gitlab repository
  • Reactivate your mude environment
  • Create a new environment called mude2

Model answer:

  • Create a new environment called mude2

C. A snippet of code:

rng = np.random.default_rng()
print(type(rng))
print(rng.random())

produces output:

<class 'numpy.random._generator.Generator'>
0.01628535642909701

Write the output of the following snippet of code, along with a brief explanation (the syntax and value do not need to be perfect):

rng.integers(5)

Model answer:

This code will produce a random integer from 0 to 4 (inclusive), e.g. 1.
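
For reference, a minimal sketch (not part of the exam) illustrating this: with a single argument, rng.integers(low) draws from $[0, \text{low})$, i.e. the upper bound is exclusive.

import numpy as np

rng = np.random.default_rng()
# With one argument the upper bound is exclusive, so possible values are 0, 1, 2, 3 and 4
print(rng.integers(5))                   # a single random integer, e.g. 1
print(rng.integers(5, size=1000).max())  # never reaches 5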


Exercise 2: Finite Volume Method¶

Consider the following discretized form of a partial differential equation (PDE), which includes only the component in the x-direction:

$$ \phi^{n+1}_{i,j} = \phi^{n}_{i,j} + \cfrac{C \Delta t}{\Delta x} \left[\cfrac{\partial\phi^{n}_{i,j,E}}{\partial x} - \cfrac{\partial\phi^{n}_{i,j,W}}{\partial x} \right] $$

where:

  • $i,j$ represent the x- and y-coordinate, respectively
  • $\phi^{n+1}_{i,j}$ is the value of the quantity of interest at the next time step
  • $\phi^{n}_{i,j}$ is the value of the quantity of interest at the current time step
  • $\phi^{n}_{i,j,E}$ and $\phi^{n}_{i,j,W}$ (not indicated in the figure) are the values at the current time step at the East and West faces, respectively
  • $C$ is a known constant
  • $\Delta t$ is the time step size
  • $\Delta x$ is the space step size

The domain is divided into square ($\Delta x = \Delta y$) finite volumes, one of which, volume $i,j$, is represented below.

Picture1.png

A. What partial differential equation has been discretized?

  • Advection
  • Convection
  • Diffusion
  • Confusion

Model answer:

  • Diffusion

B. Write an expression to approximate the quantity at the East face of the volume $\cfrac{\partial\phi_{i,j,E}}{\partial x}$, which is needed in order to solve the discretized form of the PDE.

Model answer:

$\cfrac{\partial\phi_{i,j,E}}{\partial x} = \cfrac{\phi_{i+1,j} - \phi_{i,j}}{\Delta x}$
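
As an illustration only (a rough sketch, not part of the model answer; the array phi and the parameter values are assumptions), the face gradients can be used to advance the interior cells one explicit time step in the x-direction:

import numpy as np

# Assumed parameters and field for illustration
C, dt, dx = 1.0, 0.001, 0.1
phi = np.zeros((20, 20))
phi[10, 10] = 1.0                                   # arbitrary initial condition

# Gradients at the East and West faces of the interior volumes (first index = x)
dphi_dx_E = (phi[2:, :] - phi[1:-1, :]) / dx
dphi_dx_W = (phi[1:-1, :] - phi[:-2, :]) / dx

# Discretized update for the x-direction component
phi_new = phi.copy()
phi_new[1:-1, :] = phi[1:-1, :] + C * dt / dx * (dphi_dx_E - dphi_dx_W)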


Exercise 3: Finite Element Method¶

Consider the following code for the computation of an element stiffness matrix with the finite element method in 1D.

FEM_code.png


A. How many nodes does this element have?

Model answer:
3 nodes


B. How many integration points are used for computing the element matrix?

Model answer:
2 integration points


C. What is the corresponding PDE?

  • $-a\frac{\partial^2u}{\partial x^2} + bu=0$
  • $-a\frac{\partial^2u}{\partial x^2} =b$
  • $-b\frac{\partial^2u}{\partial x^2} =a$
  • $au - b\frac{\partial^2u}{\partial x^2} =0$

Model answer:

  • $-a\frac{\partial^2u}{\partial x^2} + bu=0$
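
Since the original code is only shown as an image, the following is a hypothetical sketch (an assumption, not the exam code) of an element routine consistent with the answers above: a 3-node (quadratic) 1D element, 2 Gauss integration points, and the weak form of $-a\frac{\partial^2u}{\partial x^2} + bu=0$, which contributes $\int_e \left(a\,\mathbf{B}^\mathrm{T}\mathbf{B} + b\,\mathbf{N}^\mathrm{T}\mathbf{N}\right)\mathrm{d}x$ to the element matrix.

import numpy as np

def element_matrix(h, a, b):
    # 2-point Gauss quadrature on the reference element xi in [-1, 1]
    xi_ip = np.array([-1.0, 1.0]) / np.sqrt(3)
    w_ip = np.array([1.0, 1.0])
    J = h / 2                      # Jacobian of the mapping to an element of length h
    K = np.zeros((3, 3))
    for xi, w in zip(xi_ip, w_ip):
        N = np.array([xi * (xi - 1) / 2, 1 - xi**2, xi * (xi + 1) / 2])  # quadratic shape functions
        B = np.array([xi - 0.5, -2 * xi, xi + 0.5]) / J                  # their x-derivatives
        K += w * (a * np.outer(B, B) + b * np.outer(N, N)) * J
    return K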

Exercise 4: Signal Processing¶

We start from a cosine signal with unit amplitude and a frequency of $f_c = 3$ Hz, and sample it at $f_s = 5$ Hz for a duration of $T = 2$ seconds. The $N = 10$ discrete time samples are input to the Discrete Fourier Transform (DFT) and we directly plot the magnitude (modulus) of the output, hence $|X_k|$ with $k = 0, \ldots, N-1$. Create the resulting plot.

The corresponding Python code is:

import numpy as np
from matplotlib import pyplot as plt

fc=3
fs=5
dt=1/fs
T=2
N=T*fs
t=np.arange(0,T,dt)
xt=np.cos(2*np.pi*fc*t)
abs_fft = np.abs(np.fft.fft(xt))
plt.stem(abs_fft)

Model answer:

image-3.png

The frequency range output by the fft is $[0, f_s)$; with $T = 2$ seconds the frequency resolution is $\Delta f = 0.5$ Hz. We therefore see the (discrete) frequencies $[0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]$ Hz, i.e. up to $f_s$ (= 5 Hz), or the frequencies $[0, 0.5, 1, 1.5, 2, 2.5]$ Hz when the spectrum is shown only up to $f_s/2$ (= 2.5 Hz). As the sampling frequency $f_s$ (= 5 Hz) is below $2 f_c$ (with $f_c = 3$ Hz), we get aliasing: the 3 Hz component is mapped onto 2 Hz, which is why a peak occurs at index 4 in the graph. A peak also occurs at index 6, which corresponds to a frequency of 3 Hz, or, when the spectrum is taken symmetrically over $[-f_s/2, f_s/2)$, to $-2$ Hz.
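
A small sketch (not part of the model answer) that lists the DFT bin frequencies and makes the aliasing explicit:

import numpy as np

fs, T = 5, 2
N = fs * T
freqs = np.fft.fftfreq(N, d=1/fs)
print(freqs)   # [ 0.   0.5  1.   1.5  2.  -2.5 -2.  -1.5 -1.  -0.5] Hz
# The peaks of |X_k| appear at k = 4 and k = 6, i.e. at +2 Hz and -2 Hz:
# the 3 Hz cosine aliases to |3 - 5| = 2 Hz.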


Exercise 5: Time Series Analysis¶

We intend to simulate a time series of 200 samples at 1-day intervals (so m=200 and the time unit is a day) using a first-order auto-regressive $AR(1)$ random process $s_t$ as follows: $$ s_t = \beta s_{t-1}+e_t $$

where $t=1,\ldots,m$ and $\beta$ is a given $AR(1)$ parameter. We further assume $E(s_t) = 0$ and $D(s_t)=\sigma^2=2$.

We simulate data using a normal distribution. To do so, we use the above recursive form, which needs to be initialized. To initialize the first data point, we use:

$$ s_1=s(t=1)=randn(1) $$

Further, the standard deviation of the random noise $e_t$ can be obtained from: $$ \sigma_e^2 = \sigma^2(1-\beta^2) $$
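
A minimal numpy sketch of this simulation procedure (the value of $\beta$ used here is an assumption chosen for illustration, since only $\sigma^2$ is given above):

import numpy as np

rng = np.random.default_rng()
m, sigma2 = 200, 2.0
beta = 0.9                                    # assumed AR(1) coefficient for illustration
sigma_e = np.sqrt(sigma2 * (1 - beta**2))     # from sigma_e^2 = sigma^2 (1 - beta^2)

s = np.zeros(m)
s[0] = rng.standard_normal()                  # initialization: s_1 = randn(1)
for t in range(1, m):
    s[t] = beta * s[t - 1] + sigma_e * rng.standard_normal()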

A time series simulation versus time is shown in the plot below. The horizontal axis shows the time in units of day.

output.png

The following is a printed array of 10 consecutive samples from the time series:

10 consecutive sample values from time series: 
[ 2.49474675 -1.99859861 2.40207336 -0.78048668 1.85367808 
-2.27074481 2.62934417 -2.4597126 2.15011297 -1.68199175]

Given the information and the plot above, answer the following questions.


A. What can you say about the type of noise?

  • White
  • Colored
  • Grey

Model answer:

  • Colored

B. Sketch the normalized auto-covariance function (ACF) of the generated time series above. Include the following in your plot:

  • clearly labelled axes (ACF and time lag)
  • a standard stem plot for each time lag (include at least 5 stems)
  • besides being within appropriate numerical bounds, the absolute value of the ACF is not important; however, make sure it is possible to clearly distinguish whether the value is positive, negative or zero
  • you do not need to include the confidence interval

Model answer:

image.png
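
A rough code sketch (an assumption, not part of the model answer) that produces such a stem plot from a simulated series s, e.g. the one generated in the sketch above:

import numpy as np
from matplotlib import pyplot as plt

def normalized_acf(s, max_lag=10):
    # Empirical auto-covariance at each lag, normalized by the lag-0 value
    s = s - s.mean()
    c0 = np.sum(s * s)
    return np.array([np.sum(s[:len(s) - k] * s[k:]) / c0 for k in range(max_lag + 1)])

rho = normalized_acf(s, max_lag=10)           # s: the simulated time series
plt.stem(np.arange(len(rho)), rho)
plt.xlabel('time lag [days]')
plt.ylabel('normalized ACF')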


C. What would the parameters of $x$ be? Stated in a different way: what components are clearly visible in the time series?

  • Noise
  • Correlation
  • Trend
  • Drift
  • Bias
  • Seasonality
  • Irregularities/outliers
  • Offset
  • Derivative

Model answer:

  • Noise
  • Trend
  • Seasonality
  • Offset

Exercise 6: Optimization¶

Consider the following optimization problem: $$ \text{Maximize } Z = 3x_1+2x_2 $$

Subject to: $$ 2x_1+x_2\leq18 \\ -4x_1+5x_2\leq10 \\ x_1, x_2 \geq0 $$ Convert the problem into its augmented form and then, if needed, find the next basic solution. Explain the process and the steps taken. After finding a new basic solution (if applicable), determine the current values of the objective function and all decision variables. State whether this new solution is the optimal solution or not.

Model answer:

Convert to augmented form:

$$Z - 3x_1 - 2x_2 = 0$$

$$2x_1 + x_2 + s_1 = 18$$
$$-4x_1 + 5x_2 + s_2 = 10$$

Fill the table:

Basis   Z   x1   x2   s1   s2    b
Z       1   -3   -2    0    0    0
s1      0    2    1    1    0   18
s2      0   -4    5    0    1   10

Since both coefficients in the objective function row are negative ($-3$ and $-2$), the solution is not optimal.
$x_1$ enters the basis because it has the most negative coefficient.

Apply the minimum ratio test:

For $s_1$: $\frac{18}{2} = 9$; for $s_2$: $\frac{10}{-4} = -2.5$.
$s_1$ leaves the basis ($s_2$ is not a candidate because its ratio is negative).

Change the current system to an equivalent one with the following steps:

  • Divide the second row by 2 to get the $x_1$ coefficient equal to 1. Result:
$$0 \quad 1 \quad \frac{1}{2} \quad \frac{1}{2} \quad 0 \quad 9$$
  • Multiply the new second row by 4 and sum it to the third row to get the $x_1$ coefficient equal to 0. Result:
$$0 \quad 0 \quad 7 \quad 2 \quad 1 \quad 46$$
  • Multiply the same row by 3 and sum it to the first row to get the $x_1$ coefficient equal to 0. Result:
$$ 1 \quad 0 \quad -\frac{1}{2} \quad \frac{3}{2} \quad 0 \quad 27 $$
  • Finally, combine the new rows. The new table is this:
Basis   Z   x1    x2    s1   s2    b
Z       1    0  -1/2   3/2    0   27
x1      0    1   1/2   1/2    0    9
s2      0    0     7     2    1   46

The current basic solution is $x_1 = 9$, $x_2 = 0$, $s_1 = 0$, $s_2 = 46$, with an objective value of $Z = 27$. This is not the optimal solution yet, because it is a maximization problem and there is still a negative coefficient in the objective function row in the column of a non-basic variable ($x_2$).
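
As a quick numerical cross-check (a sketch assuming scipy is available; this is not part of the required hand computation), the same LP can be passed to scipy.optimize.linprog, which minimizes by default, so the objective is negated:

from scipy.optimize import linprog

# Maximize 3*x1 + 2*x2  <=>  minimize -3*x1 - 2*x2
res = linprog(c=[-3, -2],
              A_ub=[[2, 1], [-4, 5]],
              b_ub=[18, 10],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimal decision variables and maximized objective value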


Exercise 7: Machine Learning¶

A. Consider the following kNN model trained on 100 noisy observations of a target function:

image.png

where points in red represent the neighborhood used to make a prediction at $x=3$. From classical decision theory, we are interested in minimizing the expected loss $$ \mathbb{E}\left[L\right] = \int\int\left(y(x)-t\right)^2p(x,t)\,\mathrm{d}x\mathrm{d}t $$ which from variational calculus yields the result: $$ y(x) = \mathbb{E}_t\left[t\vert x\right] $$

Regarding the aforementioned decision theory concepts and the kNN model above, which __ONE__ of the following statements is true?

  • Looking at how $\mathbb{E}\left[L\right]$ changes as a function of $k$ we see that using higher values of $k$ will in general lead to lower losses, as that corresponds to more flexible models
  • When using kNN to predict at the edges of the dataset (around $x=0$ or $x=6.3$), we expect the function $\mathbb{E}_t\left[t\vert x\right]$ to become much more sensitive to small changes in $x$ because there will be less samples in their neighborhood
  • We can decompose $\mathbb{E}\left[L\right]$ into a bias term, a variance term and a noise term. For kNN, the value of $k=1$ corresponds to the model with the highest possible bias, and therefore also the highest possible variance
  • The function $y(x)$ obtained by kNN is an approximation of $y(x)=\mathbb{E}_t\left[t\vert x\right]$: the distribution $p(t\vert x=3)$ is approximated by the samples in red, while the expectation is approximated by averaging over these samples

Model answer:

  • The function $y(x)$ obtained by kNN is an approximation of $y(x)=\mathbb{E}_t\left[t\vert x\right]$: the distribution $p(t\vert x=3)$ is approximated by the samples in red, while the expectation is approximated by averaging over these samples
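
To illustrate the correct statement, a small numpy sketch (the training data and target function here are assumptions, not the exam's dataset) showing that the kNN prediction at $x=3$ is the average of the targets of the $k$ nearest training points, i.e. a sample approximation of $\mathbb{E}_t\left[t\vert x\right]$:

import numpy as np

rng = np.random.default_rng()
x_train = rng.uniform(0, 6.3, 100)                            # assumed inputs
t_train = np.sin(x_train) + 0.3 * rng.standard_normal(100)    # assumed noisy targets

def knn_predict(x, k=10):
    idx = np.argsort(np.abs(x_train - x))[:k]   # the k nearest neighbours of x
    return t_train[idx].mean()                  # average their targets: approximates E[t | x]

print(knn_predict(3.0))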

B. Regarding the use of feedforward neural networks for regression, which ONE of the following statements is true?

  • $L_2$ regularization can be used to alleviate overfitting, consisting in the inclusion of a $\frac{\lambda}{2}\mathbf{w}^\mathrm{T}\mathbf{w}$ term to the loss function. The lower $\lambda$ is, the more rigid the model will become (higher bias)
  • The choice of activation function can make a neural network model more or less flexible. At the high-bias limit, setting the activations of all layers to linear will make the network behave exactly like a simple linear regression model
  • Neural networks can be seen as extensions of linear basis function models: in basis function models the mappings $\phi(\mathbf{x})$ are linear functions of $\mathbf{x}$ while in neural networks those can become nonlinear
  • Neural networks are universal approximators and can therefore fit any continuous function to any arbitrary level of accuracy. This means that by introducing enough flexibility to the network we can always bring $\mathbb{E}\left[L\right]$ as close to zero as we want

Model answer:

  • The choice of activation function can make a neural network model more or less flexible. At the high-bias limit, setting the activations of all layers to linear will make the network behave exactly like a simple linear regression model

C. Regarding the use of Stochastic Gradient Descent (SGD) for training regression models, which __ONE__ of the following statements is true?

  • With minibatch SGD, each model update is performed with a number of samples larger than the original size of the dataset, which means some samples will appear more than once in the batch
  • The stochastic nature of SGD comes from the fact that updates are done with random subsets of the dataset. Nevertheless, the algorithm guarantees that the full dataset will always be seen by the model, at which point we say that an epoch has passed
  • An important parameter in SGD is the learning rate $\eta$. High values of $\eta$ lead to more gradual (slower) changes in $\mathbf{w}$ after every update. Lower values are therefore preferred because that leads to faster convergence
  • When using SGD, it is interesting to keep track of the training progress by computing the loss function on all training samples, even though model weights are updated exclusively on gradients computed using the validation dataset

Model answer:

  • The stochastic nature of SGD comes from the fact that updates are done with random subsets of the dataset. Nevertheless, the algorithm guarantees that the full dataset will always be seen by the model, at which point we say that an epoch has passed
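
A minimal sketch (all names and values here are assumptions) of minibatch SGD for a linear regression model, illustrating that every epoch visits the whole shuffled training set once and that the weights are updated with minibatch gradients of the training loss:

import numpy as np

rng = np.random.default_rng()
X = rng.standard_normal((200, 3))                        # assumed training inputs
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)   # assumed targets

w, eta, batch_size = np.zeros(3), 0.05, 20
for epoch in range(50):
    order = rng.permutation(len(X))                      # shuffle the dataset each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]            # one minibatch
        grad = 2 / len(idx) * X[idx].T @ (X[idx] @ w - t[idx])   # gradient of the squared loss
        w -= eta * grad                                  # update with the minibatch gradient
print(w)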

Exercise 8: Extreme Value Analysis¶

General information about extreme value distributions is provided below. Think of it like a formula sheet. You may not need this information for all questions, and may not need all of it.

Recall that the Generalized Extreme Value distribution can be written as follows:

$$ P[X < x] = \exp\left( - \left[ 1 + \xi \frac{x - \mu}{\sigma} \right]^{-1/\xi} \right) $$

subject to the condition:

$$ \left( 1 + \xi \frac{x - \mu}{\sigma} \right) > 0 $$

where the random variable $X$ is defined by the location, scale, and shape parameters $\mu$, $\sigma$, and $\xi$, respectively.

The design value $x$ for annual (yearly) probability of exceedance $p_y$ is:

$$ x = \begin{cases} \mu - \frac{\sigma}{\xi}\left[ 1 - \left( -\ln(1-p_y) \right)^{-\xi} \right] & \xi \neq 0 \\ \mu - \sigma \ln\left(1-p_y\right) & \xi = 0 \end{cases} $$

with the design life probability over $DL$ years given as:

$$ p_{DL} = 1 - (1 - p_y)^{DL} $$

Recall that the Generalized Pareto distribution can be written as follows for a random variable $X$ with parameters $th$ (threshold), shape $\xi$, and scale $\sigma_{th}$:

$$ P[X < x | X > th] = \begin{cases} 1 - \left(1 + \xi \frac{x - th}{\sigma_{th}}\right)^{-1/\xi} & \xi \neq 0 \\ 1 - \exp\left(-\frac{x - th}{\sigma_{th}}\right) & \xi = 0 \end{cases} $$

The design value $x_N$ for an $N$-year return level is:

$$ x_N = \begin{cases} th + \frac{\sigma_{th}}{\xi}\left[(\lambda N)^{\xi} - 1\right] & \xi \neq 0 \\ th + \sigma_{th} \ln(\lambda N) & \xi = 0 \end{cases} $$

where $\lambda$ is the average number of excesses per year and $N$ is the return period in years.


As a consultant, you need to design a water treatment plant. There are different variables that need to be considered to this end: (1) the incoming water discharge to be treated, denoted $Q$, (2) the concentration of organic matter, denoted $C$, and (3) the concentration of inorganic matter, denoted $M$.

You start studying the discharge $Q$, and you want to model the maximum yearly discharge (our random variable of interest). You managed to get a time series and when you start processing the sampled observations, you detect a trend around the mean (see Figure).

image.png

A. Which of the following properties is clearly not satisfied by the data:

  • Independent
  • Identically distributed
  • Asymmetry
  • Mutually exclusive
  • Collectively exhaustive

Model answer:

  • Identically distributed

B. Can you apply Extreme Value Analysis to those observations of $Q$ as they are?

  • Yes
  • No

Model answer:

  • No

Since you are not sure how to proceed with the discharge $Q$ you move to the next variable, the concentration $C$.

C. You have been studying $C$ for a long time (many years) and you know the distribution it follows, $C \sim F(c)$. What is the distribution of monthly maxima of $C$? Assume that $C$ consists of daily observations and that $N$ is the number of observations per year.

  • $F(c)^N$
  • $F(c)^{N/12}$
  • $F(c)^{12N}$
  • $F(c)^{N/30}$

Model answer:

  • $F(c)^{N/12}$

Finally, you move to the last variable, the concentration $M$. A colleague has already done a study for you and characterized the monthly maxima of $M$ using a Generalized Extreme Value distribution (GEV).

Your colleague determined the following values of the parameters:
$\mu=3$ mg/L, $\sigma=0.5$ mg/L and $\xi=0.25$ (dimensionless).

D. Compute the design value for a return period of 25 years.

Model answer:

Since the GEV was fitted to monthly maxima, a return period of 25 years corresponds to a monthly exceedance probability of $p_y = \frac{1}{25 \times 12} = \frac{1}{300}$, so that:

$$ x = \mu - \frac{\sigma}{\xi} \left[1 - \left(-\ln(1-p_y)\right)^{-\xi}\right] = 3 - \frac{0.5}{0.25} \left[1 - \left(-\ln\left(1-\frac{1}{25 \times 12}\right)\right)^{-0.25}\right] = 9.3 \text{ mg/L} $$
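
A short numerical check of this computation (a sketch, not part of the model answer):

import numpy as np

mu, sigma, xi = 3, 0.5, 0.25
p_y = 1 / (25 * 12)                                      # monthly exceedance probability
x = mu - sigma / xi * (1 - (-np.log(1 - p_y)) ** (-xi))
print(x)                                                 # approximately 9.3 mg/L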

Exercise 9: Risk and Reliability¶

The risk curve illustrated below shows the risk of people getting sick after drinking groundwater that has been contaminated after a "factory spill," a situation that is barely tolerable.

Provide an example of what can be done to improve the situation and be sure to include a statement of how one or more of the following should be quantitatively changed: consequence, probability, limit line.

Your answer should look like this (one sentence each, max):

  • example of what to do
  • quantitative effect on consequence, probability or limit line

Example answers:

  • prevent people from drinking water after the spill; reduces consequences
  • install a spill containment system; reduces consequences
  • improve the reliability of the chemical process at the factory; reduces probability of failure
  • change the policy to accept more people getting sick; raises the limit line