23/24 Q2¶

CEGM1000 MUDE

Part 1: Programming¶

1a

Review the code below, which has one part cut out (YOUR_CODE_HERE), then choose the answer that would generate the output provided (note that the length of the blank space is irrelevant):

[figure: the code snippet with the YOUR_CODE_HERE blank]

The output found was:

[figure: the printed output]

Answer:

(D) i,j in enumerate(bands)
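The original code and output are only available as images. As a purely illustrative sketch of the pattern that answer (D) completes, assuming bands is a list of strings (the actual list contents are in the exam image):

```python
# Hypothetical reconstruction; the real bands list is shown in the image above
bands = ["B02", "B03", "B04", "B08"]

# enumerate(bands) yields (index, element) pairs, unpacked here into i and j
for i, j in enumerate(bands):
    print(i, j)
```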


1b

Which of the following would be useful to include in a .gitignore file in a MUDE repository? (more than one answer is possible)

  • (A) *.csv

  • (B) .csv

  • (C) *.md

  • (D) *.ipynb_checkpoints

Answer:

(A) *.csv

  • This will ignore all csv files. Generally a good idea, as data files should be stored elsewhere

(D) *.ipynb_checkpoints

  • These files are only needed when running the notebook; for practical purposes they never need to be committed to a git repo

Why not B and C:

(B): This will only ignore a file named ".csv"

(C): A bad idea to ignore: Markdown files are how you submit your assignments
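As an illustration, a minimal .gitignore combining the two useful answers (in .gitignore syntax, comments go on their own lines):

```
# (A) ignore all csv data files
*.csv
# (D) ignore Jupyter checkpoint folders
*.ipynb_checkpoints
```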


1c

You open a Jupyter notebook for Week 2.9, and see that a DataFrame is defined as df and used to store the data imported from a csv file. Choose the piece of code that provides summary information about the data, including relevant statistics:

  • (A) df.head()

  • (B) df.describe()

  • (C) df.summarize()

  • (D) df.tail()

  • (E) df.sanitize()

Answer:

(B) df.describe()
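A minimal usage sketch (the csv file name is hypothetical):

```python
import pandas as pd

# Load the data, as described in the question
df = pd.read_csv("week_2_9_data.csv")

# describe() reports count, mean, std, min, quartiles and max per numeric column
print(df.describe())

# head()/tail() only preview the first/last rows, and summarize()/sanitize()
# are not pandas DataFrame methods
```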


Part 2: Finite Element Method¶

Heat Equation with Finite Element¶

Consider the heat equation on a 1-D domain $\Omega$

$$ \frac{\partial T}{\partial t}= \alpha \frac{\partial^2 T}{\partial x^2} $$

where $T(x,t)$ is the temperature as a function of space and time and $\alpha$ the thermal diffusivity, with Dirichlet boundary conditions (prescribing temperature) on one end of the domain and Neumann boundary conditions (prescribing the heat flux) on the other end.

The strong form PDE can be rewritten as a weak form and then discretized in space with finite elements to arrive at a semi-discrete system of equations:

$$ \mathbf{A}_1 \frac{\partial \mathbf{v}}{\partial t}+ \mathbf{A}_2 \mathbf{v}= \mathbf{b} $$

where $\mathbf{A}_1$ and $\mathbf{A}_2$ are matrices and $\mathbf{b}$ and $\mathbf{v}$ are vectors.

Let $\mathbf{N}$, $\mathbf{B}$ and $\mathbf{C}$ be row vectors containing the shape functions, shape function derivatives and second derivatives of shape functions respectively, i.e. $\mathbf{B}=\frac{\partial \mathbf{N}}{\partial x}$ and $\mathbf{C}= \frac{\partial^2 \mathbf{N}}{\partial x^2}$.

2a

What is the meaning of $\mathbf{v}$?

  • (A) Heat fluxes between the elements

  • (B) Heat fluxes at the integration points

  • (C) Temperature values at the nodes

  • (D) Temperature values at the integration points

Answer

(C) Temperature values at the nodes


2b

What are the contents of $\mathbf{A}_1$?

  • (A) $\int_{\Omega} \mathbf{N}^T \mathbf{B} d\Omega$

  • (B) $\int_{\Omega} \mathbf{N}^T \mathbf{N} d\Omega$

  • (C) $\int_{\Omega} \mathbf{B}^T \mathbf{B} d\Omega$

  • (D) $\int_{\Omega} \mathbf{B}^T \mathbf{N} d\Omega$

Answer

(B) $\int_{\Omega} \mathbf{N}^T \mathbf{N} d\Omega$

2c

What are the contents of $\mathbf{A}_2$?

  • (A) $\int_{\Omega} \mathbf{N}^T \alpha \mathbf{N} d\Omega$

  • (B) $\int_{\Omega} \mathbf{B}^T \alpha\mathbf{B} d\Omega$

  • (C) $\int_{\Omega} \mathbf{N}^T \alpha\mathbf{C} d\Omega$

  • (D) $\int_{\Omega} \mathbf{C}^T \alpha\mathbf{N} d\Omega$

Answer

(B) $\int_{\Omega} \mathbf{B}^T \alpha\mathbf{B} d\Omega$
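For linear shape functions on an element of length $h$, the element-level versions of the integrals in 2b and 2c can be evaluated in closed form; a minimal numpy sketch (the values of $h$ and $\alpha$ are illustrative, and the element type is an assumption, since the exam does not fix one):

```python
import numpy as np

def element_matrices(h, alpha):
    # On [0, h]: N = [1 - x/h, x/h], so B = dN/dx = [-1/h, 1/h]
    A1_e = h / 6.0 * np.array([[2.0, 1.0],
                               [1.0, 2.0]])      # int N^T N dx       (2b)
    A2_e = alpha / h * np.array([[1.0, -1.0],
                                 [-1.0, 1.0]])   # int B^T alpha B dx (2c)
    return A1_e, A2_e

A1_e, A2_e = element_matrices(h=0.5, alpha=1.0)
print(A1_e)
print(A2_e)
```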


2d

When would we get $\mathbf{b} = 0$?

  • (A) Always, because there is no source term in the strong form equation

  • (B) When Dirichlet boundary conditions are equal to zero

  • (C) When Neumann boundary conditions are equal to zero

Answer

(C) When Neumann boundary conditions are equal to zero

  • With no source term in the strong form, $\mathbf{b}$ only contains the boundary flux term from the weak form, which vanishes when the prescribed Neumann flux is zero.


Part 3: Finite Methods¶

For different modelling approaches, continuous problems are discretized in different ways. Connect the three methods below to the type of discretization they are most strongly related to.

Use three straight lines to connect the term on the left that best matches the term on the right, below. If you make mistakes or need to correct your choice, use the space below to write a note that clarifies your final answer. If multiple answers are provided, you will not receive any credit.

Answer

FEM --- Solution is discretized

FDM --- Derivatives are discretized

FVM --- Conservation is discretized


Part 4: Signal Processing¶

We start from a cosine signal with a frequency of $f=3$ Hz, and sample it at $f_s = 8$ Hz for a duration of $T=2$ s. The $N=16$ discrete time samples are input to the Discrete Fourier Transform (DFT) and we directly plot the magnitude (modulus) of the output, hence $|X_k|$ with $k=0,\ldots,N-1$. Create the resulting plot.

Answer

[figure: stem plot of $|X_k|$ for $k = 0, \ldots, 15$, with peaks at $k=6$ and $k=10$]

  • No aliasing. Frequencies appear correctly at 3 and -3 Hz.

  • The output frequency range of the fft is $[0, f_s)$; $T = 2$ seconds, hence the frequency resolution is $\Delta f = \frac{1}{2}$ Hz. So we see the (discrete) frequencies [0, $\frac{1}{2}$, 1, $1\frac{1}{2}$, 2, $2\frac{1}{2}$, 3, $3\frac{1}{2}$, 4, $4\frac{1}{2}$, 5, $5\frac{1}{2}$, 6, $6\frac{1}{2}$, 7, $7\frac{1}{2}$] in Hz, i.e. up to $f_s$ (= 8 Hz); or the frequencies [0, $\frac{1}{2}$, ..., 4] in Hz when given only up to $f_s/2$ (= 4 Hz). As the sampling frequency $f_s$ (= 8 Hz) is larger than $2 f_c$ (with signal frequency $f_c$ = 3 Hz), we do not get aliasing. The peak at index 6 in the graph corresponds to 3 Hz, and the peak at index 10 corresponds to 5 Hz, or, when the spectrum is taken/interpreted symmetrically on $[-f_s/2, f_s/2)$, to $-3$ Hz.
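A short sketch that reproduces this plot, assuming numpy and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

f, fs, T = 3, 8, 2      # signal frequency [Hz], sampling frequency [Hz], duration [s]
N = fs * T              # 16 samples
t = np.arange(N) / fs
x = np.cos(2 * np.pi * f * t)

X = np.fft.fft(x)       # DFT; index k maps to f_k = k * fs / N = k/2 Hz

plt.stem(np.arange(N), np.abs(X))   # peaks at k = 6 (3 Hz) and k = 10 (-3 Hz alias)
plt.xlabel("k")
plt.ylabel("$|X_k|$")
plt.show()
```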


Part 5: Time-Series Analysis¶

We intend to simulate a time series of 200 samples at 1-day intervals (so m=200 and the time unit is a day) using a first-order auto-regressive $AR(1)$ random process $s_t$ as follows:

$$ s_t= \beta s_{t-1}+e_t $$

where $t=1, ..., m$ and $\beta = -0.9$ is the given $AR(1)$ parameter. We further assume $E(s_t)=0$ and $D(s_t)=\sigma^2= 2$. The series is simulated using a normal distribution, where it is initialized using $s_1 = s(t=1)= randn(1)$ as the first point. Further, the standard deviation of $e_t$ can be obtained from the variance $\sigma^2_e = \sigma^2 (1-\beta^2)$.
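A minimal numpy sketch of this simulation (the seed and generator choice are illustrative):

```python
import numpy as np

m, beta, sigma2 = 200, -0.9, 2.0
sigma_e = np.sqrt(sigma2 * (1 - beta**2))  # standard deviation of e_t

rng = np.random.default_rng(42)
s = np.zeros(m)
s[0] = rng.standard_normal()               # s_1 = randn(1)
for t in range(1, m):
    s[t] = beta * s[t - 1] + sigma_e * rng.standard_normal()
```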

Given the information above, answer the following True/False or multiple choice questions.

5a

The time series is non-stationary

  • (A) True

  • (B) False

Answer

(B) False


5b

The time series is characterized by colored noise

  • (A) True

  • (B) False

Answer

(A) True

The time series is characterized by colored noise: there is almost certainly going to be a dominant frequency, which is more or less set by the equally spaced one-day increments and the fact that $\beta$ is so large in magnitude and negative. This means that from point to point the signal will oscillate back and forth between similar values at each epoch.


5c

Select which of the four plots below shows the correct autocovariance function for the time series

[figure: four candidate autocovariance plots, labelled (A)-(D)]

Answer

(B)

The $\beta$ of $-0.9$ indicates that there will be a strong and opposite correlation of each value with the previous epoch. This eliminates C (and probably A). Since each epoch is also related (and opposite) to earlier values (although not explicitly), we expect the covariance to decrease gradually (and alternate!), indicating B is the right answer. D is definitely not the right answer, because a value of $\pm 1$ indicates complete dependence; that would be the case for a time series whose values oscillate back and forth between exactly the same values.
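This can be checked against the theoretical autocovariance of a stationary AR(1) process, $\gamma(\tau) = \sigma^2 \beta^{|\tau|}$; a short sketch:

```python
import numpy as np

beta, sigma2 = -0.9, 2.0
tau = np.arange(21)
gamma = sigma2 * beta ** tau  # alternates in sign and decays geometrically
print(np.round(gamma, 3))     # gamma(0) = 2, gamma(1) = -1.8, gamma(2) = 1.62, ...
```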


Part 6: Optimization¶

Imagine you are managing a warehouse with limited storage space. The warehouse has different types of products, each requiring a specific amount of storage space. Your goal is to maximize the total value of the stored products while respecting the storage capacity constraints.

6a

You have a list of available products, $i$, each associated with a value of profit, and each product has a known storage space requirement per unit. The warehouse has a fixed total storage capacity.

Your task is to formulate a model that helps decide how many units of each product to store in the warehouse to maximize the total value of the stored products, while ensuring that the total storage space used does not exceed the warehouse capacity.

You are free to define and describe your own variables and symbols in your model formulation, as long as you explain what they represent.

Answer

Let:

  • $x_i$ be the decision variable representing the number of units of product $i$ to store

  • $v_i$ be the profit of product $i$

  • $s_i$ be the storage space requirement per unit of product $i$

  • $C$ be the total storage capacity of the warehouse.

The objective is to maximize the total value:

Maximize $\sum_i v_i \cdot x_i$


Subject to the constraint that the total storage space used does not exceed the warehouse capacity:

$$ \sum_i s_i \cdot x_i \leq C $$

Additionally, the decision variables should be non-negative. They should also be restricted to integer values if the type of product cannot be stored in fractional units:

$$x_i \geq 0 \;\; \forall i $$
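With continuous variables this is a linear program; a minimal sketch with hypothetical data, using scipy (linprog minimizes, so the objective is negated):

```python
import numpy as np
from scipy.optimize import linprog

v = np.array([10.0, 7.0, 4.0])   # hypothetical profit per unit of each product
s = np.array([2.0, 1.5, 0.5])    # hypothetical storage space per unit
C = 100.0                        # hypothetical warehouse capacity

res = linprog(c=-v,                             # maximize sum_i v_i x_i
              A_ub=s.reshape(1, -1), b_ub=[C],  # sum_i s_i x_i <= C
              bounds=[(0, None)] * len(v))      # x_i >= 0
print(res.x, -res.fun)
```

For integer units (see 6b), the same model can be handed to an integer solver instead, e.g. scipy.optimize.milp with an integrality constraint.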

6b

Discuss the type of variables that you proposed for the formulation - are there different options? What does that choice depend on?

Answer

This problem can be formulated as an Integer Linear Programming (ILP) problem as well, where the decision variables are integers representing the number of units of each product to store. The goal is to find the values of $x_i$ that maximize the objective function while satisfying the given constraints.

The variables can also be continuous; it really depends on the quantities involved. If one speaks about cars in a garage, integer variables should be used, but small boxes in a big warehouse can probably be represented by continuous variables, since the error made by considering fractional units is negligible.


Part 7: Machine Learning¶

7a

Minimizing the expected loss leads to the classical result $y(\mathbf{x}) = E_t [t|\mathbf{x}]$, the conditional expectation of $t$ given $\mathbf{x}$. How can this important result be interpreted?

  • (A) When making new predictions, it suffices to average $t$ over all possible values of $\mathbf{x}$

  • (B) Making predictions involves two steps: first fix a value of $\mathbf{x}$ and then average out the noise in $t$

  • (C) The value $y(\mathbf{x})$ for a new prediction should be picked to be one of the values of $t$ in the dataset

Answer

(B) Making predictions involves two steps: first fix a value of $\mathbf{x}$ and then average out the noise in $t$


7b

You then consider a few options for how to model $y(\mathbf{x})$. When comparing parametric and non-parametric models, which one of the following statements is true?

  • (A) Non-parametric models can be efficient because new predictions only depend on the most recent value of $y(\mathbf{x})$ probed from the model

  • (B) Parametric $k$-Nearest Neighbors models can be described by two parameters, namely $k$ and the size of the neighborhood used to make predictions

  • (C) Parametric models can be advantageous because their training datasets can be fully discarded after training: predictions depend exclusively on $x$ and the trained parameters

  • (D) Neural networks can be seen as non-parametric models because the choice of activation function cannot be parametrized. One cannot assign a numerical value to this choice

Answer

(C) Parametric models can be advantageous because their training datasets can be fully discarded after training: predictions depend exclusively on $x$ and the trained parameters


7c

You decide to go for a feedforward neural network. The next step is to train the model by making use of your limited dataset of N observations as efficiently as you can. Which of the following strategies makes most sense?

  • (A) Use all N samples for training, while making sure the model $y(x)$ is as flexible as possible in order to avoid overfitting

  • (B) Use $80\%$ of your N samples to train the model, $10\%$ for hyperparameter calibration and $10\%$ for a final assessment of the model

  • (C) Use all N samples and train two separate models: one as rigid as possible and the other as flexible as possible. Finally, combine the two models into one by averaging their weights

  • (D) Use $40\%$ of the N samples to train a first model and $40\%$ to train a second one with the same level of flexibility, with the remaining $20\%$ being left as validation data. When making new predictions, randomly pick one of the two models and use it.

Answer

(B) Use $80\%$ of your N samples to train the model, $10\%$ for hyperparameter calibration and $10\%$ for a final assessment of the model
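A minimal numpy sketch of such a split (the dataset and all variable names are illustrative):

```python
import numpy as np

N = 1000                                    # hypothetical dataset size
X, t = np.random.randn(N, 3), np.random.randn(N)

idx = np.random.permutation(N)              # shuffle before splitting
n_tr, n_va = int(0.8 * N), int(0.1 * N)

X_train, t_train = X[idx[:n_tr]], t[idx[:n_tr]]                        # training
X_val, t_val = X[idx[n_tr:n_tr + n_va]], t[idx[n_tr:n_tr + n_va]]      # calibration
X_test, t_test = X[idx[n_tr + n_va:]], t[idx[n_tr + n_va:]]            # final assessment
```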


7d

With the correct dataset setup, you perform data normalization in order to facilitate training. About this part of the workflow, which of the following options would work best?

  • (A) Concatenate training, validation and test data into a single matrix and normalize the full dataset in one go

  • (B) Normalize the training and validation datasets separately, since they will be used to compute different loss functions

  • (C) First normalize only the training dataset, then use the resulting normalization coefficients for the validation and test datasets

Answer

(C) First normalize only the training dataset, then use the resulting normalization coefficients for the validation and test datasets
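Continuing the split sketch from 7c (variable names are illustrative), the key point is that the coefficients are computed once, on training data only:

```python
# Normalization coefficients from the training set only
mu = X_train.mean(axis=0)
sd = X_train.std(axis=0)

# The same coefficients are reused; validation and test data
# must not leak into the normalization statistics
X_train_n = (X_train - mu) / sd
X_val_n = (X_val - mu) / sd
X_test_n = (X_test - mu) / sd
```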


7e

Finally, you use Stochastic Gradient Descent and observe how the training and validation losses of one of your neural networks evolve with the number of epochs. Both of the losses decrease at first but then the validation loss starts to increase. What makes most sense?

  • (A) Stop training and set the final model to be the one with the lowest historical validation loss

  • (B) Restart training but now use more epochs and a higher learning rate

  • (C) Add an extra hidden layer to the network, just before the output layer and with the same activation function, in order to avoid overfitting. Then resume training

Answer

(A) Stop training and set the final model to be the one with the lowest historical validation loss
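A toy sketch of this stopping rule (plain gradient descent on a hypothetical linear model; the point is tracking the historically best validation loss, not the optimizer):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # hypothetical data
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
X_tr, t_tr, X_va, t_va = X[:80], t[:80], X[80:], t[80:]

w, lr = np.zeros(3), 0.01
best_loss, best_w = np.inf, w.copy()

for epoch in range(500):
    w -= lr * 2 * X_tr.T @ (X_tr @ w - t_tr) / len(t_tr)   # training update
    val_loss = np.mean((X_va @ w - t_va) ** 2)
    if val_loss < best_loss:                # remember the historical minimum
        best_loss, best_w = val_loss, w.copy()

w = best_w   # final model: the one with the lowest validation loss seen
```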


Part 8: Extreme Value Analysis¶

As a scientist, you are assessing with your team the performance of a thingamajig. As everybody knows, thingamajigs are very sensitive to wind, and you need this thingamajig to withstand high wind speeds.

Your colleague is then performing Extreme Value Analysis on the wind speeds in the location where the intervention takes place, and has found a time series of wind speeds of approximately 5 years.

General information about extreme value distributions is provided below. Think of it like a formula sheet. You may not need this information for all questions, and may not need all of it.

Recall that the Generalized Extreme Value distribution can be written as follows:

$$ P[X < x] = \exp\left(-\left[1+ \xi \frac{x-\mu}{\sigma}\right]^{-1/\xi}\right), \qquad \left(1 + \xi \frac{x-\mu}{\sigma}\right) > 0 $$

whereby the random variable X is defined by the location, scale and shape parameters $\mu, \sigma, \xi$ respectively. The design value $x$ for an annual (yearly) probability of exceedance $p_y$ is:

$$ x = \begin{cases} \mu - \frac{\sigma}{\xi}\left[1-\left[-\ln (1-p_y)\right]^{-\xi}\right] & \text{if } \xi \neq 0 \\ \mu - \sigma \ln\left[-\ln (1-p_y)\right] & \text{if } \xi = 0 \end{cases} $$

with the design life probability over DL years given as:

$$ p_{DL}= 1-(1-p_y)^{DL} $$

Recall that the Generalized Pareto distribution can be written as follows for random variable X with parameters th (threshold), shape $\xi$ and scale $\sigma_{th}$:

$$ P[X < x \mid X > th] = \begin{cases} 1 - \left(1 + \frac{\xi(x-th)}{\sigma_{th}}\right)^{-1/\xi} & \text{if } \xi \neq 0 \\ 1 - \exp\left(-\frac{x-th}{\sigma_{th}}\right) & \text{if } \xi = 0 \end{cases} $$

and the design value $x_N$ for an $N$-year return level:

$$ x_N = \begin{cases} th + \frac{\sigma_{th}}{\xi}\left[(\lambda N)^{\xi}-1\right] & \text{if } \xi \neq 0 \\ th + \sigma_{th} \ln(\lambda N) & \text{if } \xi = 0 \end{cases} $$

where $\lambda$ is the average number of excesses per year during the observation period, and $N$ is the return period in years.

8a

State which sampling method you would recommend, along with a (short!) justification why (2 sentences max.)

Answer

If Yearly Maxima were applied, only 5 samples would be extracted, which are too few to quantify the distribution function. Peak Over Threshold maximizes the amount of data that you extract. Another possibility is block maxima with a smaller block size, such as monthly maxima, although this is not the optimal approach: you might sample extremes during months where there are no extremes (the summer period).


Regardless of your advice, your colleague is familiar with Peak Over Threshold, so they decide to go for it. They ask you for advice on choosing the parameters of the method: the threshold and the declustering time. They apply two sets of parameters, both with threshold = 12 m/s: one with a declustering time (dl) of 12 h and one with dl = 48 h. They show you a zoom of the time series with the extracted extremes (see Figure below).

[figure: zoom of the wind speed time series showing the extremes extracted with dl = 12 h and dl = 48 h]

8b

Based only on the above figure, what declustering time (dl) would you advise your colleague to use? State a specific value and provide a (short!) justification.

Answer

Use dl = 48 h. One of the basic assumptions of EVA is that the sampled events need to be independent and identically distributed; when using dl = 12 h, more than one value is sampled within the single extreme events around the 17th and 21st of March. (Partial credit if the wrong value was chosen with the justification that it provides more samples, so more information is extracted from the time series.)


8c

Your colleague applied the Peak Over Threshold method and sampled 35 extremes in the time series of 5 years, to which a Generalized Pareto distribution is fit with scale parameter $\sigma_{th}=1$ and shape parameter $\xi=0.2$. Remember that the threshold used was 12 m/s. Find the design value for wind speed for a design lifetime of 100 years. State your assumptions and calculation steps clearly.

Answer

Two options: assume a return period, or assume a probability of failure over the design life to compute the yearly probability of failure. The solution below assumes a return period of 100 years. The average number of excesses per year is:

$$ \lambda = 35/5 = 7 \ \textrm{excesses per year} $$

Applying the inverse of the Generalized Pareto distribution for $\xi \neq 0$:

$$ x_N = th + \frac{\sigma_{th}}{\xi}\left[(\lambda N)^{\xi} - 1\right] $$

$$ x_{100\,\textrm{years}} = 12 + \frac{1}{0.2}\left[(7 \cdot 100)^{0.2} - 1\right] = 25.5 \ \textrm{m/s} $$

Return period here is not provided and so needs to be assumed.
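A short Python check of the arithmetic:

```python
th, sigma_th, xi = 12.0, 1.0, 0.2   # threshold [m/s], scale, shape
lam, N = 35 / 5, 100                # excesses per year; assumed return period

x_N = th + sigma_th / xi * ((lam * N) ** xi - 1)
print(round(x_N, 1))                # 25.5 (m/s)
```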


Part 9: Risk & Reliability¶

An event A is defined as $F_X(X>x_A)$ and event B is defined as $F_Y (Y >y_B)$, where X and Y are random variables defined by continuous parametric distributions (F is the CDF). Consider the probability of interest $P (A \cap B)$ (intersection), read the following two statements and select whether they are true or false.

9a

This is a series system.

  • (A) True

  • (B) False

Answer

(B) False

The intersection is equivalent to a parallel system (both criteria must be satisfied).


9b

If there is negative dependence between X and Y, the probability would increase.

  • (A) True

  • (B) False

Answer

(B) False

Negative dependence decreases the likelihood that high values of one variable occur together with high values of the other, which, for this case, also decreases the joint probability of A and B occurring.


9c

A regulatory authority is deciding how to establish risk-based criteria to protect against fatalities during train accidents (there have been too many over the last few decades).

Sketch an FN-curve that illustrates the following: a recently completed risk analysis; a limit line; clearly labelled x and y axes; specify units, but not values or equations. The y-axis should be labelled using both words and an equation. Points will be given for correctness, not beauty.

Answer

A sufficient sketch is illustrated below. Common mistakes include:

  • not showing a risk line that exceeds the limit line
  • not labelling the y-axis with the exceedance probability equation

The limit line did not need an equation, but in some cases credit was given if that was provided in place of the y-axis equation.

[figure: example FN-curve sketch: log-log axes with number of fatalities N on the x-axis and exceedance probability per year, $P(N > n)$, on the y-axis, showing a limit line and a risk-analysis curve that exceeds it]