The code below defines a ParkingLot
Python class implementing the functioning of a parking lot in a simplistic way. The following description is provided from the documentation:
ParkingLot
objects are created by specifying the overall capacity of the parking lot, stored in self.total_spots
.self.spots_available
is used to keep track of the current number of free spots; this value changes in time based on whether a vehicle leaves and enters the facility. Upon creation, ParkingLot objects are empty, i.e., there are no cars in the parking lot.book_spot
and free_spot
are used to either decrease or increase by one the count of spots available, e.g., whenever a vehicle exists or enters the parking lot. Checks are put in place with if statements to ensure that the self.spots_available
counter does not go below zero or exceeds the parking lot capacity.visualize
is used to print the current number of available spots, or that there are no spots available. This method is called once when the object is first created, and whenever the number of available spots changes.A. There are four different errors in the implementation below. Localize these errors (you can use the line numbers as reference), define their type (e.g., syntax error, exception or logical error) and explain briefly how to fix them.
Model answer:
B. With respect to the ParkingLot
code, which lines define the constructor of the class?
Model answer:
C. With respect to the ParkingLot
code, which line contains an attribute of the class?
Model answer:
D. Imagine that a correct implementation of the ParkingLot
class is available in the my_classes.py
file. Given the code below,
which of the following statement is FALSE
Model answer:
E. We want to add a check on the code defining the ParkingLot
class to limit the overall capacity of the parking facility to 200 parking slots. The code should stop and return an error when this occurs.
Which of these statements would implement the check correctly?
Regarding these results, mark all the options that are TRUE; consider that each wrong answer will result in negative points, but the lowest score for this sub-question is 0 (we will not subtract points from the rest of the exam):
Model answer:
As an engineer in a consultancy firm, you need to perform the Extreme Value Analysis on the hourly wave height measurements from a buoy in the middle of the North Sea. You have applied both sampling methods in order to find a probability distribution to model the maximum wave height observed per year: Block Maxima and Peak Over Threshold. Recall that the objective is to select statistically independent and identically distributed samples; the Block Maxima method uses fixed increments of time and the Peak Over Threshold methods filter maxima using threshold values and a minimum time between peaks. In the following table, the outputs of the describe function (from pandas) applied on the sampled maxima are shown.
A. Using the summary provided, which statement is the most valid?
Model answer:
Assume that a colleague did the sampling using the Peak Over Threshold method and asks your advice on the result, as shown in the plot below. The plot represents the sampled extremes for 3 weeks using different values for the threshold and the declustering time.
B. Based on such plot, which pair of values would you choose?
Model answer:
C. Explain why.
Model answer:
Basic assumption for EVA is that the sampled events are independent and identically distributed. With th=3m and dcl=24h, three events are sampled within the same cluster (group) of extreme observations. That means that they come from the same storm and, thus, they are not independent.
For the same problem as in Part 2, assume that you now have the extreme observations. You want to fit a parametric distribution to them, and you can do it using different methods.
A. Which of the following methods cannot be applied to this end?
Model answer:
Since you are not sure about the distribution that would be a good model for your data, you fit a Generalized Pareto distribution and a Normal distribution. After fitting them, you assess their goodness of fit using the Q-Q plot and the Kolmogorov-Smirnov test.
B. Based on the results above, which distribution fits best your data? How have you determined it? Explain your answer.
Model answer:
GPD best fit. QQplot: GPD closer to 45-degrees line, so better fit to observations. KS test: p-value of GPD > p-value Normal
For the same problem as in Part 2 and 3, assume you choose the Gumbel distribution (see formula sheet).
A. Compute the distribution parameters using the information from Peak Over Threshold analysis. Round to two decimals.
Model answer:
From the table:
From the formula sheet:
Use the second equation to determine $\beta$:
Then use the first one to determine $\alpha$:
In case you did not get an answer to the previous question, assume you obtained α=1 and β=2.
B. Calculate P(X>5) using Gumbel distribution.
Results with $\alpha = 1$ and $\beta = 2$:
$z=(x- 𝛼)/𝛽 =(5-1)/2=2$
$P(X<5)=exp(-exp(-2))=0.87$
$P(X>5)=1-P(X<5)=0.13$
Results with $\alpha = 4.70$ and $\beta = 0.68$:
$z=(x- 𝛼)/𝛽 =(5-4.7)/0.68=0.44$
$P(X<5)=exp(-exp(-0.44))=0.53$
$P(X>5)=1-P(X<5)=0.47$
C. Calculate the value of the wave height (X) whose exceedance probability is 0.05.
Model answer:
Results with $\alpha = 1$ and $\beta = 2$:
$P(X>x)=0.05$
$P(X<x)=0.95$
$P(X<x)=exp(-exp(-z))=0.95$
$-exp(-z)=-0.051 => z=2.97$
$x = 2z + 1 = 6.94$
Results with $\alpha = 4.70$ and $\beta = 0.68$:
$P(X>x)=0.05$
$P(X<x)=0.95$
$P(X<x)=exp(-exp(-z))=0.95$
$-exp(-z)=-0.051 => z=2.97$
$x = 0.68z + 4.7 = 6.72$
You are also interested in the wave periods (Y) for your design. Assume that they follow an exponential distribution with λ=0.05.
D. Assume that wave heights (X) and wave periods (Y) are independent, compute the probability of P(X<5,Y<7).
Model answer:
Since they are independent $P(X<5,Y<7)= P(X<5)P(Y<7)$.
Results with $\alpha = 1$ and $\beta = 2$:
$P(X<5)=exp(-exp(-(5-1)/2)=0.873$
$P(Y<3)=1-exp(-0.05x7)=0.295$
$P(X<5)P(Y<7)=0.873 x 0.295 = 0.26$
Results with $\alpha = 4.70$ and $\beta = 0.68$:
$P(X<5)=exp(-exp(-(5-4.7)/0.68)=0.526$
$P(Y<3)=1-exp(-0.05x7)=0.295$
$P(X<5)P(Y<7)=0.526 x 0.295 = 0.16$
E. Assume that wave heights (X) and wave periods (Y) are independent, compute the probability of P(X<5∣Y<7).
Model answer:
Since they are independent $P(X<5|Y<7)= P(X<5)=0.873$ ($\alpha = 1$ and $\beta = 2$) or 0.525 ($\alpha = 4.70$ and $\beta = 0.68$)
F. However, you know wave heights and periods are not independent. Given the plot below, calculate the empirical value of P(X>6,Y>8). Note that there are 35 observations in the plot.
Model answer:
There are two points in the area of observations $X>6$ and $Y>8$ (upper right rectangle). Thus: $P(X>6,Y>8)=2/35=0.06$
A. When can you actually say that a numerical model is verified?
Model answer:
Right answer is 1. Answer 2 is for validation, while answer 3 is partly true for a verified model, it depends on how many simpler configurations you can compare with.
B. By definition, a sensitivity analysis implies:
Model answer:
C. A sensitivity analysis is not useful for:
Model answer:
D. Which of the following is generally true? A model that has this form: $\dot x+3x=0$ and that has been derived from Newton’s second law can be classified as
Model answer:
A. Using Taylor expansion, derive the backward Euler approximation for the first derivative. Show the truncation error introduced by the approximation.
Model answer:
$f(x−h)≈f(x)−hf′(x)+O(h^2)$
$hf′(x)=f(x)−f(x−h)+O(h^2)$
$f′(x)= \frac{f(x)−f(x−h)}{h} + O(h)$
Truncation error is O(h) due to division by h
B. Derive the discrete form of the following ODE using the backward Euler approximation and calculate first 5 time steps of the solution using dt=0.2:
$y′=y+tcost$
$y(0)=1$
Model answer:
$ y$n+1$ = y_n + \Delta t (y$n+1$ + t$n+1$cos(t$n+1$))$
$ y$n+1$ = (y_n + \Delta t $tn+1$ cos($tn+1$))/(1 - \Delta t)$
The distance x between a fixed benchmark and a moving benchmark on a landslide is measured at times t=0,2,4,6,8,10 months. The observations are shown in the figure. The precision of the first 3 observations is 0.5 mm, the precision of the last 3 observations is 0.2 mm. All observables are independent and normally distributed.
It is assumed that normally the distance is changing at a constant rate. It is known, however, that at t=5 months there was a sudden slip of the landslide, causing an additional change in distance at that time.
Two engineers are asked to estimate the parameters that characterize the motion of the landslide.
A. Give the functional model (in the form E(Y)=A⋅x) and stochastic model corresponding to the null hypothesis of Engineer 1.
Model answer:
For $t\leq4$ we have: $x(t) = x_0 + v_0 * t_i$
For $t\geq6$ we have: $x(t) = x_0 + v_0 * t_i + s$
$\Sigma_Y = diag(0.25,0.25,0.25,0.04,0.04,0.04)$
B. Give the functional model corresponding corresponding to the null hypothesis of Engineer 2 in the form E(Y)=A⋅x. Explain (describe) what the 4 unknown parameters are.
Hint: there are multiple options, you only need to give one option. For example you can choose either $x_s$ or $s$ as one of the unknown parameters.
Model answer:
Option 1:
For $t\leq4$ we have: $x(t) = x_0 + v_0 * t_i$
For $t\geq6$ we have: $x(t) = x_0 + v_s * (t_i-5)$
The unknown parameters are the distances $x_0=x(0)$ and $x_s=x(5)$ [mm], and the constant rates of change $v_0$ and $v_s$ [mm/month] before and after the slip, respectively.
Option 2:
same as option 1, but instead of $x_s$ estimate the distance change during the slip event $s$ (note that $x_s = x_0 + 5·v_0 + s$):
Option 3:
formulate the model with the same parameters as model 1, plus the differential velocity $v_d = v_s - v_0$ where $v_s$ is the constant rate of change after the slip event.
C. Which test do the engineers need to apply to check the validity of their model?
Model answer:
Overall model test
D. What are the corresponding critical values used by the engineers with α=0.05?
Model answer:
E. Do you generally expect the test statistic of Engineer 1 to be larger or smaller than the test statistic of Engineer 2? Explain.
Model answer:
With the model of Engineer 2 you have one extra parameter, hence smaller residuals (better fit) can be expected: hence the test statistic with model 2 is expected to be smaller. Fitting 2 lines with a different slope to the first set and the second set, should indeed result in a better fit. Note: that is also why the critical value is smaller.
A company is producing instruments for measuring the CO2 concentration [ppm] in air. Their top range model has a precision of 2 ppm (for a single measurement). Scientists need to know the CO2 concentration with a 98% confidence interval of ± 0.75 ppm. Therefore they use repeated measurements to estimate the concentration.
The company would like to improve their instrument, such that only 5 repeated measurements will be needed to meet the requirement. What is the required precision of a single measurement in that case?
Model answer:
$ k$0.5$\alpha$$ = 2.326$ from the standard normal distribution: $P(Z<k$0.5$\alpha$$)=1−0.5\alpha$
Required is k0.5$\alpha$$1/5\sigma$new = 0.75 , from which we can compute that the precision must be $\sigma$new = 0.72 ppm.
Monte Carlo Simulations are a broad class of computational algorithms that rely on repeated random sampling to obtain a set of numerical results, and ultimately to quantify the uncertainty in the output.
A. State-of-the art algorithms for generating random samples from a distribution are based on the optimized pseudo-number generation of identically distributed samples. If the input variable has a Gaussian distribution, which is the most appropriate sampling method to be used?
Model answer:
B. Why are the other two methods not appropriate? State one reason for each.
Model answer:
The second step of the Box-Mueller algorithm requires to compute
$r = \sqrt(-2 ln[1-u_1])$
$\theta = 2 \pi u_2$
with u being samples generated from the random variables:
$U_1 \approx U(0, 1)$
$U_2 \approx U(0, 1)$
r is often written in the code as
r = np.sqrt(-2 * np.log(U1))
A. Explain why it is correct to use the $log(u_1)$ instead of $log(1-u_1)$
Model answer:
Because u1 is a random number between 0 and 1, therefore also 1-u1 is a random number between 0 and 1.
B. You have generated enough samples from $N(\mu,\sigma^2)$, so that your output mean is now converged to a value. Make two sketches of the empirical PDFs (probability density functions) for samples with a very small variance, and the other with a large variance.
Model answer: