Exam 22/23 Q1

CEGM1000 MUDE

Part 1: Coding

A. Briefly describe the three main types of errors you encountered while working with Python:

  • syntax errors
  • exceptions
  • logical errors

Add one example for each of them in your description. [200 words maximum]

Model answer:

  • Syntax errors are mistakes in the use of the Python language, somewhat analogous to spelling or grammar mistakes in English or Dutch. They arise from language issues in code that the interpreter actually tried to evaluate. The parser displays a little arrow pointing at the earliest point in the line where the error was detected. Examples include leaving out a keyword or misspelling it.
  • Exceptions are errors that appear during code execution, even if a statement or expression is syntactically correct. Developers can also define their own exceptions and raise them during code execution. Exceptions are handled in try-except blocks. Examples include ZeroDivisionError (e.g., dividing something by zero) or TypeError (e.g., adding a float to a string).
  • Logical errors are the most difficult to discover. They occur when the program runs without crashing, because no syntax or runtime error has occurred, yet the code produces an incorrect result due to a mistake in the program’s logic. We won’t get an error message, and we need to resort to testing (e.g., assertions) to discover these bugs. Examples of logical errors include using the wrong variable name in an operation or indenting a block to the wrong level.
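
The three categories can be illustrated with a short sketch. The snippet string and the function name mean_of are made up for illustration; the syntax error is triggered through compile() so that the rest of the example still runs.

```python
# 1. Syntax error: the parser rejects the code before anything is executed.
#    compile() is used so that the faulty snippet can live inside a string.
try:
    compile("for i in range(3) print(i)", "<example>", "exec")  # missing ':'
except SyntaxError as err:
    syntax_message = f"SyntaxError: {err.msg}"

# 2. Exception: syntactically valid code that fails while it is running.
try:
    result = 1 / 0
except ZeroDivisionError as err:
    exception_message = f"ZeroDivisionError: {err}"


# 3. Logical error: runs without any message but returns the wrong value.
def mean_of(values):
    """Intended to return the mean, but divides by the wrong number."""
    return sum(values) / (len(values) + 1)  # bug: should be len(values)


wrong_mean = mean_of([2, 4, 6])  # gives 3.0, the correct mean is 4.0

print(syntax_message)
print(exception_message)
print(wrong_mean)
```

Only the third case passes silently, which is why logical errors need tests rather than error messages to be found.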

B. The code below computes an average test score from a series of individual tests carried out on a generic item of a certain production line (e.g., car components or appliances). Each test yields an integer number from 0 to 10. If the average score is >= 8, and if no individual score is below 5, the item is accepted for further processing; otherwise it is discarded.

The code below implements this computation, but has three errors. Identify the 3 errors (use the line numbers as reference) and explain briefly how to fix them.

image.png

Model answer:

  1. Line 15: There is no ":" after the for statement; Syntax Error
  2. Line 18: The line total_score += score is wrongly indented; Logical Error
  3. Line 22: if avg_score >= 8: should be if avg_score < 8:; alternatively, invert the content of the if and the else blocks; Semantic Error
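
The exam listing is only available as an image, so the following is a hypothetical reconstruction of the acceptance rule with the three fixes applied, not the original code. The function name evaluate_item and its return convention are assumptions.

```python
def evaluate_item(scores):
    """Return True if the item is accepted, False if it is discarded."""
    total_score = 0
    for score in scores:        # fix 1: the ':' after the for statement
        total_score += score    # fix 2: accumulate inside the loop body
    avg_score = total_score / len(scores)

    if avg_score < 8:           # fix 3: discard when the average is too low
        return False
    if min(scores) < 5:         # discard when any individual score is below 5
        return False
    return True


print(evaluate_item([9, 9, 8]))    # high average, no low score
print(evaluate_item([10, 10, 4]))  # average of 8, but one score below 5
```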

C. Select which of the following 3 statements are TRUE concerning assertions and exceptions? (you can select more than one statement)

  1. We use assertions to catch generic Exceptions that can be raised at runtime
  2. The message in an assertion statement is optional
  3. An assertion statement is equivalent to: image-2.png

Remember, the generic assertion statement is defined as follows: image.png

Model answer:

  1. False
  2. True
  3. True
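
Statements 2 and 3 can be checked with a small sketch. The two check functions and the helper error_message are hypothetical names; the if-based form follows the equivalence given in the Python language reference, and may differ in layout from the image in the question.

```python
def check_with_assert(score):
    # Assertion with an (optional) message after the comma.
    assert 0 <= score <= 10, "score must be between 0 and 10"
    return score


def check_with_if(score):
    # Equivalent form: the check is only active when __debug__ is True.
    if __debug__:
        if not (0 <= score <= 10):
            raise AssertionError("score must be between 0 and 10")
    return score


def error_message(check, score):
    """Return the AssertionError message, or None if the check passes."""
    try:
        check(score)
    except AssertionError as err:
        return str(err)
    return None


print(error_message(check_with_assert, 11))
print(error_message(check_with_if, 11))
print(error_message(check_with_assert, 5))
```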

D. What will happen when running this piece of code?

image.png

(select only one answer)

  • The code will not run due to a Syntax Error
  • The code will run, but it will stop at runtime because we did not catch the right type of Exception, which is ZeroDivisionError
  • The code will run, but it will stop at runtime because we did not catch the right type of Exception, which is ValueError
  • The code will run, and it will print "End of code!"
  • The code will run, and it will print "You cannot divide by zero!" followed by "End of code!"

Model answer:

  • The code will run, and it will print "You cannot divide by zero!" followed by "End of code!"
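
The behaviour described in the model answer can be reproduced with a minimal sketch. The original code is only available as an image, so the snippet below is an assumed stand-in: it catches ZeroDivisionError, after which execution continues past the try-except block.

```python
def run_example():
    """Divide by zero inside a try block and collect what gets printed."""
    printed = []
    try:
        value = 10 / 0                      # raises ZeroDivisionError
        printed.append(str(value))          # never reached
    except ZeroDivisionError:
        printed.append("You cannot divide by zero!")
    printed.append("End of code!")          # runs after the handler
    return printed


for line in run_example():
    print(line)
```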

Consider the code below, defining a generic Rocket class: image.png


E. Identify the following (you may use the line numbers as a reference):

  1. The constructor of the class
  2. A method of the class
  3. An object of the class
  4. An attribute of the class

Model answer:

  1. Lines 11-14 define the constructor of the class, i.e., the __init__ method.
  2. Lines 7-9 define the move_up method of the class.
  3. new_rocket and rocket are instances (i.e., objects) of the Rocket class.
  4. x and y are attributes of the Rocket class.

F. What will happen when running the code in lines 21-22? (select only one answer)

  • The code will not run, unless we replace rocket.y with new_rocket.y
  • The code will run and it will print the same random numbers multiple times
  • The code will run and it will print different random numbers
  • The code will run, but it will raise a NameError exception

Model answer:

  • The code will run and it will print different random numbers

G. Imagine the Rocket class is now contained in a module named space.py. Which of the following are correct import statements? (you may select more than one statement)

  • import space
  • from space import Rocket as RocketFromSpace
  • import Rocket from space
  • from space import * as RocketFromSpace

Model answer:

  • import space
  • from space import Rocket as RocketFromSpace
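
Whether each candidate is valid Python can be checked without having a space.py file, because compile() only parses a statement and does not execute the import. This is a sketch; the helper name is_valid_syntax is an assumption.

```python
candidates = [
    "import space",
    "from space import Rocket as RocketFromSpace",
    "import Rocket from space",
    "from space import * as RocketFromSpace",
]


def is_valid_syntax(statement):
    """Return True if the statement parses; nothing is actually imported."""
    try:
        compile(statement, "<import-check>", "exec")
    except SyntaxError:
        return False
    return True


validity = {statement: is_valid_syntax(statement) for statement in candidates}
for statement, ok in validity.items():
    print(f"{statement!r}: {'valid' if ok else 'SyntaxError'}")
```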

Part 2: Probability

In the space below, describe the key steps that would be required to create a continuous parametric probability distribution that represents annual maximum wave height. State your assumptions, then list your analysis steps (be brief!), followed by a statement about whether or not there may be any limitations to this approach. An outline of what your answer should look like is shown here:

State assumptions (2 sentences max for this item)

  1. How to select maxima
  2. Choose a distribution
  3. How to check the distribution fit

State limitations (2 sentences max for this item)

Model answer:

  • Sampling: Block Maxima (BM) or Peak over Threshold (PoT)
  • Choose distribution: Generalized Pareto for excesses from PoT; GEV for Block Maxima.
  • Mention a Goodness of Fit measure: QQ-plot, Kolmogorov-Smirnov test.
  • Limitations: PoT usually provides more data points than BM, so it is usually better for short time series. Also, threshold and declustering time need to be carefully selected in PoT to ensure that the sampled events are independent and identically distributed.

Part 3: Probability

X and Y are two (unit-less) quantities that have been measured in the lab in order to investigate certain properties. Rather than using a ‘standard’ parametric distribution, theoretical cumulative distribution functions for X and Y have been fitted satisfactorily to data obtained after many years of measurements. These are given by $F_X(x)$ and $F_Y(y)$ below: image.png


A. What is $P(X ≤ −0.99)$?

Model answer:

$P(X≤−0.99)=F_X(−0.99)=(−0.99+1)/2=0.005$


A certain group of engineers want to design for values of Y with an exceedance probability of 0.001, that is, $P(Y>y)=0.001$.

B. What is the design value of Y?

Model answer:

  • $P(Y>y)=1−F_Y(y)=1−(1−e^{−y})=e^{−y}=0.001$
  • $y=−\ln(0.001)≈6.91$

Assume X and Y are independent.


C. What is $P(X≤−0.1,Y≤1)$? (approximate using at least 4 decimal places)

Model answer:

Since X and Y are independent, $P(X≤−0.1,Y≤1)=P(X≤−0.1)P(Y≤1)=0.45·0.6321=0.2845$


D. What is $P(X>0.8∣Y>10)$?

Model answer:

Since X and Y are independent, conditioning on Y does not change the probability: $P(X>0.8∣Y>10)=P(X>0.8)=0.1$
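
The answers to A through D can be checked numerically. The CDFs used below, $F_X(x)=(x+1)/2$ and $F_Y(y)=1−e^{−y}$, are inferred from the model answers because the original definitions are only available as an image; the supports (−1 ≤ x ≤ 1 and y ≥ 0) are an assumption.

```python
import math


def F_X(x):
    """Assumed CDF of X: (x + 1) / 2, clipped to [0, 1]."""
    return min(max((x + 1) / 2, 0.0), 1.0)


def F_Y(y):
    """Assumed CDF of Y: 1 - exp(-y) for y > 0."""
    return 1 - math.exp(-y) if y > 0 else 0.0


p_a = F_X(-0.99)               # A: P(X <= -0.99)
y_design = -math.log(0.001)    # B: solves exp(-y) = 0.001
p_c = F_X(-0.1) * F_Y(1)       # C: product of marginals (independence)
p_d = 1 - F_X(0.8)             # D: conditioning on Y has no effect

print(f"A: {p_a:.4f}")
print(f"B: {y_design:.2f}")
print(f"C: {p_c:.4f}")
print(f"D: {p_d:.4f}")
```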


As it turns out, the joint cumulative distribution of X and Y (denoted as $F_{XY}$) has also been approximated from measurements with sufficient accuracy and is given by: image.png

E. What is $P(X>0.8,Y>4.6)$ in the case the joint distribution is as above? (approximate using at least 4 decimal places)

Model answer:

$P(X>0.8,Y>4.6)=1−P(X<0.8)−P(Y<4.6)+P(X<0.8,Y<4.6)=1−0.9−0.9899+0.8918=0.0019$
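
The inclusion-exclusion step can be written as a small helper. This is a sketch: the joint CDF itself is only available as an image, so the value $F_{XY}(0.8, 4.6)=0.8918$ is taken from the model answer, and the marginal CDFs are the ones inferred from the earlier answers.

```python
import math


def joint_exceedance(F_x, F_y, F_xy):
    """P(X > x, Y > y) from the marginal and joint CDF values at (x, y)."""
    return 1 - F_x - F_y + F_xy


F_x = (0.8 + 1) / 2         # assumed F_X(0.8) = 0.9
F_y = 1 - math.exp(-4.6)    # assumed F_Y(4.6), about 0.9899
F_xy = 0.8918               # F_XY(0.8, 4.6) as quoted in the model answer

p_e = joint_exceedance(F_x, F_y, F_xy)
print(f"E: {p_e:.4f}")
```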


Part 4: Probability

A csv-file is used to create a pandas dataframe, df. The command df.describe() returns the following:

image.png


A. Based on the data summary above, which of the following distributions would be a good first choice to represent the data?

  • Normal/Gaussian
  • Exponential
  • Uniform
  • Gumbel

Model answer:

  • Gumbel

B. Which of the following is a suitable justification for your answer to the previous question?

  • The median is bigger than the mean
  • The data seems to have a right skew
  • The data seems to have a left skew
  • I remember this from the sample exam
  • None of the above

Model answer:

  • The data seems to have a right skew

C. You decide to use the Gumbel distribution to fit the data. Compute the distribution parameters using the data summary above and the following relations:

  • $\mu = \alpha + \beta\gamma$
  • $\sigma^2 = (\pi^2/6)\beta^2$

Model answer:

$\beta = 112.8$ and $\alpha = 147.7$, using $\gamma = 0.5772$
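
The two moment relations can be inverted as $\beta = \sigma\sqrt{6}/\pi$ and $\alpha = \mu − \beta\gamma$. The sketch below implements both directions. Because the data summary is only available as an image, the mean and standard deviation are back-computed here from the model answer rather than read from the table; the function names are assumptions.

```python
import math

EULER_GAMMA = 0.5772  # value of gamma used in the model answer


def gumbel_from_moments(mean, std):
    """Method of moments: return (alpha, beta) from the mean and std."""
    beta = std * math.sqrt(6) / math.pi
    alpha = mean - EULER_GAMMA * beta
    return alpha, beta


def gumbel_moments(alpha, beta):
    """Return (mean, std) of a Gumbel distribution with (alpha, beta)."""
    mean = alpha + EULER_GAMMA * beta
    std = math.pi / math.sqrt(6) * beta
    return mean, std


# Round trip: model-answer parameters -> moments -> parameters again.
mean, std = gumbel_moments(147.7, 112.8)
alpha, beta = gumbel_from_moments(mean, std)
print(f"mean = {mean:.1f}, std = {std:.1f}")
print(f"alpha = {alpha:.1f}, beta = {beta:.1f}")
```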


Part 5: Mathematical modelling

A. Assume you are given an assignment to model a specific system of interest (could be any system). To model it, you need to make some assumptions. Among the many criteria and constraints at stake, what is the main criterion that should drive your decision process when making such assumptions? (choose one answer)

  • Available time to develop the model
  • Available budget given, that covers the hours and any other expense of yours to actually develop the model
  • The purpose of the model, that should fit within a trade-off between complexity, affordability and accuracy
  • Data available to develop the model

Model answer:

  • The purpose of the model, that should fit within a trade-off between complexity, affordability and accuracy

B. When would you need to work on an inverse problem? (choose one answer)

  • Whenever you don’t have enough data available to calibrate the model
  • Whenever you want to improve your model
  • Whenever you want to identify from measured data some unknown values of specific properties/parameters of your model
  • Whenever the inverse problem is well-posed and a unique solution is available

Model answer:

  • Whenever you want to identify from measured data some unknown values of specific properties/parameters of your model

C. As an engineer and scientist, what should be your final goal after you develop a model? (choose one answer)

  • To validate the model
  • To verify and calibrate the model
  • To perform a sensitivity analysis on the model
  • To verify and perform a sensitivity analysis on the model

Model answer:

  • To validate the model

Part 6: Numerical Methods

A. Using Taylor expansion, derive the forward Euler approximation for the first derivative. Show the truncation error introduced by the approximation.

Model answer:

  • $f(x+h)=f(x)+hf′(x)+O(h^2)$
  • $hf′(x)=f(x+h)−f(x)+O(h^2)$
  • $f′(x)=(f(x+h)−f(x))/h+O(h)$
  • The truncation error is $O(h)$, because the $O(h^2)$ remainder is divided by $h$.
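
The first-order behaviour can be verified numerically: halving $h$ should roughly halve the error. This sketch uses $f(x)=\sin(x)$ at $x=1$ as an arbitrary test case, which is an illustrative choice and not part of the exam.

```python
import math


def forward_difference(f, x, h):
    """Forward approximation of f'(x) with step h."""
    return (f(x + h) - f(x)) / h


x0 = 1.0
exact = math.cos(x0)            # exact derivative of sin at x0
steps = [0.1, 0.05, 0.025]      # each step is half the previous one

errors = [abs(forward_difference(math.sin, x0, h) - exact) for h in steps]
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]

for h, err in zip(steps, errors):
    print(f"h = {h:<6} error = {err:.6f}")
print("error ratios:", [round(r, 2) for r in ratios])
```

Ratios close to 2 are consistent with an $O(h)$ truncation error.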

B. Derive the discrete form of the following ODE using the forward Euler approximation and calculate first 5 timesteps of the solution using dt = 0.2:

  • $y′=y+t\cos(t)$
  • $y(0)=1$

Model answer:

  • $(y_{n+1} − y_n)/\Delta t = y_n + t_n\cos(t_n)$
  • $y_{n+1} = y_n + \Delta t\,(y_n + t_n\cos(t_n))$

image.png
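
The first five timesteps can be reproduced with a short forward Euler loop. This is a minimal sketch; the function names forward_euler and rhs are assumptions, and the printed table is meant as a cross-check of the values in the image above.

```python
import math


def forward_euler(f, y0, t0, dt, n_steps):
    """Integrate y' = f(t, y) with forward Euler; return a list of (t, y)."""
    t, y = t0, y0
    solution = [(t, y)]
    for n in range(1, n_steps + 1):
        y = y + dt * f(t, y)    # y_{n+1} = y_n + dt * f(t_n, y_n)
        t = t0 + n * dt         # avoids accumulating rounding error in t
        solution.append((t, y))
    return solution


def rhs(t, y):
    """Right-hand side of y' = y + t*cos(t)."""
    return y + t * math.cos(t)


solution = forward_euler(rhs, y0=1.0, t0=0.0, dt=0.2, n_steps=5)
for t, y in solution:
    print(f"t = {t:.1f}, y = {y:.4f}")
```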


C. A colleague proposes using the backward Euler method. List one advantage and one disadvantage of this approach compared to forward Euler.

Model answer:

  • Advantage: the backward Euler method is unconditionally stable, while the forward Euler method is only conditionally stable (it requires sufficiently small timesteps).
  • Disadvantage: the backward Euler method is implicit, so each timestep requires solving an equation for $y_{n+1}$; for nonlinear problems this must be done iteratively, which increases the computational load.
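
The stability difference can be seen on the linear test problem $y′=−\lambda y$, which is an illustrative choice and not part of the exam. With $\lambda=15$ and $\Delta t=0.2$ the forward method amplifies the solution by a factor $1−\lambda\Delta t=−2$ per step, while the backward method damps it by $1/(1+\lambda\Delta t)=0.25$. For this linear problem the implicit equation can be solved in closed form; a nonlinear problem would need an iterative solver at each step.

```python
def forward_euler_decay(lam, dt, n_steps, y0=1.0):
    """Forward Euler for y' = -lam * y."""
    y = y0
    for _ in range(n_steps):
        y = y + dt * (-lam * y)
    return y


def backward_euler_decay(lam, dt, n_steps, y0=1.0):
    """Backward Euler for y' = -lam * y, solved for y_{n+1} in closed form."""
    y = y0
    for _ in range(n_steps):
        y = y / (1 + lam * dt)  # from y_{n+1} = y_n - dt * lam * y_{n+1}
    return y


y_forward = forward_euler_decay(lam=15.0, dt=0.2, n_steps=10)
y_backward = backward_euler_decay(lam=15.0, dt=0.2, n_steps=10)
print(f"forward Euler after 10 steps:  {y_forward:.3e}")
print(f"backward Euler after 10 steps: {y_backward:.3e}")
```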

Part 7: Sensing and Observation Theory

We can use two instruments to measure the water level at a given location and time. In the following, you may assume that all measurements are independent and that the water level does not change between subsequent measurements. Instrument A has a precision of 3 mm; instrument B has a precision of 8 mm.


A. It needs to be decided whether to take one measurement with the most precise instrument (option 1), or we take one measurement with each instrument and estimate the water level from both measurements (option 2). By how much will the precision of the estimated water level improve or deteriorate if option 2 is used instead of option 1?

Model answer:

image.png
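
One way to work this out is sketched below, assuming that option 2 combines the two measurements with the best linear unbiased estimator (inverse-variance weighting). The model answer is only available as an image, so this assumption and the function name blue_std are not confirmed by the source.

```python
import math


def blue_std(sigmas):
    """Std of the inverse-variance weighted mean of independent measurements."""
    return 1 / math.sqrt(sum(1 / s**2 for s in sigmas))


sigma_option_1 = 3.0                    # one measurement with instrument A
sigma_option_2 = blue_std([3.0, 8.0])   # one measurement with each instrument

improvement = sigma_option_1 - sigma_option_2
print(f"option 1: {sigma_option_1:.2f} mm")
print(f"option 2: {sigma_option_2:.2f} mm")
print(f"improvement: {improvement:.2f} mm")
```

Under this assumption, option 2 gives a standard deviation of about 2.81 mm instead of 3 mm.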


B. Instrument A is expensive and complex to use, whereas instrument B is cheap and simple to use. One of the instruments must be selected for future use. Therefore it is assessed how many measurements are needed with each instrument to obtain an acceptable precision. What will be the 99% confidence interval of the estimated water level if we use 4 repeated measurements to estimate the water level with instrument A? And how many measurements do we need to take with instrument B to obtain at least the same (or tighter) confidence interval?

Model answer:

  • Correct formula for confidence bounds: $\hat x ± k·\sigma_{\hat x}$
  • Correct k-value: $P(Z>k)=0.5\alpha=0.005$ gives $k≈2.58$ (an answer between 2.57 and 2.58 is correct)
  • Precision with instrument A: $\sigma_{\hat x} = \sigma_A/\sqrt{m_A} = 3/2$ mm, so the 99% confidence interval is $\hat x ± 2.58·1.5 ≈ \hat x ± 3.87$ mm
  • Precision with instrument B: $\sigma_{\hat x} = \sigma_B/\sqrt{m_B} = 8/\sqrt{m_B}$ mm
  • Required: $k·\sigma_A/\sqrt{m_A} ≥ k·\sigma_B/\sqrt{m_B}$
  • From this you can find $m_B ≥ 28.4$, so $m_B = 29$
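
These steps can be checked with the standard library. This is a sketch that assumes normally distributed, independent measurement errors; the variable names are chosen for illustration.

```python
import math
from statistics import NormalDist

alpha = 0.01                                  # 99% confidence level
k = NormalDist().inv_cdf(1 - alpha / 2)       # two-sided critical value

sigma_A, m_A = 3.0, 4                         # instrument A: 4 measurements
sigma_B = 8.0                                 # instrument B

sigma_hat_A = sigma_A / math.sqrt(m_A)        # precision of the mean
half_width_A = k * sigma_hat_A                # half-width of the interval

# Smallest m_B with sigma_B / sqrt(m_B) <= sigma_A / sqrt(m_A)
m_B = math.ceil((sigma_B / sigma_hat_A) ** 2)

print(f"k = {k:.3f}")
print(f"99% interval with A: estimate +/- {half_width_A:.2f} mm")
print(f"measurements needed with B: {m_B}")
```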

Part 8: Sensing and Observation Theory

Once per month the height of a fixed benchmark site on a volcano is determined using GNSS. The volcanologist uses as a null hypothesis that the height is constant. After 6 months (i.e., using 6 observations), she wants to test her hypothesis with the overall model test and a probability of false alarm of 0.025.


A. What would be the threshold value she needs to use?

Model answer:

For degrees of freedom use $q=m−n=6−1=5$, with $\alpha=0.025$ the threshold value from the table (Central $\chi ^2$-distribution) follows as $12.8325$.


B. Assume the null hypothesis is rejected. The volcanologist now wants to test whether a sudden height change occurred at a certain time $t_c$ after the first observation at $t_0$ (i.e., the height is constant before $t_c$, changes at $t_c$, and remains constant again after that). Specify the hypotheses and describe a testing procedure that makes it possible to identify whether such a deformation occurred and at which time. No need to do any calculations.

Model answer:

image.png

Recursive w-test: calculate the absolute value of the $w$-test statistic for each $C_i$; the alternative hypothesis with the maximum value is accepted if it exceeds the threshold value.

Part 9: Simulation and Stochastic Processes

We are interested in modelling the local weather and decide to build a discrete-state, discrete-time Markov chain model for this purpose. We start simple by building an hour-by-hour simulator, in which we distinguish between 3 types of weather as process states:

  • S1: clear sky
  • S2: cloudy (but dry)
  • S3: rainy

We formulate the probabilities of the future state $S^{n+1} = \{S^{n+1}_1, S^{n+1}_2, S^{n+1}_3\}$ at time $n+1$ (a time index, not an exponent) using the following equation:

  • $p(S^{n+1}) = T·p(S^n)$

Using historic weather data for the same month as we are interested in, we can conclude the following:

  • whenever the sky is clear, it is cloudy the next hour 5% of the time, and it is raining the next hour 3% of the time.
  • when the sky is cloudy, 20% of the time we have clear sky the next hour and 10% of the time we have rain the next hour.
  • when it is rainy, 20% of the time we have clear sky the next hour, and 30% of the time it is cloudy.

While having breakfast between 7:00 and 8:00 you observe a clear sky.


A. Based on the observations above, build the Markov transition matrix T for this given weather data.

Model answer:

image.png


B. Evaluate the probability that it rains between 9:00 and 10:00.

Model answer:

We are interested in the probability of state S3 at time 9, given the state S1 at time 7:

  • $p(S^7) = \{1, 0, 0\}^T$
  • $p(S^8) = T·p(S^7)$ = first column of $T$
  • $p(S^9_3) = [T]_{\text{row 3}}·[T]_{\text{column 1}} = 0.03·0.92 + 0.10·0.05 + 0.50·0.03 = 0.0476$
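
The calculation can be reproduced in a few lines. The matrix below is built from the percentages stated in the question, with each column holding the transition probabilities out of one state, so that it matches $p(S^{n+1}) = T·p(S^n)$; the "stay in the same state" entries follow from each column summing to 1. The helper name step is an assumption.

```python
# Rows: next state (clear, cloudy, rainy); columns: current state.
T = [
    [0.92, 0.20, 0.20],
    [0.05, 0.70, 0.30],
    [0.03, 0.10, 0.50],
]


def step(T, p):
    """One application of p_next = T * p for a column-stochastic matrix."""
    return [sum(T[i][j] * p[j] for j in range(len(p))) for i in range(len(T))]


p7 = [1.0, 0.0, 0.0]    # clear sky observed between 7:00 and 8:00
p8 = step(T, p7)        # equals the first column of T
p9 = step(T, p8)        # distribution for 9:00 to 10:00

print(f"P(rain between 9:00 and 10:00) = {p9[2]:.4f}")
```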

C. Explain how Monte Carlo simulation can be used to predict the probability that there will be rain at some point in the afternoon (between 12:00 and 18:00), given your observations at breakfast. In your answer, identify the model, the model input and the model output, as well as the place of the Markov chain in this simulation.

Model answer:

There are different variations possible to answer this question, particularly in the definition of the model and input. Here is one possible answer:

  1. Starting from the framework of [INPUT]--> [MODEL] --> [OUTPUT], we can define the model as the predictor of rain or no rain (TRUE/FALSE) in the afternoon.

  2. The INPUT of the model is the weather pattern for the day, which can be sampled using the Markov chain sampler, starting from the observation in the morning. This means the input is a (sample from a) stochastic process, which therefore contains uncertainty. (+1, +1)

  3. The OUTPUT of the model is boolean or binary, indicating the occurrence of rain in the afternoon (TRUE/FALSE) or (1/0). (+1)

  4. The MODEL is the interpretation of the weather pattern: the check if there is rain between 12:00 and 18:00 in the afternoon. (+1)

  5. Because there is uncertainty in the input (in the weather pattern for the day), there will be uncertainty in the output. The likelihood of rain in the afternoon is a measure of this uncertainty. Monte Carlo simulation can be used to quantify the uncertainty in the output of models as an effect of uncertainty in the input by repeated evaluation of the model for input samples, followed by a statistical evaluation of the output.

  6. In particular, the likelihood of rain in the afternoon can be evaluated by sampling many (N) weather patterns as model input and evaluating the number of times it rains in the afternoon N_rain. The ratio N_rain/N is an estimation of the likelihood of rain in the afternoon. (+1)

  7. Alternatively, the Markov chain sampler can be formulated as part of the model, but this would imply that the input of the model is deterministic and the output is stochastic, i.e. the uncertainty is in the model itself. This can still be solved with Monte Carlo analysis, and the procedure is identical, but the concept of the propagation of uncertainty is lost.
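
The procedure can be sketched as follows. The matrix T is built from the percentages in the question (columns are the current state). It is assumed that the state at hour h describes the interval from h:00 to (h+1):00, so the afternoon corresponds to hours 12 to 17. The function names, sample size and seed are illustrative choices.

```python
import random

# Rows: next state (clear, cloudy, rainy); columns: current state.
T = [
    [0.92, 0.20, 0.20],
    [0.05, 0.70, 0.30],
    [0.03, 0.10, 0.50],
]
CLEAR, CLOUDY, RAINY = 0, 1, 2


def next_state(state, rng):
    """Sample the next hourly state from the column of the current state."""
    column = [T[i][state] for i in range(3)]
    return rng.choices([CLEAR, CLOUDY, RAINY], weights=column)[0]


def rain_in_afternoon(rng, start_state=CLEAR, start_hour=7):
    """INPUT: one sampled weather pattern. OUTPUT: True if it rains in hours 12-17."""
    state = start_state
    for hour in range(start_hour + 1, 18):
        state = next_state(state, rng)
        if hour >= 12 and state == RAINY:
            return True
    return False


def estimate_rain_probability(n_samples, seed=0):
    """Monte Carlo estimate N_rain / N."""
    rng = random.Random(seed)
    n_rain = sum(rain_in_afternoon(rng) for _ in range(n_samples))
    return n_rain / n_samples


p_rain = estimate_rain_probability(20_000)
print(f"estimated P(rain in the afternoon) = {p_rain:.3f}")
```

Here the Markov chain plays the role of the input sampler, and the check for a rainy hour between 12:00 and 18:00 is the model.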


D. Explain how you can evaluate the average number of hours of rain per day using your model.

Model answer:

  • Answer example 1: Simulate a Markov process for a very long time (e.g., N = 10,000 hours) and count the number of times n the rain state appears in this series. The average number of hours of rain per day is then calculated as (n/N)·24 hours.

  • Answer example 2: Start from a certain probability state $p(S^0)$. Evaluate the probability of the next state $p(S^{n+1}) = T·p(S^n)$, where $T$ is the Markov transition matrix, in a loop until the probability converges to a steady state $p(S^\infty)$. The probability that it rains at any given hour is given by component $p(S^\infty_3)$. The average number of hours of rain in a day is given as $24·p(S^\infty_3)$.

  • Answer example 3: Simulate many (e.g., N = 1000) separate and independent Markov processes of 24 hours in length. Each simulation should have a burn-in period (e.g., simulate 124 hours and ignore the first 100) so that the result is independent of the starting state. Count the number of hours of rain in each of the 24-hour processes and take the average of these counts.
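
The steady-state approach (answer example 2) is the simplest to sketch in code. The matrix T is built from the percentages in the question (columns are the current state); the function names, tolerance and iteration limit are assumptions.

```python
# Rows: next state (clear, cloudy, rainy); columns: current state.
T = [
    [0.92, 0.20, 0.20],
    [0.05, 0.70, 0.30],
    [0.03, 0.10, 0.50],
]


def step(T, p):
    """One application of p_next = T * p."""
    return [sum(T[i][j] * p[j] for j in range(len(p))) for i in range(len(T))]


def steady_state(T, p0, tol=1e-12, max_iter=10_000):
    """Repeat p <- T * p until the largest change drops below tol."""
    p = list(p0)
    for _ in range(max_iter):
        p_next = step(T, p)
        if max(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    raise RuntimeError("steady state not reached within max_iter iterations")


p_inf = steady_state(T, [1.0, 0.0, 0.0])
hours_of_rain = 24 * p_inf[2]
print(f"steady state: {[round(x, 4) for x in p_inf]}")
print(f"average hours of rain per day: {hours_of_rain:.2f}")
```

The starting vector only affects how many iterations are needed, not the steady state that is reached.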


E. Which of the following stochastic processes can be used for this model of rain intensity? (multiple answers possible)

  • Bernoulli processes
  • Poisson processes
  • Markov processes
  • Gaussian processes

Model answer:

  • Markov processes
  • Gaussian processes