Report

Part 2 & 3. Yearly Maxima & Peak Over Threshold

2.1 Apply Yearly Maxima (YM) and Peak Over Threshold (POT) to the dataset. Plot the selected maxima by both methods on the time series. Identify the differences.

Using YM, 45 events are obtained (one observation per year), while POT samples 136 extremes. This means that many years contain more than one extreme precipitation event, and YM does not sample the additional ones. Note that if we wanted to limit our analysis to a seasonal event that occurs every year, Yearly Maxima could be a better choice.
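The two sampling strategies can be sketched as follows. This is a minimal illustration on synthetic daily precipitation, not the code used to obtain the numbers above: the data, the threshold (`threshold`) and the declustering time (`dl`) are all hypothetical, and `scipy.signal.find_peaks` is only one possible way to decluster.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

# Synthetic daily precipitation in mm (assumption: stands in for the real dataset)
rng = np.random.default_rng(42)
dates = pd.date_range("1980-01-01", "1989-12-31", freq="D")
precip = pd.Series(rng.gamma(shape=0.5, scale=20.0, size=len(dates)), index=dates)

# Yearly Maxima: exactly one value per calendar year
yearly_maxima = precip.groupby(precip.index.year).max()

# Peak Over Threshold: local peaks above the threshold,
# separated by at least `dl` days so clustered peaks are not counted twice
threshold = 60.0  # hypothetical threshold in mm
dl = 2            # hypothetical declustering time in days
peak_idx, _ = find_peaks(precip.to_numpy(), height=threshold, distance=dl)
pot_peaks = precip.iloc[peak_idx]

print(f"YM: {len(yearly_maxima)} events, POT: {len(pot_peaks)} events")
```

YM always returns as many events as there are years, whereas the number of POT events depends directly on the chosen threshold and declustering time.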

Grading scheme (3 points):

  • Reporting the correctly sampled extremes (figure and number): 1 point each method, total 2 points

  • Comparison between the results of both methods: 1 point

2.2 Fit the appropriate distributions to the sampled maxima from both methods. Report the parameters. Interpret the parameters of the distribution fitted to the yearly maxima. Assess and compare the goodness of fit of the distributions of both methods.

When sampling extremes using Block Maxima (YM in this case), the distribution of those maxima is described by a Generalized Extreme Value (GEV) distribution. The shape parameter of the fitted GEV distribution is -0.19 (note that SciPy reports the shape parameter with the opposite sign), which means that the fitted GEV is a Reverse Weibull and has an upper bound. The location parameter is 157.4 and the scale parameter is 43.8.
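The sign convention and the upper bound can be checked with a short sketch. It uses the parameters reported above and assumes the fit was done with `scipy.stats.genextreme`, whose shape argument `c` equals minus the shape parameter quoted here. The sample used in the fitting call is synthetic and only shows how the sign is flipped after fitting.

```python
import numpy as np
from scipy import stats

# Parameters as reported in the text (shape given in the xi convention)
xi, mu, sigma = -0.19, 157.4, 43.8

# SciPy's genextreme uses c = -xi
gev = stats.genextreme(c=-xi, loc=mu, scale=sigma)

# For xi < 0 the distribution is bounded above at mu - sigma / xi
upper_bound = mu - sigma / xi
print(f"upper bound = {upper_bound:.1f} mm")

# Fitting follows the same convention: flip the sign of the fitted c.
# The sample here is synthetic (drawn from the distribution above).
sample = gev.rvs(size=45, random_state=np.random.default_rng(0))
c_hat, loc_hat, scale_hat = stats.genextreme.fit(sample)
xi_hat = -c_hat
```

With the reported parameters the upper bound is about 388 mm, so this fitted distribution assigns zero probability to yearly maxima above that value.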

Regarding the goodness of fit, the tail of the fitted distribution seems to follow the trend of the tail of the empirical distribution. Larger discrepancies are observed around 150 mm of precipitation.

When sampling extremes using POT, the distribution of those maxima is described by a Generalized Pareto distribution (GPD). The parameters of the fitted GPD are 0.10 (shape) and 29.2 (scale); the location is the threshold. The fitted GPD seems to capture the general trend of the empirical distribution. However, it starts to deviate at the end of the tail, from approximately 225 mm. The fit of the GPD is therefore slightly worse than that of the GEV.
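A GPD fit with the location fixed at the threshold might look like the sketch below. The sample is synthetic (drawn from a GPD with the shape and scale reported above), and the threshold of 100 mm is a placeholder, not the value used in this report. The Kolmogorov-Smirnov test is included only as one possible numerical complement to the visual comparison of empirical and fitted distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
threshold = 100.0  # hypothetical threshold in mm, not the one used in the report

# Synthetic POT sample: 136 peaks drawn from a GPD with the reported shape and scale
peaks = stats.genpareto.rvs(c=0.10, loc=threshold, scale=29.2, size=136, random_state=rng)

# Fit with the location fixed at the threshold.
# Unlike genextreme, genpareto's c has the same sign as the shape parameter xi.
xi_hat, loc_hat, sigma_hat = stats.genpareto.fit(peaks, floc=threshold)

# Rough goodness-of-fit check against the fitted distribution
ks = stats.kstest(peaks, "genpareto", args=(xi_hat, loc_hat, sigma_hat))
print(f"shape = {xi_hat:.2f}, scale = {sigma_hat:.1f}, KS p-value = {ks.pvalue:.2f}")
```

Because the parameters are estimated from the same sample that is tested, the KS p-value is optimistic and should be read as indicative only.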

Grading scheme (3.5 points):

  • Reporting right parameters of GEV: 0.5 points

  • Goodness of fit assessment of GEV (including figure): 1 point

  • Reporting right parameters of GPD: 0.5 points

  • Goodness of fit assessment of GPD (including figure): 1 point

  • Comparison of the fit of both distributions: 0.5 points

Part 4. Return levels

4.1 Report the return levels associated with a return period of 100 years derived from both methods. Draw the return level plot for both methods. Compare the results and answer the following questions:

  • Which differences do you observe between the results with the two approaches?

  • Why do you think those differences happen?

  • Which design value would you pick? Justify your answer.

  • How can you improve the performed analysis?

The return levels for a return period of 100 years using YM and POT are 292 and 352 mm, respectively.
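As a sketch of how such return levels can be computed with SciPy: for YM the 100-year level is the quantile of the GEV with yearly non-exceedance probability 1 - 1/T, while for POT the probability must account for the average number of events per year (`lam`). The GEV parameters are the ones reported in Part 2 & 3. The function `pot_return_level` is a hypothetical helper, and `lam = 136 / 45` is an assumption based on the event count and record length given earlier. The threshold is not stated in this report, so the POT example only evaluates the excess above the threshold.

```python
from scipy import stats

T = 100  # return period in years

# GEV (yearly maxima): yearly non-exceedance probability is 1 - 1/T
xi_gev, mu, sigma_gev = -0.19, 157.4, 43.8  # reported in the text
rl_gev = stats.genextreme.ppf(1 - 1 / T, c=-xi_gev, loc=mu, scale=sigma_gev)
print(f"YM return level: {rl_gev:.0f} mm")

def pot_return_level(T, threshold, xi, sigma, lam):
    """T-year return level from a GPD fitted to peaks over `threshold`.

    With `lam` events per year on average, the T-year event has
    non-exceedance probability 1 - 1/(lam * T) among the sampled peaks.
    """
    return stats.genpareto.ppf(1 - 1 / (lam * T), c=xi, loc=threshold, scale=sigma)

lam = 136 / 45  # assumption: 136 sampled peaks over 45 years
# threshold=0.0 gives the excess above the (unstated) threshold
excess_100 = pot_return_level(T, threshold=0.0, xi=0.10, sigma=29.2, lam=lam)
print(f"POT 100-year excess above threshold: {excess_100:.0f} mm")
```

The GEV computation reproduces the value of about 292 mm quoted above; the POT return level follows by adding the threshold to the computed excess.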

In this case, POT is significantly more conservative than YM. It is also possible to see these differences in the return level plot: the tails of the fitted distributions diverge from approximately 220 mm. From that point, the return levels predicted by the POT method become more conservative than those predicted by YM. This effect increases for increasing values of the return period.

POT fits the distribution using far more observations than YM, which may lead to differences in the fitted distribution. In this case, there are significant differences in the shape of the tail: the fitted GEV has a negative shape parameter (bounded tail), while the fitted GPD has a positive one (heavier, unbounded tail). This leads to significantly different inferred return levels. Note that this is case specific.

If no additional information about the physical phenomenon we are studying is available, the safest option would be to use the most conservative approach. In this case, it is POT, but that is case specific. Also, we can argue that the GPD was quantified using more data, which makes the fitting more reliable. On the other hand, we have not performed here a formal analysis to choose the parameters of POT (threshold and declustering time), so we should check whether they are appropriate as the results are dependent on the selected parameters.

As mentioned in the previous point, the parameters of the POT (threshold and declustering time) should be further analyzed to determine whether they are appropriate (whether the sampled extremes are independent and identically distributed).

Grading scheme (3.5 points):

  • Report return level plot: 0.5 points

  • Each subquestion: 0.5 points

Part 5. BONUS (not mandatory!)

5.1 Analyze whether the selected parameters to perform POT (threshold and declustering time) are appropriate.

If the number of sampled events per year follows a Poisson distribution, it can be assumed that the sampled extremes are independent. One of the properties of the Poisson distribution is that its mean and variance are both equal to the parameter of the distribution. The sampled number of extremes per year does not seem to fulfill this property: the mean is 3.24 and the variance is 6.28, almost twice the mean, so it is unlikely that the counts follow a Poisson distribution.
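The mean and variance check can be sketched as below. The event years are synthetic and stand in for the years of the POT peaks; the variable names are illustrative. The last lines simply evaluate the variance-to-mean ratio (dispersion index) for the values reported above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the year of each sampled POT event (45-year record)
years = np.arange(1980, 2025)
event_years = np.repeat(years, rng.poisson(3.0, size=len(years)))

# Number of events per year, keeping years without any event as zeros
counts = pd.Series(event_years).value_counts().reindex(years, fill_value=0)

mean = counts.mean()
var = counts.var(ddof=1)  # sample variance
dispersion = var / mean   # close to 1 for Poisson counts
print(f"mean = {mean:.2f}, variance = {var:.2f}, dispersion = {dispersion:.2f}")

# Dispersion index for the values reported in the text
reported_dispersion = 6.28 / 3.24
print(f"reported dispersion = {reported_dispersion:.2f}")
```

A dispersion index well above 1, as for the reported values, indicates overdispersion, which is typical when events cluster in some years.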

The empirical probability mass function (pmf) and the fitted Poisson pmf are plotted in the previous figure. The number of exceedances per year does not seem to follow a Poisson distribution. Moreover, a chi-squared goodness-of-fit test shows significant differences between the fitted Poisson distribution and the data.
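One way to set up such a chi-squared test is sketched below. The yearly counts are synthetic and deliberately overdispersed (negative binomial with mean 3 and variance 6), the binning with a final "6 or more" class is an arbitrary choice, and the procedure is not necessarily the one used for this report.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic yearly event counts, overdispersed on purpose (mean 3, variance 6)
counts = rng.negative_binomial(n=3, p=0.5, size=45)

# Fitted Poisson parameter: the sample mean
lam = counts.mean()

# Observed frequencies for 0, 1, ..., k_max - 1 and a final ">= k_max" bin
k_max = 6
observed = np.bincount(np.minimum(counts, k_max), minlength=k_max + 1)

# Expected frequencies under the fitted Poisson, with the tail lumped together
probs = stats.poisson.pmf(np.arange(k_max), lam)
probs = np.append(probs, 1.0 - probs.sum())
expected = probs * counts.size

# ddof=1 because one parameter (lam) was estimated from the data
chi2, p_value = stats.chisquare(observed, expected, ddof=1)
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.3f}")
```

With only 45 years, some expected frequencies are small, so the bins may need to be merged further before the p-value can be trusted.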

Given all of the above, it can be concluded that the values selected for the threshold and the declustering time are not optimal, since the number of exceedances per year does not clearly follow a Poisson distribution and, thus, the sampled events may not be independent.

Grading scheme (additional, up to 1.5 points):

  • Compute mean and variance of the number of events each year: 0.75 points

  • Fit and plot the Poisson distribution and assess the fit: 0.75 points

By Patricia Mares Nasarre, Delft University of Technology. CC BY 4.0, more info on the Credits page of Workbook