Workshop 2.7: Extreme temperature¶

CEGM1000 MUDE: Extreme Value Analysis, Week 2.7, Wednesday, Jan 8, 2025.

In this session, you will work with the uncertainty of extreme temperatures in the airport of Barcelona to assess the extreme loads induced by temperature in a steel structure in the area. You have daily observations of the maximum temperature for several years. The dataset was retrieved from the Spanish Agency of Metheorology AEMET. Your goal is to design the structure for a lifespan of 50 years with a probability of failure of 0.1 during the design life.

The goal of this project is:

Compute the required design return period for the steel structure.
Perform monthly Block Maxima and fit the a distribution to the sampled observations.
Assess the Goodness of fit of the distribution.
Compute the return level plot.
Compute the design return level.

In [ ]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy import stats 
from math import ceil, trunc

plt.rcParams.update({'font.size': 14})

Task 1: Data Exploration¶

First step in the analysis is exploring the data, visually and through its statistics. We import the dataset and explore its columns.

In [ ]:

T = pd.read_csv('temp.csv', delimiter = ',', parse_dates = True).dropna().reset_index(drop=True)
T.columns=['Date', 'T'] #rename columns
T.head()

The dataset has two columns: the time stamp of the measurements and the cumulative daily precipitation. We set the first columns as a datetime as they are the dates of the measurements.

In [ ]:

T['Date'] = pd.to_datetime(T['Date'], format='mixed')
T['Date']

Once formatted, we can plot the timeseries and the histogram.

In [ ]:

fig, axes = plt.subplots(1,2, figsize=(12,5), layout='constrained')

axes[0].hist(T['T'], label = 'T', density = True, edgecolor = 'darkblue')
axes[0].set_xlabel('PDF')
axes[0].set_xlabel('Maximum daily temperature [degrees]')
axes[0].grid()
axes[0].legend()
axes[0].set_title('(a) Histogram')

axes[1].plot(T['Date'], T['T'],'k', label = 'P')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Maximum daily temperature [degrees]')
axes[1].grid()
axes[1].set_title('(b) Time series')
axes[1].legend()

Task 2.1: Based on the above plots, briefly describe the data.

Task 2: Sample Monthly Maxima¶

Task 2.2: Sample monthly maxima from the timeseries and plot them on the timeseries. Plot also the histograms of both the maxima and the observations.

In [ ]:

# Extract year and month from the Date column
T['Year'] = T['Date'].dt.year
T['Month'] = T['Date'].dt.month

# Group by Year and Month, then get the maximum observation
idx_max = #your code here
max_list = T.loc[idx_max]

In [ ]:

#Your code here: plot

Task 2.3: Look at the previous plots. Are the sampled maxima independent and identically distributed? Justify your answer. What are the implications for further analysis?

Task 2.4: Compute PDF and the empirical cumulative distribution function of the observations and the sampled monthly maxima. Plot the ECDF in log-scale. Where are the sampled monthly maxima located with respect with the CDF of all the observations?

In [ ]:

def ecdf(var):
    #your code here
    return [y, x]

#your plot here

Task 3: Distribution Fitting¶

We did this a lot at the end of Q1---refer to your previous work as needed!

Task 3: Fit a distribution to the monthly maxima. Print the values of the obtained parameters and interpret them:

Do the location and scale parameters match the data?
According to the shape parameter, what type of distribution is this?
What type of tail does the distribution have (refer to EVA Chapter 7.2 of the book)?
Does the distribution have an upper bound? If so, compute it!

Use scipy.stats built in functions (watch out with the parameter definitions!), similar to Week 1.7 and use the DataFrame created in Task 2.

In [ ]:

params_T = #your code here
print(params_T)

Task 4: Goodness of Fit¶

Task 4: Assess the goodness of fit of the selected distribution using the exceedance probability plot in semi-log scale.

Consider the following questions:

How well do the probabilities of the fitted distribution match the empirical distribution? Is there an over- or under-prediction?
Is the tail type of this GEV distribution appropriate for the data?

In [ ]:

#your code here

Task 5: Return Levels¶

It was previously indicated that the structure should be designed for a lifespan of 50 years with a probability of failure of 0.1 along the design life.

Task 5.1:

Compute the design return period using both the Binomial and Poisson model for extremes. Compare the obtained return.

Task 5.2:

Considering that you have sampled monthly maxima, compute and plot the return level plot: values of the random variable in the x-axis and return periods on the y-axis. Y-axis in logscale.

In [ ]:

RT_range = #range of values of return level
monthly_probs = #compute monthly probabilities
eval_temp = #compute the values of the random variable for those probabilities

plt.figure(figsize=(10, 6))
plt.plot(eval_temp, RT_range, 'k')
plt.xlabel('Temperature [mm]')
plt.ylabel('RT [years]')
plt.yscale('log') 
plt.grid()

Task 5.3: Compute the design value of temperature. Choose the return level you prefer within the Poisson and Binomial model.

In [ ]:

RT_design = #value of the return period
monthly_design = #compute the monthly probability
design_T = #compute the design value

print('The design value of temperature is:',
      f'{design_T:.3f}')

End of notebook.

Workshop 2.7: Extreme temperature¶

.markdown {width:100%; position: relative} article { position: relative }

Task 1: Data Exploration¶

Task 2: Sample Monthly Maxima¶

Task 3: Distribution Fitting¶

Task 4: Goodness of Fit¶

Task 5: Return Levels¶