Group Assignment 1.7: Distribution Fitting

CEGM1000 MUDE: Week 7, Friday Oct 18, 2024.

Case 1: Wave impacts on a crest wall

What's the propagated uncertainty? *How large will the horizontal force be?*

In this project, you have chosen to work on the uncertainty of wave periods and wave heights in the Alboran Sea to estimate the impacts on a crest wall: a concrete element installed on top of a mound breakwater. You have hourly buoy observations of the significant wave height ($H$) and the peak wave period ($T$) spanning several years. As you know, $H$ and $T$ are hydrodynamic variables relevant for estimating wave impacts on the structure. The maximum horizontal force (exceeded by 0.1% of incoming waves) can be estimated using the following equation (USACE, 2002):

$$ F_h = \left( A_1 + A_2 \frac{H}{A_c} \right) \rho g C_h L_{0p} $$

where $A_1 = -0.016$ and $A_2 = 0.025$ are coefficients that depend on the geometry of the structure, $A_c = 3$ m is the elevation of the frontal berm of the structure, $\rho$ is the density of water, $g$ is the gravitational acceleration, $C_h = 2$ m is the crown wall height, and $L_{0p} = \frac{gT^2}{2\pi}$ is the deep-water wavelength associated with the peak period. Substituting these values, the equation reduces to

$$ F_h = 255.4 H T^2 -490.4 T^2 $$
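As a quick arithmetic check of the reduction above, the sketch below recomputes the two coefficients. It assumes $\rho = 1000$ kg/m³ and $g = 9.81$ m/s², values that the text does not state; the variable names are illustrative.

```python
import math

# Coefficients and dimensions given in the text
A1, A2 = -0.016, 0.025   # geometry coefficients (-)
Ac = 3.0                 # elevation of the frontal berm (m)
Ch = 2.0                 # crown wall height (m)

# ASSUMED values (not stated in the text)
rho = 1000.0             # water density (kg/m^3)
g = 9.81                 # gravitational acceleration (m/s^2)

# L0p = g T^2 / (2 pi), so both terms of F_h are proportional to T^2
k = rho * g * Ch * g / (2 * math.pi)   # common factor multiplying T^2

coef_HT2 = A2 / Ac * k   # coefficient of H T^2
coef_T2 = A1 * k         # coefficient of T^2 (negative because A1 < 0)

# Wave height below which the formula gives a negative force (independent of rho and g)
H_zero = -A1 * Ac / A2

print(f"F_h = {coef_HT2:.1f} H T^2 {coef_T2:+.1f} T^2")
# prints: F_h = 255.3 H T^2 -490.1 T^2
```

With these assumed values the coefficients round to 255.3 and 490.1, within rounding of the 255.4 and 490.4 used above. The variable `H_zero` also shows that $F_h$ is negative whenever $H < -A_1 A_c / A_2 = 1.92$ m, which is worth keeping in mind when analyzing the distribution of $F_h$.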

The goals of this project are:

Choose a reasonable distribution function for $H$ and $T$.
Fit the chosen distributions to the observations of $H$ and $T$.
Assuming $H$ and $T$ are independent, propagate their distributions to obtain the distribution of $F_h$.
Analyze the distribution of $F_h$.

Importing packages

In [ ]:

import numpy as np
import matplotlib.pyplot as plt

from scipy import stats 
from math import ceil, trunc

plt.rcParams.update({'font.size': 14})

1. Explore the data

The first step in the analysis is to explore the data, both visually and through their statistics.

In [ ]:

# Load the data (the first column is not used in this analysis)
_, H, T = np.genfromtxt('dataset_HT.csv', delimiter=",", unpack=True, skip_header=1)

# plot time series
fig, ax = plt.subplots(2, 1, figsize=(10, 7), layout = 'constrained')
ax[0].plot(H,'k')
ax[0].set_xlabel('Time')
ax[0].set_ylabel('Wave height, H (m)')
ax[0].grid()

ax[1].plot(T,'k')
ax[1].set_xlabel('Time')
ax[1].set_ylabel('Wave period, T (s)')
ax[1].grid()

In [ ]:

# Statistics for H

print(stats.describe(H))

In [ ]:

# Statistics for T

print(stats.describe(T))

Task 1:

Describe the data based on the previous statistics:

Which variable has the higher variability?

What does the skewness coefficient mean? Which kinds of distribution functions should we consider fitting to them?
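Because $H$ (m) and $T$ (s) have different units, comparing their variances directly is not very meaningful; one option is the dimensionless coefficient of variation (standard deviation divided by the mean). A minimal sketch, assuming the arrays `H` and `T` loaded above; the helper name `summarize` is hypothetical.

```python
import numpy as np
from scipy import stats

def summarize(x):
    """Summary statistics to compare the variability and asymmetry of H and T."""
    d = stats.describe(x)        # variance is unbiased (ddof=1); skewness uses the biased estimator
    std = np.sqrt(d.variance)
    return {
        "mean": d.mean,
        "std": std,
        "cv": std / d.mean,      # coefficient of variation: dimensionless, so H and T are comparable
        "skewness": d.skewness,  # > 0: longer right tail; 0: symmetric; < 0: longer left tail
    }

# Synthetic check (not the course data): a symmetric sample has zero skewness
print(summarize(np.array([1.0, 2.0, 3.0, 4.0])))
```

Calling `summarize(H)` and `summarize(T)` on the loaded arrays gives directly comparable numbers. A clearly positive skewness points towards right-skewed candidates rather than a symmetric Gaussian.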

2. Empirical distribution functions

Now we compute and plot the empirical PDF and CDF of each variable. Note that the pseudo-code for the empirical CDF is given in the reader.

Task 2:

Define a function to compute the empirical CDF. Plot your empirical PDF and CDF.

In [ ]:

def ecdf(YOUR_INPUTS):
    #your code
    return YOUR_OUTPUT

In [ ]:

# Your plot here
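One possible sketch of Task 2 is shown below. It assumes the plotting position $i/(n+1)$ for the empirical CDF; the pseudo-code in the reader may use a different convention, so check it first. The helper `empirical_pdf` and the default of 30 bins are illustrative choices, not prescribed by the assignment.

```python
import numpy as np

def ecdf(x):
    """Empirical CDF: sorted observations and their non-exceedance probabilities.

    Uses the plotting position i / (n + 1), which keeps all probabilities
    strictly inside (0, 1); this is convenient for log-scale and QQ-plots.
    """
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = x_sorted.size
    p = np.arange(1, n + 1) / (n + 1)
    return x_sorted, p

def empirical_pdf(x, bins=30):
    """Histogram-based empirical PDF; density=True makes the total area equal to 1."""
    density, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])   # bin midpoints, for plotting
    return centers, density
```

For example, `x, p = ecdf(H)` followed by `ax.step(x, p, where='post')` plots the empirical CDF of $H$, and `ax.plot(*empirical_pdf(H))` plots its histogram-based PDF.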

Task 3:

Based on the results of Task 1 and on the empirical PDF and CDF, select one distribution to fit to each variable. For $H$, choose between an Exponential and a Gaussian distribution; for $T$, choose between a Uniform and a Gumbel distribution.

3. Fitting a distribution

Task 4:

Fit the selected distributions to the observations using maximum likelihood estimation (MLE).

Hint: use SciPy's built-in functions (watch out for how the parameters are defined!).

In [ ]:

#Your code here
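A sketch of MLE fitting with `scipy.stats`, shown on synthetic data (not the course data). The parameter conventions are the usual trap: `stats.expon` uses `loc` and `scale = 1/λ`, `stats.norm` uses `loc = μ` and `scale = σ` (its `fit` returns the standard deviation with `ddof=0`), `stats.uniform` covers `[loc, loc + scale]`, and `stats.gumbel_r` is the right-skewed (maxima) Gumbel with `loc` and `scale`. The helper `fit_mle` and the parameter values are hypothetical; fixing `floc=0` for the Exponential is a modelling choice, since an unconstrained fit places `loc` at the sample minimum.

```python
import numpy as np
from scipy import stats

def fit_mle(data, dist, **fixed):
    """Fit a scipy.stats distribution by maximum likelihood and return it frozen.

    `fixed` forwards SciPy's fixed-parameter keywords, e.g. floc=0.
    """
    params = dist.fit(data, **fixed)   # tuple: (shape parameters..., loc, scale)
    return dist(*params)

rng = np.random.default_rng(1)
# Synthetic stand-ins for the observations (NOT the course data)
H_synth = rng.exponential(scale=1.2, size=20_000)
T_synth = stats.gumbel_r.rvs(loc=8.0, scale=1.5, size=20_000, random_state=rng)

H_fit = fit_mle(H_synth, stats.expon, floc=0)   # expon: scale = 1 / rate
T_fit = fit_mle(T_synth, stats.gumbel_r)        # gumbel_r: location and scale

print("H: loc = %.3f, scale = %.3f" % H_fit.args)
print("T: loc = %.3f, scale = %.3f" % T_fit.args)
```

The frozen objects returned by `fit_mle` expose `.pdf`, `.cdf`, `.ppf` and `.rvs`, which is all that the goodness-of-fit and propagation steps need.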

4. Assessing goodness of fit

Task 5:

Assess the goodness of fit of the selected distributions using:

One graphical method: a QQ-plot or a log-scale plot (choose one).

The Kolmogorov-Smirnov test.

Hint: the Kolmogorov-Smirnov test is implemented in SciPy (`stats.kstest`).

In [ ]:

#Your code here
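A sketch of Task 5 for one variable, shown on synthetic Gumbel data (not the course data); `qq_points` is a hypothetical helper using plotting positions $i/(n+1)$. One caveat for the interpretation: when the parameters are estimated from the same data that are tested, the standard Kolmogorov-Smirnov p-value is conservative (too large), so a high p-value should be read with some care.

```python
import numpy as np
from scipy import stats

def qq_points(data, dist):
    """Theoretical vs. empirical quantiles for a QQ-plot.

    `dist` is a frozen scipy.stats distribution; plotting positions are i/(n+1).
    """
    x = np.sort(np.asarray(data, dtype=float))
    p = np.arange(1, x.size + 1) / (x.size + 1)
    return dist.ppf(p), x

rng = np.random.default_rng(2)
# Synthetic stand-in for the observations (NOT the course data)
sample = stats.gumbel_r.rvs(loc=8.0, scale=1.5, size=5_000, random_state=rng)
fitted = stats.gumbel_r(*stats.gumbel_r.fit(sample))

theo, emp = qq_points(sample, fitted)   # plot emp against theo, plus a 1:1 line

# Kolmogorov-Smirnov test of the sample against the fitted CDF
ks = stats.kstest(sample, fitted.cdf)
print(f"KS statistic = {ks.statistic:.4f}, p-value = {ks.pvalue:.3f}")
```

To draw the QQ-plot, use for example `ax.scatter(theo, emp)` and `ax.plot(theo, theo, 'k--')`; points that bend away from the 1:1 line in the upper tail indicate a poor fit of the extremes.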

Task 6:

Interpret the results of the goodness-of-fit (GOF) techniques. How well do the selected parametric distributions perform?

5. Propagating the uncertainty

Using the fitted distributions, we now propagate the uncertainty in $H$ and $T$ to $F_h$, assuming that $H$ and $T$ are independent.

Task 7:

Draw 10,000 random samples from the fitted distribution functions for $H$ and $T$.
Compute $F_h$ for each pair of samples.
Compute $F_h$ for the observations.
Plot the PDF and the exceedance curve (in log scale) of $F_h$, computed from both the simulations and the observations.

In [ ]:

# Draw random samples from the distributions fitted in Task 4
rs_H = ...  # your code here
rs_T = ...  # your code here

# Compute Fh for each pair of samples
rs_Fh = ...  # your code here

# Repeat for the observations
Fh = ...  # your code here

#plot the PDF and the CDF
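A minimal Monte Carlo sketch of Task 7. The distributions below (an Exponential for $H$ and a Gumbel for $T$, with made-up parameters) are assumptions for illustration only and must be replaced by your fits from Task 4; the helper names and the `sim_` prefix are hypothetical, chosen so this cell does not overwrite `rs_H` and `rs_T`. Because $H$ and $T$ are sampled separately, they are independent by construction, which is exactly the assumption of this task.

```python
import numpy as np
from scipy import stats

def horizontal_force(H, T):
    """Reduced equation from the introduction: F_h = 255.4 H T^2 - 490.4 T^2."""
    H = np.asarray(H, dtype=float)
    T = np.asarray(T, dtype=float)
    return 255.4 * H * T**2 - 490.4 * T**2

def exceedance(x):
    """Sorted values and empirical exceedance probabilities 1 - i/(n+1)."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = x_sorted.size
    return x_sorted, 1 - np.arange(1, n + 1) / (n + 1)

rng = np.random.default_rng(3)
# ASSUMED fitted distributions, for illustration only: replace with your Task 4 fits
H_dist = stats.expon(loc=0.0, scale=1.2)
T_dist = stats.gumbel_r(loc=8.0, scale=1.5)

n = 10_000
sim_H = H_dist.rvs(size=n, random_state=rng)   # independent draws for H ...
sim_T = T_dist.rvs(size=n, random_state=rng)   # ... and for T
sim_Fh = horizontal_force(sim_H, sim_T)

Fh_sorted, p_exc = exceedance(sim_Fh)
```

Applying `horizontal_force(H, T)` to the observations gives the observed $F_h$ for comparison; plotting `p_exc` against `Fh_sorted` with `ax.set_yscale('log')` gives the exceedance curve in log scale.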

Task 8:

Interpret the figures above, answering the following questions:

Are there differences between the two computed distributions for $F_h$?
What are the advantages and disadvantages of using the simulations?

Running the code in the cell below produces a scatter plot of $H$ against $T$ for both the simulations and the observations. Explore the relationship between the two variables and answer the following questions:

Task 9:

Observe the plot below. What differences do you observe between the generated samples and the observations?
Compute the correlation between $H$ and $T$ for the samples and for the observations. Are there differences?
What could be improved in the previous analysis? Do you have any ideas on how to implement these improvements?

In [ ]:

fig, ax = plt.subplots(1, 1, figsize=(7, 7))
ax.scatter(rs_H, rs_T, s=40, c='k', label='Simulations')
ax.scatter(H, T, s=40, c='r', marker='x', label='Observations')
ax.set_xlabel('Wave height, H (m)')
ax.set_ylabel('Wave period, T (s)')
ax.legend()
ax.grid()

In [ ]:

#Correlation coefficient calculation here
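A sketch for the correlation calculation in Task 9, using NumPy's `np.corrcoef` for the Pearson (linear) coefficient and, as an optional addition, SciPy's `stats.spearmanr` for the rank-based Spearman coefficient, which also captures monotonic but nonlinear dependence. The helper name `correlations` is hypothetical.

```python
import numpy as np
from scipy import stats

def correlations(x, y):
    """Pearson (linear) and Spearman (rank) correlation coefficients of x and y."""
    pearson = np.corrcoef(x, y)[0, 1]    # off-diagonal entry of the 2x2 correlation matrix
    spearman, _ = stats.spearmanr(x, y)  # second value is the p-value, ignored here
    return pearson, spearman
```

Use `correlations(H, T)` for the observations and `correlations(rs_H, rs_T)` for the samples. Since the samples were drawn independently, their coefficients should be close to zero, whatever the observations show.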

End of notebook.
