Group Assignment 1.7: Distribution Fitting


CEGM1000 MUDE: Week 7, Friday Oct 18, 2024.

Case 3: Discharges on a structure

What's the propagated uncertainty? *How large will the discharge be?*

In this project, you have chosen to work on the uncertainty of water depths ($h$) and water velocities ($u$) on top of a hydraulic structure in order to estimate the discharge. You have observations from physical experiments of waves impacting a breakwater during a wave storm, scaled up to prototype scale. You can read more about the dataset here. Recall that the discharge can be computed as

$$ q = u S $$

where $S$ is the cross-sectional area crossed by the flow. Thus, assuming a discharge width of 1 m, the previous equation simplifies to

$$ q = u h $$
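Written out with an explicit flow width $b$ (a symbol introduced here only for illustration), the simplification is:

$$ S = b\,h \quad\Rightarrow\quad q = u\,b\,h \overset{b = 1\,\text{m}}{=} u\,h $$

With $u$ in m/s and $h$ in m, $u\,h$ has units of m$^2$/s, so $q$ can be read as the discharge per metre of structure width (m$^3$/s per m).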

The goals of this project are:

  1. Choose a reasonable distribution function for $h$ and $u$.
  2. Fit the chosen distributions to the observations of $h$ and $u$.
  3. Assuming $h$ and $u$ are independent, propagate their distributions to obtain the distribution of $q$.
  4. Analyze the distribution of $q$.

Importing packages

```python
import numpy as np
import matplotlib.pyplot as plt

from scipy import stats
from math import ceil, trunc

plt.rcParams.update({'font.size': 14})
```

1. Explore the data

The first step in the analysis is to explore the data, both visually and through its statistics.

```python
# Import the observations: water depth h (m) and water velocity u (m/s)
h, u = np.genfromtxt('dataset_hu.csv', delimiter=",", unpack=True, skip_header=1)

# Plot the time series
fig, ax = plt.subplots(2, 1, figsize=(10, 7), layout='constrained')
ax[0].plot(h, 'k')
ax[0].set_xlabel('Time')
ax[0].set_ylabel('Water depth, h (m)')
ax[0].grid()

ax[1].plot(u, 'k')
ax[1].set_xlabel('Time')
ax[1].set_ylabel('Water velocity, u (m/s)')
ax[1].grid()
```
```python
# Statistics for h
print(stats.describe(h))
```

```python
# Statistics for u
print(stats.describe(u))
```

Task 1:

Describe the data based on the previous statistics:

  • Which variable presents a higher variability?
  • What does the skewness coefficient mean? Which kind of distribution functions should we consider to fit them?
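Because $h$ and $u$ have different units, one option for comparing their variability is the coefficient of variation (standard deviation divided by the mean, meaningful only for positive means), read alongside the skewness that `stats.describe` already reports. A minimal sketch, run on synthetic stand-in data since the real arrays come from `dataset_hu.csv`; the helper name `summarize` is hypothetical:

```python
import numpy as np
from scipy import stats


def summarize(x):
    """Return mean, standard deviation, coefficient of variation and skewness of x."""
    x = np.asarray(x, dtype=float)
    desc = stats.describe(x)        # nobs, minmax, mean, variance, skewness, kurtosis
    std = np.sqrt(desc.variance)    # describe() reports the unbiased variance (ddof=1)
    return {
        "mean": desc.mean,
        "std": std,
        "cov": std / desc.mean,     # dimensionless, so h and u can be compared
        "skewness": desc.skewness,  # > 0 indicates a longer right tail
    }


# Synthetic stand-ins for the observations (NOT the real dataset)
rng = np.random.default_rng(42)
h_demo = rng.normal(loc=1.0, scale=0.1, size=1000)  # roughly symmetric
u_demo = rng.exponential(scale=1.0, size=1000)      # right-skewed

for name, x in [("h", h_demo), ("u", u_demo)]:
    print(name, summarize(x))
```

A skewness near zero points towards symmetric candidates, while a clearly positive value points towards right-skewed ones.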

2. Empirical distribution functions

Now we compute and plot the empirical PDF and CDF of each variable. Note that the pseudo-code for the empirical CDF is given in the reader.

Task 2:

Define a function to compute the empirical CDF.

```python
def ecdf(YOUR_INPUT):
    # Your code
    return YOUR_OUTPUT
```
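One possible implementation is sketched below. It assumes the Weibull plotting position $p_i = i/(n+1)$ for the $i$-th smallest of $n$ observations; check the reader's pseudo-code for the exact convention used in the course.

```python
import numpy as np


def ecdf(observations):
    """Empirical CDF: sorted observations and their non-exceedance probabilities.

    Uses the Weibull plotting position p_i = i / (n + 1), so no observation is
    assigned a probability of exactly 0 or 1 (an assumed convention).
    """
    x = np.sort(np.asarray(observations, dtype=float))
    n = x.size
    p = np.arange(1, n + 1) / (n + 1)
    return x, p


x, p = ecdf([3.0, 1.0, 2.0])
print(x, p)
```

The exceedance probability needed later for log-scale plots is simply `1 - p`.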
    
```python
# Your plot
```
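A sketch of such a plot, assuming a density-normalized histogram as the empirical PDF and a step plot of the Weibull-position ECDF. Synthetic data stand in for `h`; the helper name `plot_empirical` and the default of 20 bins are arbitrary choices, not part of the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_empirical(data, label, bins=20):
    """Plot a density-normalized histogram (empirical PDF) and the empirical CDF."""
    data = np.asarray(data, dtype=float)
    x = np.sort(data)
    p = np.arange(1, x.size + 1) / (x.size + 1)  # Weibull plotting position

    fig, ax = plt.subplots(1, 2, figsize=(10, 4), layout='constrained')
    ax[0].hist(data, bins=bins, density=True, color='lightgrey', edgecolor='k')
    ax[0].set_xlabel(label)
    ax[0].set_ylabel('Empirical PDF')
    ax[1].step(x, p, where='post', color='k')
    ax[1].set_xlabel(label)
    ax[1].set_ylabel('Empirical CDF')
    for a in ax:
        a.grid()
    return fig


rng = np.random.default_rng(0)
h_demo = rng.normal(1.0, 0.1, size=500)  # synthetic stand-in, NOT the real h
fig = plot_empirical(h_demo, 'Water depth, h (m)')
```

With `density=True` the bar areas sum to one, so the histogram can be overlaid directly on a fitted PDF later on.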
    

Task 3:

Based on the results of Task 1 and the empirical PDF and CDF, select one distribution to fit to each variable. For $h$, choose between a Uniform and a Gaussian distribution; for $u$, choose between an Exponential and a Gumbel distribution.

3. Fitting a distribution

Task 4:

Fit the selected distributions to the observations using maximum likelihood estimation (MLE).

Hint: use SciPy's built-in functions (watch out for how each distribution's parameters are defined!).
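A sketch of the MLE fits with `scipy.stats`, on synthetic stand-in data. Every SciPy distribution is parametrized by `loc` and `scale`, which do not always match textbook parameters: for `stats.uniform` they are the lower bound $a$ and the width $b - a$ (not the upper bound), for `stats.expon` the scale is $1/\lambda$ and `loc` shifts the support, and for `stats.norm` they are $\mu$ and $\sigma$. All four candidates from Task 3 are fitted here only for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the observations (NOT the real dataset)
rng = np.random.default_rng(1)
h_demo = rng.normal(loc=1.0, scale=0.1, size=2000)
u_demo = stats.gumbel_r.rvs(loc=2.0, scale=0.5, size=2000, random_state=rng)

# Candidates for h: fit() returns (loc, scale) estimated by maximum likelihood
mu, sigma = stats.norm.fit(h_demo)           # mean and standard deviation
a, width = stats.uniform.fit(h_demo)         # lower bound a and width (b - a)
print(f"Normal:  mu={mu:.3f}, sigma={sigma:.3f}")
print(f"Uniform: a={a:.3f}, b={a + width:.3f}")

# Candidates for u
loc_e, scale_e = stats.expon.fit(u_demo)     # scale = 1 / lambda, loc = shift
loc_g, scale_g = stats.gumbel_r.fit(u_demo)  # location and scale of a Gumbel (maxima)
print(f"Exponential: loc={loc_e:.3f}, lambda={1 / scale_e:.3f}")
print(f"Gumbel:      loc={loc_g:.3f}, scale={scale_g:.3f}")

# Freeze the fitted distributions for later use (cdf, ppf, rvs, ...)
h_norm = stats.norm(mu, sigma)
u_gumbel = stats.gumbel_r(loc_g, scale_g)
```

Freezing the fitted distributions keeps the parameters attached to the object, which avoids passing `loc` and `scale` by hand in the goodness-of-fit and propagation steps.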

```python
# Your code here
```
    

4. Assessing goodness of fit

Task 5:

Assess the goodness of fit of the selected distributions using:

  • One graphical method, either a QQ-plot or an exceedance plot in log scale (choose one).
  • The Kolmogorov-Smirnov test.

Hint: the Kolmogorov-Smirnov test is implemented in SciPy.
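A sketch of both checks for a fitted Normal distribution on synthetic stand-in data; swap in the distribution chosen in Task 3. The QQ-plot compares the sorted observations with the fitted quantiles at the Weibull plotting positions (an assumed convention), and `stats.kstest` accepts the fitted CDF as a callable. Because the parameters are estimated from the same data, the standard KS p-value tends to be optimistic (too large).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic stand-in for an observed variable (NOT the real dataset)
rng = np.random.default_rng(2)
data = rng.normal(loc=1.0, scale=0.1, size=500)
fitted = stats.norm(*stats.norm.fit(data))  # frozen fitted distribution

# QQ-plot: observed quantiles vs. fitted quantiles at the same probabilities
x_sorted = np.sort(data)
p = np.arange(1, data.size + 1) / (data.size + 1)
q_fitted = fitted.ppf(p)

fig, ax = plt.subplots(figsize=(5, 5), layout='constrained')
ax.scatter(q_fitted, x_sorted, 10, 'k')
lims = [min(q_fitted.min(), x_sorted.min()), max(q_fitted.max(), x_sorted.max())]
ax.plot(lims, lims, 'r--', label='1:1 line')
ax.set_xlabel('Fitted quantiles')
ax.set_ylabel('Observed quantiles')
ax.legend()
ax.grid()

# Kolmogorov-Smirnov test against the fitted CDF
ks = stats.kstest(data, fitted.cdf)
print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
```

Points close to the 1:1 line indicate a good fit; systematic curvature in the tails is the usual sign that a different family is needed.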

```python
# Your code here
```
    

Task 6:

Interpret the results of the goodness-of-fit (GOF) techniques. How well does the selected parametric distribution perform?

5. Propagating the uncertainty

Using the fitted distributions, we now propagate the uncertainty from $h$ and $u$ to $q$, assuming that $h$ and $u$ are independent.

Task 7:

1. Draw 10,000 random samples from the fitted distributions of $h$ and $u$.

2. Compute $q$ for each pair of samples.

3. Compute $q$ for the observations.

4. Plot the PDF and the exceedance curve (in log scale) of $q$, computed from both the simulations and the observations.

```python
# Draw 10,000 random samples from the fitted distributions
rs_h = ...  # Your code here
rs_u = ...  # Your code here

# Compute q for each pair of samples
rs_q = ...  # Your code here

# Repeat for the observations
q = ...  # Your code here

# Plot the PDF and the exceedance curve of q
```
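A sketch of the propagation step, assuming (purely for illustration) a Normal distribution for $h$ and a Gumbel distribution for $u$ with made-up parameters; replace them with the frozen distributions fitted in Task 4. Because $h$ and $u$ are sampled independently, each pair is simply multiplied. The exceedance probability of the $i$-th smallest value is taken as $1 - i/(n+1)$, consistent with the Weibull plotting position.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
n_samples = 10_000

# Hypothetical fitted distributions (placeholders, NOT fitted to the real data)
h_dist = stats.norm(loc=1.0, scale=0.1)
u_dist = stats.gumbel_r(loc=2.0, scale=0.5)

# Steps 1-2: independent samples of h and u, and q for each pair
rs_h = h_dist.rvs(size=n_samples, random_state=rng)
rs_u = u_dist.rvs(size=n_samples, random_state=rng)
rs_q = rs_h * rs_u
# Step 3: with the observed arrays this is q = h * u, plotted on the same axes


def exceedance(values):
    """Sorted values and their exceedance probabilities 1 - i/(n+1)."""
    x = np.sort(values)
    return x, 1 - np.arange(1, x.size + 1) / (x.size + 1)


# Step 4: PDF (normalized histogram) and exceedance curve in log scale
fig, ax = plt.subplots(1, 2, figsize=(12, 4), layout='constrained')
ax[0].hist(rs_q, bins=50, density=True, color='lightgrey', edgecolor='k',
           label='Simulations')
ax[0].set_xlabel('Discharge, q')
ax[0].set_ylabel('PDF')
x_q, pe_q = exceedance(rs_q)
ax[1].semilogy(x_q, pe_q, 'k', label='Simulations')
ax[1].set_xlabel('Discharge, q')
ax[1].set_ylabel('Exceedance probability')
for a in ax:
    a.legend()
    a.grid()
```

The log scale on the exceedance axis stretches out the upper tail, which is where the simulated and observed curves are most likely to differ.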
    

Task 8:

Interpret the figures above by answering the following questions:

  • Are there differences between the two computed distributions of $q$?
  • What are the advantages and disadvantages of using the simulations?

Running the code in the cell below produces a scatter plot of the two variables. Explore the relationship between them and answer the following questions:

Task 9:

1. Observe the plot below. What differences do you observe between the generated samples and the observations?

2. Compute the correlation between $h$ and $u$ for the samples and for the observations. Are there differences?

3. What could be improved in the previous analysis? Do you have any ideas on how to implement those improvements?

```python
# Scatter plot of the simulated pairs and the observed pairs
fig, axes = plt.subplots(1, 1, figsize=(7, 7))
axes.scatter(rs_h, rs_u, 40, 'k', label='Simulations')
axes.scatter(h, u, 40, 'r', marker='x', label='Observations')
axes.set_xlabel('Water depth, h (m)')
axes.set_ylabel('Water velocity, u (m/s)')
axes.legend()
axes.grid()
```
    
```python
# Correlation coefficient calculation here
```
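A sketch of the correlation check, using Pearson's coefficient via `np.corrcoef` and, as a cross-check less sensitive to skewed marginals, Spearman's rank correlation from `stats.spearmanr`. All data here are synthetic: the independent samples mimic the Monte Carlo step, while the stand-in "observations" are deliberately built with a dependence of $u$ on $h$ only to show the contrast. This makes no claim about the real dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Independent samples, as in the Monte Carlo step (hypothetical parameters)
rs_h = rng.normal(1.0, 0.1, size=10_000)
rs_u = rng.gumbel(2.0, 0.5, size=10_000)

# Synthetic "observations" with an artificial dependence of u on h (illustration only)
h_obs = rng.normal(1.0, 0.1, size=500)
u_obs = 2.0 + 5.0 * (h_obs - 1.0) + rng.normal(0.0, 0.3, size=500)

for label, x, y in [("samples", rs_h, rs_u), ("observations", h_obs, u_obs)]:
    pearson = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 matrix
    spearman = stats.spearmanr(x, y)[0]  # rank correlation coefficient
    print(f"{label:>12}: Pearson = {pearson:.3f}, Spearman = {spearman:.3f}")
```

Independently drawn samples give coefficients close to zero by construction, so any clear correlation in the observations is dependence that the independent propagation ignores.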
    

End of notebook.


© Copyright 2023 MUDE Teaching Team TU Delft. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.