Analysis flow velocity dataset#
Case 3: Flow velocity of the river Thames#
A photo from near the measuring station at Kingston, taken from here. Photo of station 39001 from the National River Flow Archive (UK Centre for Ecology & Hydrology). All rights reserved unless stated otherwise.
What is the propagated uncertainty? How fast does the river Thames flow at Kingston?
In this project, you have chosen to work on the uncertainty of water depths (\(h\) [m]) and flow rate (\(q\) [m³/s]) of a river. You have observations every 15 minutes from a station at the river Thames at Kingston, UK, over one month. The data has been taken from the Department for Environment Food & Rural Affairs’ Data Services platform here. Remember that the discharge can be computed as
where \(u\) [m/s] is the flow velocity and \(S\) [m²] is the cross-sectional area of the flow. The cross-sectional area, simplified to a rectangle, can be computed as \(S = h*w\) , where \(w\) [m] is the width of the river at the measuring station. Here, we have already corrected for the datum of the the water depths \(h\), so the values in the dataset are relative to river bed. For simplicity, we can at the station.
Thus, assuming a discharge width of \(w=100\) m, we can simplify the previous equation as
The goal of this project is:
Choose a reasonable distribution function for \(q\) and \(h\).
Fit the chosen distributions to the observations of \(q\) and \(h\).
Assuming \(q\) and \(h\) are independent, propagate their distributions to obtain the distribution of \(u\).
Analyze the distribution of \(u\).
Importing packages#
import numpy as np # For math
import matplotlib.pyplot as plt # For plotting
from scipy import stats # For math
from math import ceil, trunc # For plotting
# This is just cosmetic - it updates the font size for our plots
plt.rcParams.update({'font.size': 14})
1. Explore the data#
The first step in the analysis is exploring the data, visually and through statistics.
Tip: In the workshop files, you have used the pandas .describe()
function to obtain the statistics of a data vector. scipy.stats
has a similar function.
import os
from urllib.request import urlretrieve
def findfile(fname):
if not os.path.isfile(fname):
print(f"Downloading {fname}...")
urlretrieve('http://files.mude.citg.tudelft.nl/GA1.4/'+fname, fname)
findfile('dataset_thames.csv')
# Import the data from the .csv file
h, q = np.genfromtxt('dataset_thames.csv', delimiter=",", unpack=True, skip_header=True)
# Plot the time series for the water depth
fig, ax = plt.subplots(2, 1, figsize=(10, 7), layout = 'constrained')
ax[0].plot(h,'k')
ax[0].set_xlabel('Time')
ax[0].set_ylabel('Water depth $h$ [m]')
ax[0].grid()
# Plot the time series for the flow rate
ax[1].plot(q,'k')
ax[1].set_xlabel('Time')
ax[1].set_ylabel('Flow rate $q$ [m³/s]')
ax[1].grid()
# Statistics for h
print(stats.describe(h))
# Statistics for q
print(stats.describe(q))
Task 1:
Describe the data based on the previous statistics:
Which variable features a higher variability? Also consider the magnitudes of the different variables.
What does the skewness coefficient represent? Which kind of distribution functions should we consider to fit based on this coefficient?
2. Empirical distribution functions#
Now, we are going to compute and plot the empirical PDF and CDF for each variable. Note that you have the pseudo-code for the empirical CDF in the reader.
Task 2:
Define a function to compute the empirical CDF. Plot your empirical PDF and CDF.
# def ecdf(YOUR_CODE_HERE):
# """Write a function that returns [non_exceedance_probabilities, sorted_values]."""
# YOUR_CODE_HERE # may be more than one line
# return [non_exceedance_probabilities, sorted_values]
### YOUR PLOTS HERE ###
Task 3:
Based on the results of Task 1 and the empirical PDF and CDF, select one distribution to fit to each variable.
For \(h\), select between a Uniform or lognormal distribution.
For \(q\) choose between a Gaussian or Gumbel distribution.
3. Fitting a distribution#
Task 4:
Fit the selected distributions to the observations using MLE (Maximum Likelihood Estimation).
Hint: Use Scipy’s built-in functions (be careful with the parameter definitions!).
### YOUR CODE HERE ###
4. Assessing goodness of fit#
Task 5:
Assess the goodness of fit of the selected distribution using:
One graphical method: QQplot or Logscale. Choose one.
The Kolmogorov-Smirnov test.
Hint: The Kolmogorov-Smirnov test is implemented in Scipy.
### YOUR PLOTS HERE ###
### YOUR CODE HERE ###
Task 6:
Interpret the results of the GOF techniques. How does the selected parametric distribution perform?
5. Propagating the uncertainty#
Using the fitted distributions, we are going to propagate the uncertainty from \(h\) and \(q\) to \(u\) assuming that \(h\) and \(q\) are independent.
Task 7:
Draw 10,000 random samples from the fitted distribution functions for \(h\) and \(q\).
Compute \(u\) for each pair of the generated samples.
Compute \(u\) for the observations.
Plot the PDF and exceedance curve in logscale of \(u\) computed using both the simulations and the observations.
# Draw random samples
rs_h = ### YOUR CODE HERE ###
rs_q = ### YOUR CODE HERE ###
# Compute u
rs_u = ### YOUR CODE HERE ###
# Repeat for observations
u = ### YOUR CODE HERE ###
# Plot the PDF and the CDF
Task 8:
Interpret the figures above, answering the following questions:
Are there differences between the two computed distributions for \(q\)?
What are the advantages and disadvantages of using the simulations?
If you run the code in the cell below, you will obtain a scatter plot of both variables. Explore the relationship between both variables and answer the following questions:
Task 9:
Observe the plot below. What differences do you observe between the generated samples and the observations?
What can you improve the previous analyses? Do you have any ideas/suggestions on how to implement those suggestions?
fig, axes = plt.subplots(1, 1, figsize=(7, 7))
axes.scatter(rs_h, rs_q, 40, 'k', label = 'Simulations')
axes.scatter(h, q, 40, 'r', marker = 'x', label = 'Observations')
axes.set_xlabel('Water depth $h$ [m]')
axes.set_ylabel('Flow rate $q$ [m³/s]')
axes.legend()
axes.grid
plt.savefig("scatterplot.png",dpi=300)
By Max Ramgraber, Patricia Mares Nasarre and Robert Lanzafame, Delft University of Technology. CC BY 4.0, more info on the Credits page of Workbook.