Analysis wind gusts dataset

Case 1: Wind gust factor in Delft

What’s the propagated uncertainty? How large is the wind gust factor?

In this project, you have chosen to work on the uncertainty of the wind gust factor at 10 m height in Delft. You have hourly observations of the wind gust speed \(G\) [m/s] and the baseline wind speed \(v\) [m/s] for the entire month of August 2025. The data has been accessed from here. The wind gust factor \(F\) [-] is computed as the ratio

\[ F = \frac{G}{v} \]

As you may have experienced yourself, the Netherlands can be a pretty windy place. The wind gust factor expresses the peak gust speed as a multiple of the baseline wind speed; for example, \(F = 1.5\) means the gusts are 50% faster than the baseline wind.
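A quick worked example with made-up hourly values (not taken from the dataset) shows that \(F\) is computed element-wise, giving one gust factor per hour. The names `v_example` and `G_example` are hypothetical.

```python
import numpy as np

# Hypothetical hourly values, for illustration only (not from the dataset)
v_example = np.array([4.0, 6.0, 8.0])   # baseline wind speed v [m/s]
G_example = np.array([7.0, 9.0, 11.0])  # wind gust speed G [m/s]

# Element-wise ratio: one gust factor F [-] per hour
F_example = G_example / v_example
print(F_example)  # gust factors 1.75, 1.5 and 1.375
```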

The goals of this project are:

Choose a reasonable distribution function for \(G\) and \(v\).
Fit the chosen distributions to the observations of \(G\) and \(v\).
Assuming \(G\) and \(v\) are independent, propagate their distributions to obtain the distribution of \(F\).
Analyze the distribution of \(F\).

Importing packages

import numpy as np              # For math
import matplotlib.pyplot as plt # For plotting
from scipy import stats         # For statistics (distributions, fitting, tests)
from math import ceil, trunc    # For plotting

# This is just cosmetic - it updates the font size for our plots
plt.rcParams.update({'font.size': 14})

1. Explore the data

The first step in the analysis is exploring the data, visually and through statistics.

Tip: In the workshop files, you have used the pandas .describe() function to obtain the statistics of a data vector. scipy.stats has a similar function.

import os
from urllib.request import urlretrieve

def findfile(fname):
    if not os.path.isfile(fname):
        print(f"Downloading {fname}...")
        urlretrieve('http://files.mude.citg.tudelft.nl/GA1.4/'+fname, fname)

findfile('dataset_wind_gusts.csv')

# Import the data from the .csv file
v, G = np.genfromtxt('dataset_wind_gusts.csv', delimiter=",", unpack=True, skip_header=1)  # skip the header row

# Plot the time series for the wind speed v
fig, ax = plt.subplots(2, 1, figsize=(10, 7), layout = 'constrained')
ax[0].plot(v,'k')
ax[0].set_xlabel('Time [h]')
ax[0].set_ylabel('Wind speed $v$ [m/s]')
ax[0].grid()

# Plot the time series for the wind gust speed G
ax[1].plot(G,'k')
ax[1].set_xlabel('Time [h]')
ax[1].set_ylabel('Gust speed $G$ [m/s]')
ax[1].grid()

# Statistics for v
print(stats.describe(v))

# Statistics for G
print(stats.describe(G))
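Because \(F = G/v\) is a ratio, it can help to check the imported arrays before computing statistics. The sketch below uses a hypothetical helper, `check_wind_data`, that counts missing entries (which `np.genfromtxt` stores as NaN), non-positive wind speeds (which would make \(F\) undefined or negative), and hours where the gust is below the baseline speed (physically one would usually expect \(G \geq v\)). It is demonstrated on made-up arrays, not on the dataset.

```python
import numpy as np

def check_wind_data(v, G, v_min=0.0):
    """Count potential problems in paired wind/gust observations.

    Hypothetical helper: v is the baseline wind speed [m/s], G the gust speed [m/s].
    """
    return {
        "n": len(v),
        "nan_v": int(np.isnan(v).sum()),           # missing values become NaN in np.genfromtxt
        "nan_G": int(np.isnan(G).sum()),
        "v_nonpositive": int(np.sum(v <= v_min)),  # would make F = G / v undefined or negative
        "gust_below_mean": int(np.sum(G < v)),     # gusts are usually expected to be >= v
    }

# Made-up example arrays (not from the dataset)
v_demo = np.array([5.0, 0.0, np.nan, 7.0])
G_demo = np.array([8.0, 3.0, 9.0, 6.0])
report = check_wind_data(v_demo, G_demo)
print(report)
```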

Task 1:

Describe the data based on the statistics above:

Which variable exhibits greater variability? Also consider the magnitudes of the two variables.
What does the skewness coefficient represent? Based on this coefficient, which kinds of distribution functions should we consider fitting?
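Comparing standard deviations directly can mislead when two variables have different magnitudes; the coefficient of variation (standard deviation divided by the mean) puts the spread on a relative scale. A positive skewness indicates a longer right tail. The sketch below reads these quantities from the fields of the object returned by `stats.describe`, using a synthetic right-skewed sample in place of the real data (the lognormal parameters and seed are arbitrary assumptions).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic right-skewed sample standing in for a wind record (not the real data)
sample = rng.lognormal(mean=1.5, sigma=0.5, size=744)  # 744 = 31 days x 24 hours

res = stats.describe(sample)
# DescribeResult fields: nobs, minmax, mean, variance, skewness, kurtosis
std = np.sqrt(res.variance)  # stats.describe uses ddof=1 for the variance by default
cv = std / res.mean          # coefficient of variation: spread relative to magnitude
print(f"mean={res.mean:.2f}, std={std:.2f}, CV={cv:.2f}, skewness={res.skewness:.2f}")
```

Applying the same two lines to the outputs of `stats.describe(v)` and `stats.describe(G)` gives a like-for-like comparison of the variability of the two variables.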

2. Empirical distribution functions

Next, we compute and plot the empirical PDF and CDF of each variable. Pseudo-code for the empirical CDF is given in the reader.

Task 2:

Define a function to compute the empirical CDF. Plot your empirical PDF and CDF.

# def ecdf(YOUR_CODE_HERE):
#     """Write a function that returns [non_exceedance_probabilities, sorted_values]."""
#     YOUR_CODE_HERE # may be more than one line
#     return [non_exceedance_probabilities, sorted_values]

### YOUR PLOTS HERE ###
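One possible sketch of the two empirical functions is shown below. It assumes the plotting position \(p_i = i/(n+1)\) for the \(i\)-th smallest value, which may differ from the convention in the reader's pseudo-code, and it estimates the PDF with a density-normalised histogram; `empirical_pdf` and its `bins` default are hypothetical choices.

```python
import numpy as np

def ecdf(observations):
    """Return [non_exceedance_probabilities, sorted_values].

    Assumes the plotting position p_i = i / (n + 1), i = 1..n.
    """
    x = np.sort(np.asarray(observations, dtype=float))
    n = x.size
    p = np.arange(1, n + 1) / (n + 1)
    return [p, x]

def empirical_pdf(observations, bins=20):
    """Histogram-based PDF estimate: bin centres and densities (total area = 1)."""
    density, edges = np.histogram(observations, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, density

# Usage on made-up numbers
p, x = ecdf([3.0, 1.0, 2.0])
print(p, x)  # p = 0.25, 0.5, 0.75 for x = 1, 2, 3
```

With matplotlib, `plt.step(x, p, where='post')` draws the empirical CDF and `plt.plot(centres, density)` the empirical PDF. Dividing by \(n+1\) rather than \(n\) keeps the largest observation below probability 1, which matters later when probabilities are plotted on a log scale.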

Task 3:

Based on the results of Task 1 and the empirical PDF and CDF, select one distribution to fit to each variable.

For \(v\), choose between a lognormal and an exponential distribution.

For \(G\), choose between a Gaussian and a beta distribution.
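To connect the sample skewness from Task 1 with the candidate families, the sketch below prints the theoretical skewness of each candidate via SciPy's frozen distributions. The parameter values are arbitrary examples, not fitted values. It illustrates that the exponential's skewness is always 2, the lognormal's depends on its shape parameter, the Gaussian is symmetric, and the (bounded) beta can be skewed in either direction.

```python
from scipy import stats

# Theoretical skewness of the candidate families (parameter values are arbitrary examples)
candidates = {
    "exponential": stats.expon(scale=2.0),
    "lognormal (s=0.5)": stats.lognorm(s=0.5, scale=5.0),
    "Gaussian": stats.norm(loc=10.0, scale=3.0),
    "beta (a=2, b=5)": stats.beta(a=2.0, b=5.0),
    "beta (a=5, b=2)": stats.beta(a=5.0, b=2.0),
}
for name, dist in candidates.items():
    skew = float(dist.stats(moments="s"))  # "s" requests the skewness only
    print(f"{name:>20}: skewness = {skew:.2f}")
```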

3. Fitting a distribution

Task 4:

Fit the selected distributions to the observations using MLE (Maximum Likelihood Estimation).

Hint: Use SciPy’s built-in fitting functions (be careful with how SciPy defines each distribution’s parameters!).

### YOUR CODE HERE ###
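The parameter definitions are the usual pitfall. SciPy's `lognorm` uses a shape `s` equal to \(\sigma\) of \(\ln X\) and `scale` equal to \(e^{\mu}\), plus a `loc` shift; fixing `floc=0` (an assumption in this sketch) gives the common two-parameter lognormal. For that case the MLE also has a closed form on the log-transformed data, which the sketch uses as a cross-check. `stats.norm.fit` returns the sample mean and the standard deviation with `ddof=0`, which is the Gaussian MLE. All data here are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu_true, sigma_true = 1.6, 0.4
# Synthetic lognormal sample; NumPy's mean/sigma refer to ln(X)
data = rng.lognormal(mean=mu_true, sigma=sigma_true, size=5000)

# SciPy's lognorm: shape s = sigma, scale = exp(mu); floc=0 fixes the location (assumption)
s_hat, loc_hat, scale_hat = stats.lognorm.fit(data, floc=0)
mu_hat, sigma_hat = np.log(scale_hat), s_hat

# Closed-form MLE of the two-parameter lognormal, computed on ln(data)
mu_closed = np.mean(np.log(data))
sigma_closed = np.std(np.log(data))  # ddof=0: the MLE, not the unbiased estimator

# Gaussian MLE: stats.norm.fit returns (loc, scale) = (mean, std with ddof=0)
g = rng.normal(loc=9.0, scale=2.5, size=5000)
loc_g, scale_g = stats.norm.fit(g)

print(f"lognorm.fit: mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
print(f"closed form: mu={mu_closed:.3f}, sigma={sigma_closed:.3f}")
print(f"norm.fit:    loc={loc_g:.3f}, scale={scale_g:.3f}")
```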

4. Assessing goodness of fit

Task 5:

Assess the goodness of fit of the selected distributions using:

One graphical method (choose one): a QQ-plot or an exceedance plot in log scale.
The Kolmogorov-Smirnov test.

Hint: The Kolmogorov-Smirnov test is implemented in SciPy as scipy.stats.kstest.

### YOUR PLOTS HERE ###

### YOUR CODE HERE ###
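A minimal sketch of both checks is shown below, on a synthetic sample and an assumed two-parameter lognormal model. The QQ coordinates pair each sorted observation with the model quantile (`ppf`) at the same non-exceedance probability; `stats.kstest` compares the sample with the fitted CDF. Note that when the parameters are estimated from the same data, the standard KS p-value tends to be conservative (too large), so it should be read with some care.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
obs = rng.lognormal(mean=1.6, sigma=0.4, size=744)  # synthetic stand-in for observations

# Fitted model (two-parameter lognormal, loc fixed at 0 as an assumption)
s, loc, scale = stats.lognorm.fit(obs, floc=0)
model = stats.lognorm(s, loc=loc, scale=scale)

# QQ-plot coordinates: model quantiles at the empirical non-exceedance probabilities
x_sorted = np.sort(obs)
p = np.arange(1, obs.size + 1) / (obs.size + 1)
q_model = model.ppf(p)
# plt.scatter(q_model, x_sorted) plus a 1:1 line; points close to it indicate a good fit

# Kolmogorov-Smirnov test against the fitted CDF
ks = stats.kstest(obs, model.cdf)
print(f"KS statistic D = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
```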

Task 6:

Interpret the results of the goodness-of-fit (GOF) techniques. How well do the selected parametric distributions perform?

5. Propagating the uncertainty

Using the fitted distributions, we propagate the uncertainty from \(v\) and \(G\) to \(F\) with a Monte Carlo approach, assuming that \(v\) and \(G\) are independent.
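Independence means each sample of \(G\) is drawn without reference to the paired sample of \(v\). For one of the candidate combinations, a lognormal \(v\) and a Gaussian \(G\), the Monte Carlo mean can be sanity-checked analytically: independence gives \(E[G/v] = E[G]\,E[1/v]\), and since \(1/v\) is lognormal with log-mean \(-\mu_v\), \(E[1/v] = \exp(-\mu_v + \sigma_v^2/2)\). The parameter values below are assumed for illustration and are not fitted to the dataset.

```python
import numpy as np
from scipy import stats

# Assumed example distributions (not fitted to the dataset)
mu_v, s_v = np.log(5.0), 0.4  # v ~ lognormal, ln(v) ~ N(mu_v, s_v)
m_G, sd_G = 8.0, 2.0          # G ~ Gaussian
n = 200_000
rng = np.random.default_rng(7)

v_sim = stats.lognorm(s_v, scale=np.exp(mu_v)).rvs(size=n, random_state=rng)
G_sim = stats.norm(m_G, sd_G).rvs(size=n, random_state=rng)  # drawn independently of v_sim
F_sim = G_sim / v_sim

# Independence: E[G/v] = E[G] * E[1/v], with E[1/v] = exp(-mu_v + s_v**2 / 2)
F_mean_exact = m_G * np.exp(-mu_v + s_v**2 / 2)
print(f"Monte Carlo mean of F: {F_sim.mean():.4f}, analytical: {F_mean_exact:.4f}")
```

Agreement between the two numbers only checks the sampling machinery; it says nothing about whether independence holds for the real observations.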

Task 7:

Draw 10,000 random samples from the fitted distribution functions for \(v\) and \(G\).
Compute \(F\) for each pair of the generated samples.
Compute \(F\) for the observations.
Plot the PDF and the exceedance curve (in log scale) of \(F\), computed from both the simulations and the observations.

Hint: The distributions you have chosen may generate \(v\) or \(G\) values that are close to zero or even negative. Since \(F\) is a ratio, values of \(v\) near zero make \(F\) extremely large, and negative values give physically meaningless results. A simple workaround is to set all values below a threshold, say \(0.1\) [m/s], to the threshold value.

# Draw random samples
rs_v = ### YOUR CODE HERE ###
rs_G = ### YOUR CODE HERE ###

# Compute F
rs_F = ### YOUR CODE HERE ###

# Repeat for observations
F = ### YOUR CODE HERE ###

# Plot the PDF and the exceedance curve (log scale)
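The Monte Carlo steps above, including the threshold workaround from the hint, could be organised as in the sketch below. The helpers `simulate_gust_factor` and `exceedance` are hypothetical names, and the exponential and Gaussian distributions are stand-ins for whatever distributions were actually fitted. An exponential \(v\) places noticeable probability near zero, which is where the threshold matters.

```python
import numpy as np
from scipy import stats

def simulate_gust_factor(dist_v, dist_G, n=10_000, threshold=0.1, seed=None):
    """Monte Carlo samples of F = G / v from two independent frozen SciPy distributions.

    Values below `threshold` [m/s] are raised to the threshold (the workaround from the hint).
    """
    rng = np.random.default_rng(seed)
    rs_v = np.maximum(dist_v.rvs(size=n, random_state=rng), threshold)
    rs_G = np.maximum(dist_G.rvs(size=n, random_state=rng), threshold)
    return rs_v, rs_G, rs_G / rs_v

def exceedance(values):
    """Sorted values and exceedance probabilities 1 - i/(n+1), suitable for a log-scale plot."""
    x = np.sort(values)
    p_exc = 1 - np.arange(1, x.size + 1) / (x.size + 1)
    return x, p_exc

# Hypothetical example distributions standing in for the fitted ones
rs_v, rs_G, rs_F = simulate_gust_factor(stats.expon(scale=5.0), stats.norm(8.0, 2.0), seed=3)
x, p_exc = exceedance(rs_F)
# plt.semilogy(x, p_exc) draws the exceedance curve with a logarithmic probability axis
```

The same `exceedance` function can be applied to \(F\) computed from the observations, so both curves use an identical plotting position.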

Task 8:

Interpret the figures above, answering the following questions:

Are there differences between the two computed distributions for \(F\)?
What are the advantages and disadvantages of using the simulations?
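One consideration when weighing the simulations: a Monte Carlo estimate of an exceedance probability \(P(F > f)\) has a standard error of roughly \(\sqrt{\hat p (1-\hat p)/N}\), so the relative error grows quickly in the tail, where only a handful of samples exceed \(f\). The sketch below illustrates this on a synthetic lognormal sample standing in for simulated gust factors; the helper `exceedance_estimate` and its parameters are hypothetical.

```python
import numpy as np

def exceedance_estimate(samples, f):
    """MC estimate of P(F > f) and its standard error sqrt(p (1 - p) / N)."""
    samples = np.asarray(samples)
    p_hat = np.mean(samples > f)
    se = np.sqrt(p_hat * (1 - p_hat) / samples.size)
    return p_hat, se

rng = np.random.default_rng(11)
# Synthetic stand-in for simulated gust factors (not the notebook's rs_F)
F_demo = rng.lognormal(mean=np.log(1.6), sigma=0.2, size=10_000)
for f in (2.0, 2.5, 3.0):
    p_hat, se = exceedance_estimate(F_demo, f)
    rel = se / p_hat if p_hat > 0 else np.inf
    print(f"P(F > {f}) = {p_hat:.4f} +/- {se:.4f} (relative error {rel:.0%})")
```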

Running the cell below produces a scatter plot of \(v\) against \(G\) for both the simulations and the observations. Explore the relationship between the two variables and answer the following questions:

Task 9:

Observe the plot below. What differences do you observe between the generated samples and the observations?
What could be improved in the previous analysis? Do you have any ideas on how to implement those improvements?

fig, axes = plt.subplots(1, 1, figsize=(7, 7))
axes.scatter(rs_v, rs_G, 40, 'k', label = 'Simulations')
axes.scatter(v, G, 40, 'r', marker = 'x', label = 'Observations')
axes.set_xlabel('wind speed $v$ [m/s]')
axes.set_ylabel('Gust speed $G$ [m/s]')
axes.legend(loc = "upper right")
axes.grid()
plt.savefig("scatterplot.png",dpi=300)
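A rank correlation coefficient can quantify what the scatter plot shows. Samples generated independently should have near-zero correlation by construction, so a clearly non-zero correlation in the observations would call the independence assumption into question. The sketch below uses synthetic data in which \(G\) is made to depend on \(v\) through an assumed linear relation; it is not a model of the real dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Synthetic "observations" in which G depends on v (assumed relation, for illustration only)
v_obs = rng.lognormal(mean=np.log(5.0), sigma=0.4, size=744)
G_obs = 1.5 * v_obs + rng.normal(0.0, 0.8, size=744)

# Independent samples, as in the Monte Carlo step: G drawn without reference to v
G_ind = rng.normal(G_obs.mean(), G_obs.std(), size=744)

rho_obs = stats.spearmanr(v_obs, G_obs)[0]  # Spearman rank correlation coefficient
rho_ind = stats.spearmanr(v_obs, G_ind)[0]
print(f"dependent pair: rho = {rho_obs:.2f}; independent pair: rho = {rho_ind:.2f}")
```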

By Max Ramgraber, Patricia Mares Nasarre and Robert Lanzafame, Delft University of Technology. CC BY 4.0, more info on the Credits page of the Workbook.