# Fitting extreme values

## Part 1: Introduction and set up

In this workshop, you will work with data on the concentration of chlorophyll ($Chla$) in offshore waters. As you probably know, this variable is used to measure the amount of algae present in the water. Very high concentrations of $Chla$ can indicate an overgrowth of algae caused by, among other things, an excess of nutrients or too little water movement and renewal, which can lead to problems such as eutrophication.

Your colleagues have been monitoring the concentration of $Chla$ ($mg/m^3$) for around 9 months at the location where you want to build an offshore fish farm, because you want to ensure that the concentrations are not too high for the fish to grow properly. To this end, you will model the weekly extremes of $Chla$ and compute the probability of exceeding the values recommended for fish growth.

In [None]:
import pandas as pd
import numpy as np
from scipy import stats
import os
import matplotlib.pyplot as plt
from urllib.request import urlretrieve

In [None]:
def findfile(fname):
    if not os.path.isfile(fname):
        print(f"Downloading {fname}...")
        urlretrieve('https://github.com/TUDelft-MUDE/source-files/raw/main/file/'+fname, fname)

findfile('chla.csv')

In [None]:
data = pd.read_csv('chla.csv', delimiter = ',', skiprows=0)
data['time'] = pd.to_datetime(data['time'], format='mixed')
data.columns = ['time', 'Chla']

data.head()

## Part 2: Data exploration

<div style="background-color:#AABAB2; color: black; width:90%; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px">
<p>

**Task 2.1:**
    
Explore the data: plot the timeseries and histogram of the data.
Briefly describe the data: what does the distribution look like? Do you see any trends or seasonality that could be relevant for our analysis?
    
</p>
</div>
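
As a rough starting point, the sketch below draws a time series and a histogram side by side. It uses a synthetic series: the names `demo` and `value` and the gamma-distributed values are illustrative assumptions, not the workshop data. Swap in `data['time']` and `data['Chla']` once the file is loaded.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the workshop data (assumption: NOT the real Chla record)
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "time": pd.date_range("2020-01-01", periods=270, freq="D"),
    "value": rng.gamma(shape=2.0, scale=1.5, size=270),
})

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Left: time series, useful for spotting trends or seasonality
axes[0].plot(demo["time"], demo["value"], "k", linewidth=0.8)
axes[0].set_xlabel("Time")
axes[0].set_ylabel("Value")
axes[0].grid(True)

# Right: histogram normalised to a density, useful for judging shape and skewness
axes[1].hist(demo["value"], bins=30, density=True, edgecolor="k")
axes[1].set_xlabel("Value")
axes[1].set_ylabel("Density")
axes[1].grid(True)

fig.tight_layout()
```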

In [None]:
### YOUR CODE LINES HERE

## Part 3: Selecting maxima

<div style="background-color:#AABAB2; color: black; width:90%; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px">
<p>

**Task 3.1:**
    
Sample the weekly maxima from the timeseries and plot them on top of the timeseries. How many samples do you obtain? Also plot the histograms of both the maxima and the observations, and describe the differences between the two histograms.

</p>
</div>
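
The pattern of grouping by week and recovering the rows of each maximum can be illustrated on a tiny hand-made table. This is only a sketch: the names `toy` and `toy_max` and the values are made up for illustration and are not part of the workshop data.

```python
import pandas as pd

# Tiny hand-made series (assumption: illustrative values, not the Chla data)
toy = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-03-01", "2021-03-03", "2021-03-05",   # ISO week 9
        "2021-03-08", "2021-03-10", "2021-03-12",   # ISO week 10
    ]),
    "value": [1.2, 3.4, 2.1, 0.9, 0.7, 4.8],
})
toy["week"] = toy["time"].dt.isocalendar().week

# idxmax returns, for each week, the row label where the maximum occurs
idx = toy.groupby("week")["value"].idxmax()

# .loc with those labels recovers the full rows (time and value) of each maximum
toy_max = toy.loc[idx]
print(toy_max)
print("Number of block maxima:", len(toy_max))
```

Grouping by the ISO week number alone is enough for a record shorter than one year. For a longer record, grouping by both year and week would avoid mixing the same week of different years.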

In [None]:
# Extract the ISO week number from the time column
data["time"] = pd.to_datetime(data["time"])
data["week"] = data["time"].dt.isocalendar().week

# Extract index of the weekly maxima
weekly_max = data.groupby("week")["Chla"].idxmax()

### YOUR CODE LINES HERE

## Part 4: Distribution fitting

Don't hesitate to go back to Q1 - week 4, where you fitted distributions many times!

<div style="background-color:#AABAB2; color: black; width:90%; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px">
<p>

**Task 4.1:**

Fit the appropriate distribution to the weekly maxima (check the book if you don't remember which one!). Print the values of the obtained parameters and interpret them:
<ol>
    <li>Do the location and scale parameters match the data?</li>
    <li>According to the shape parameter, what type of distribution is this?</li>
    <li>What type of tail does the distribution have (check the book and the description of the distribution in <code>scipy.stats</code>)?</li>
    <li>Does the distribution have an upper bound? If so, compute it!</li>
</ol>

</p>
</div>
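
The sketch below shows one possible way to fit a Generalized Extreme Value (GEV) distribution with `scipy.stats.genextreme` and read its parameters. It assumes a synthetic sample drawn from a GEV with made-up parameters (`true_c`, `true_loc`, `true_scale`), so the numbers say nothing about the $Chla$ data. Note that the shape parameter `c` in scipy has the opposite sign to the $\xi$ used in many textbooks.

```python
import numpy as np
from scipy import stats

# Synthetic "block maxima" drawn from a known GEV (assumption: illustrative parameters)
true_c, true_loc, true_scale = 0.2, 5.0, 2.0
sample = stats.genextreme.rvs(true_c, loc=true_loc, scale=true_scale,
                              size=2000, random_state=1)

# Maximum likelihood fit; scipy returns (shape, location, scale)
c_hat, loc_hat, scale_hat = stats.genextreme.fit(sample)
print(f"shape c = {c_hat:.3f}, loc = {loc_hat:.3f}, scale = {scale_hat:.3f}")

# scipy's shape c has the opposite sign of the textbook xi (c = -xi):
#   c > 0 -> bounded upper tail (reverse Weibull type)
#   c = 0 -> Gumbel
#   c < 0 -> heavy, unbounded upper tail (Frechet type)
if c_hat > 0:
    upper_bound = loc_hat + scale_hat / c_hat
    print(f"upper bound = {upper_bound:.2f}")
else:
    upper_bound = np.inf
    print("no finite upper bound")
```

With real data the sample is much smaller than in this sketch, so the estimated shape parameter, and with it the tail type and any upper bound, is considerably more uncertain.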

In [None]:
### YOUR CODE LINES HERE

<div style="background-color:#AABAB2; color: black; width:90%; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px">
<p>

**Task 4.2:**

Assess the goodness of fit of the selected distribution using the exceedance probability plot in semi-log scale.
    
Consider the following questions:
<ol>
    <li>How well do the probabilities of the fitted distribution match the empirical distribution? Is there an over- or under-prediction?</li>
    <li>Is the tail type of this GEV distribution appropriate for the data?</li>
</ol>

</p>
</div>
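
An exceedance probability plot compares the empirical exceedance probabilities with those of the fitted distribution on a logarithmic y-axis. The following is a minimal sketch under stated assumptions: the helper `ecdf` (using the plotting position $i/(n+1)$) and the synthetic `maxima` are illustrative and not taken from the workshop data.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def ecdf(values):
    """Return sorted values and empirical non-exceedance probabilities i/(n+1)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    p = np.arange(1, n + 1) / (n + 1)
    return x, p

# Synthetic maxima and a GEV fitted to them (assumption: illustrative, not the Chla data)
maxima = stats.genextreme.rvs(0.1, loc=5.0, scale=2.0, size=40, random_state=3)
params = stats.genextreme.fit(maxima)

x_emp, p_emp = ecdf(maxima)
x_grid = np.linspace(x_emp.min(), x_emp.max(), 200)

fig, ax = plt.subplots(figsize=(6, 4))
# Exceedance probability = 1 - non-exceedance probability
ax.semilogy(x_emp, 1 - p_emp, "ko", label="Empirical")
# sf is the survival function, 1 - CDF, of the fitted distribution
ax.semilogy(x_grid, stats.genextreme.sf(x_grid, *params), "r", label="Fitted GEV")
ax.set_xlabel("Value")
ax.set_ylabel("Exceedance probability")
ax.legend()
ax.grid(True)
```

The semi-log scale stretches the small probabilities, so deviations in the upper tail, which matter most for extremes, become visible.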

In [None]:
### YOUR CODE LINES HERE

## Part 5: Return levels

<div style="background-color:#AABAB2; color: black; width:90%; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px">
<p>

**Task 5.1:**

Given that the fish might have problems growing at concentrations over 10 $mg/m^3 \ ^*$, what is the weekly probability of exceeding that concentration? What is the return period associated with that probability?

</p>
</div>

$^*$: This value is not realistic and is only meant for academic purposes.
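
The exceedance probability of a threshold is the survival function (1 minus the CDF) of the fitted distribution, and the return period is its inverse, expressed in the block size used for the maxima (weeks here). The sketch below assumes placeholder GEV parameters `c`, `loc` and `scale`; they are hypothetical and must be replaced by your own fitted values.

```python
from scipy import stats

# Hypothetical GEV parameters for weekly maxima (assumption: placeholders, not fitted values)
c, loc, scale = 0.1, 5.0, 2.0
threshold = 10.0  # mg/m^3

# Survival function = 1 - CDF = probability that a weekly maximum exceeds the threshold
p_exceed = stats.genextreme.sf(threshold, c, loc=loc, scale=scale)

# Return period, in the same unit as the block size (weeks here)
T_weeks = 1 / p_exceed

print(f"Weekly exceedance probability: {p_exceed:.4f}")
print(f"Return period: {T_weeks:.1f} weeks")
```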

In [None]:
### YOUR CODE LINES HERE

<div style="background-color:#AABAB2; color: black; width:90%; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px">
<p>

**Task 5.2:**

Plot the return level plot in months. That is, plot the values of the random variable, $Chla$, on the x-axis and the return periods in months, in log-scale, on the y-axis. You can plot it from 0 to 20 $mg/m^3$.

</p>
</div>
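
One possible way to build such a plot is sketched below. The GEV parameters are hypothetical placeholders, and the conversion assumes 52 weeks in 12 months (about 4.33 weeks per month); both are assumptions to adapt to your own fit and conventions.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Hypothetical GEV parameters for weekly maxima (assumption: placeholders, not fitted values)
c, loc, scale = 0.1, 5.0, 2.0

x = np.linspace(0, 20, 400)
p_exceed = stats.genextreme.sf(x, c, loc=loc, scale=scale)

# Keep only values with non-zero exceedance probability to avoid dividing by zero
# (relevant when the fitted distribution has an upper bound inside the plotted range)
valid = p_exceed > 0
T_weeks = 1 / p_exceed[valid]

# Convert weeks to months, assuming 52 weeks in 12 months
weeks_per_month = 52 / 12
T_months = T_weeks / weeks_per_month

fig, ax = plt.subplots(figsize=(6, 4))
ax.semilogy(x[valid], T_months, "k")
ax.set_xlabel("Value")
ax.set_ylabel("Return period (months)")
ax.grid(True, which="both")
```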

In [None]:
### YOUR CODE LINES HERE

<div style="background-color:#AABAB2; color: black; width:90%; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px">
<p>

**Task 5.3:**

Given that the farm you want to design has a production plan (design life) of 5 years, what is the probability of exceeding a $Chla$ concentration of 10 $mg/m^3$ at least once over the whole design life of the farm?
</p>
</div>
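
If the weekly maxima are assumed independent and identically distributed, the probability of at least one exceedance in $n$ weeks is $1 - (1 - p)^n$, where $p$ is the weekly exceedance probability. The sketch below uses a hypothetical `p_week` and assumes 52 weekly maxima per year.

```python
# Hypothetical weekly exceedance probability (assumption: placeholder, not a fitted value)
p_week = 0.01
design_life_years = 5
n_weeks = design_life_years * 52  # assumes 52 weekly maxima per year

# Probability of at least one exceedance over the design life,
# assuming independent, identically distributed weekly maxima
p_design_life = 1 - (1 - p_week) ** n_weeks

print(f"Number of weeks: {n_weeks}")
print(f"Probability of at least one exceedance: {p_design_life:.3f}")
```

Even a small weekly probability accumulates quickly over several hundred weeks, which is why the design-life probability is usually far larger than the weekly one.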

In [None]:
### YOUR CODE LINES HERE

<div style="background-color:#AABAB2; color: black; width:90%; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px">
<p>

**Task 5.4:**

Based on the result, what can you say about the viability of the fish farm? You can do some extra calculations to strengthen your argument.

</p>
</div>


In [None]:
### YOUR CODE LINES HERE

> By Patricia Mares Nasarre, Delft University of Technology. CC BY 4.0, more info on the Credits page of Workbook