scipy.stats and 3D plots

`scipy.stats` and 3D plots#

Part 1: Introduction and set up of the model#

In this programming assignment, you will work in creating plots in 3D and the projections of those plots in 2D using contours. These plotting tools are useful, for instance, when displaying the results of bivariate distribution.

Imagine you are a scientist interested in the atmospheric processes that lead to precipitation. You have been studying the relationship between the rain rate (\(RR\), mm/h) and the ice content in the clouds (\(IC\), kg/m2). You have determined that the bivariate distribution \(F(RR, IC)\) is well described using a bivariate Gaussian distribution with the following parameters: \(\mu_{RR} = 10\) mm/h and \(\sigma_{RR}=2\) mm/h, \(\mu_{IC} = 700\) kg/m2 and \(\sigma_{IC}=300\) kg/m2, and \(Cov(RR, IC)=300 \ mm/h \times kg/m2\).

import pandas as pd
import numpy as np
from scipy.stats import multivariate_normal
import matplotlib.pyplot as plt

\(\text{Task 3.1:}\)

Define a bivariate Gaussian distribution between \(RR\) and \(IC\) given the above information. Draw 100 random samples from the defined distribution and plot them.

Follow the following steps:

Define the vector of means
Define the covariance matrix
Define the bivariate Gaussian distribution with scipy.stats.multivariate_normal and draw 100 samples using the rvs method

# Define the vector of means
mu1 = 10 #mu_1
mu2 = 700 #mu_2
mu = [mu1, mu2] #vector of means

# Define the covariance matrix
s1 = 2  #sigma_x1
s2 = 300  #sigma_x2
covariance = 300 #correlation coefficient
sigma = np.array([[s1**2 , covariance], [covariance, s2**2]]) #Covariance matrix

# Draw 100 samples from a bivariate Gaussian distribution
samples = multivariate_normal(mean=mu, cov=sigma).rvs(size=100)

# Scatter plot against observations
fig, axs = plt.subplots(1, 1)
axs.scatter(samples[:,0], samples[:,1], 40, 'k')
axs.set_ylabel('${IC}$ [kg/m$^2$]')
axs.set_xlabel('${RR}$ [mm/h]')
fig.set_size_inches(5, 5)
axs.grid()

../../_images/69e46222906daf014cfabf2a473a53c916f7e8cb2f6938d1db55a0c365157996.png

\(\text{Solution 3.1:}\)

Note that negative values of \(RR\) might be obtained using the model as it is not bounded, although they do not have physical sense.

Part 2. 3D plots#

In order to create 3D plots, we usually need to evaluate a function over a 2D grid. To do so, we can use the function np.meshgrid which takes two 1D arrays of values (usually representing axes x and y) and turns them into 2D grids, so you can evaluate functions over a grid of (x, y) points.

Let’s see it with an example!

Imagine that I want to calculate \(Z\) as function of \(A\) and \(B\), \(Z = A^2+B\), with uniformly distributed values of \(A\) and \(B\) and plot it spatially. Thus, \(A\) and \(B\) are the axis x and y, while \(Z\) is the z-axis. We need to generate a grid with values of \(A\) and \(B\) to evaluate \(Z\). We can do it as follows:

# Define the range of values for A and B
n = 5 #number of values or size of the mesh
values_A = np.linspace(0,30,n)
values_B = np.linspace(-5, 5,n)

# Define the grid
A2,B2 = np.meshgrid(values_A,values_B)

print(f"The values to evaluate A are: {values_A}")
print(f"The values to evaluate B are: {values_B}\n")
print(f"The grid defined for A is: {A2}")
print(f"The shape of A2 is: {A2.shape}\n")
print(f"The grid defined for B is: {B2}")
print(f"The shape of B2 is: {B2.shape}")

The values to evaluate A are: [ 0.   7.5 15.  22.5 30. ]
The values to evaluate B are: [-5.  -2.5  0.   2.5  5. ]

The grid defined for A is: [[ 0.   7.5 15.  22.5 30. ]
 [ 0.   7.5 15.  22.5 30. ]
 [ 0.   7.5 15.  22.5 30. ]
 [ 0.   7.5 15.  22.5 30. ]
 [ 0.   7.5 15.  22.5 30. ]]
The shape of A2 is: (5, 5)

The grid defined for B is: [[-5.  -5.  -5.  -5.  -5. ]
 [-2.5 -2.5 -2.5 -2.5 -2.5]
 [ 0.   0.   0.   0.   0. ]
 [ 2.5  2.5  2.5  2.5  2.5]
 [ 5.   5.   5.   5.   5. ]]
The shape of B2 is: (5, 5)

In the above prints, you can see that the values where we wanted to evaluate A have been repeated 5 times in rows while the values where we wanted to evaluate B have been repeated 5 times in columns to create a grid. This allows to cover all the combinations of values of A and B.

In order to move back from that grid to a 2d array (required for evaluating the function \(Z\)) with all the combinations of \(A\) and \(B\), we can use np.concatenate to transform both A2 and B2 into a 1D array. Afterwards, it is possible to join them in a single array.

A2_1d = np.concatenate(A2.T)
print(f"The grid defined as a 1D-array for A is: {A2_1d}")
print(f"The shape of A2_1d is: {A2_1d.shape}\n")

B2_1d = np.concatenate(B2.T)
print(f"The grid defined as a 1D-array for B is: {B2_1d}")
print(f"The shape of B2_1d is: {B2_1d.shape}")

The grid defined as a 1D-array for A is: [ 0.   0.   0.   0.   0.   7.5  7.5  7.5  7.5  7.5 15.  15.  15.  15.
 15.  22.5 22.5 22.5 22.5 22.5 30.  30.  30.  30.  30. ]
The shape of A2_1d is: (25,)

The grid defined as a 1D-array for B is: [-5.  -2.5  0.   2.5  5.  -5.  -2.5  0.   2.5  5.  -5.  -2.5  0.   2.5
  5.  -5.  -2.5  0.   2.5  5.  -5.  -2.5  0.   2.5  5. ]
The shape of B2_1d is: (25,)

A2B2 = np.array([A2_1d, B2_1d]).T
print(f"The grid defined as a 2D-array for A and B is: \n{A2B2}")
print(f"The shape of A2B2 is: {A2B2.shape}")

The grid defined as a 2D-array for A and B is: 
[[ 0.  -5. ]
 [ 0.  -2.5]
 [ 0.   0. ]
 [ 0.   2.5]
 [ 0.   5. ]
 [ 7.5 -5. ]
 [ 7.5 -2.5]
 [ 7.5  0. ]
 [ 7.5  2.5]
 [ 7.5  5. ]
 [15.  -5. ]
 [15.  -2.5]
 [15.   0. ]
 [15.   2.5]
 [15.   5. ]
 [22.5 -5. ]
 [22.5 -2.5]
 [22.5  0. ]
 [22.5  2.5]
 [22.5  5. ]
 [30.  -5. ]
 [30.  -2.5]
 [30.   0. ]
 [30.   2.5]
 [30.   5. ]]
The shape of A2B2 is: (25, 2)

Now we’ve converted our mesh into a 2D-array, we can evaluate the function \(Z = A^2+B\).

Z = A2B2[:,0]**2 + A2B2[:,1]
print(f"The values of Z evaluated at (A,B) points are: \n {Z}")
print(f"Shape of Z: {Z.shape}\n")

The values of Z evaluated at (A,B) points are: 
 [ -5.    -2.5    0.     2.5    5.    51.25  53.75  56.25  58.75  61.25
 220.   222.5  225.   227.5  230.   501.25 503.75 506.25 508.75 511.25
 895.   897.5  900.   902.5  905.  ]
Shape of Z: (25,)

Once we have calculated \(Z\), we need the shape of the original meshgrid again for plotting! Luckily the .reshape method comes to the rescue:

mesh_plot = Z.reshape(A2.shape).T
print(f"The grid defined as a mesh for plotting is: \n{mesh_plot}")
print(f"The shape of the mesh is: {mesh_plot.shape}")

The grid defined as a mesh for plotting is: 
[[ -5.    51.25 220.   501.25 895.  ]
 [ -2.5   53.75 222.5  503.75 897.5 ]
 [  0.    56.25 225.   506.25 900.  ]
 [  2.5   58.75 227.5  508.75 902.5 ]
 [  5.    61.25 230.   511.25 905.  ]]
The shape of the mesh is: (5, 5)

\(\text{Task 3.3:}\)

Using the defined bivariate Gaussian distribution, but now compute and plot the bivariate PDF (using method pdf) as a 3D plot where the x- and y-axis are the values of \(RR\) and \(IC\) and the z-axis are the densities.

\(\text{Tip:}\)

Hint 1: Use the function meshgrid of numpy to define a grid to evaluate the PDF.

Hint 2: use plot_surface of matplotlib to display the 3D plot.

# Define the mesh of values where we want to evaluate the random variables
n = 200 #size of the mesh
values_RR = np.linspace(0,mu[0]+3.8*s1,n)
values_IC = np.linspace(0,mu[1]+3.8*s2,n)

# Define the grid
X1,X2 = np.meshgrid(values_RR,values_IC)
X = np.array([np.concatenate(X1.T), np.concatenate(X2.T)]).T

# Evaluate the PDF
Z = multivariate_normal(mean=mu, cov=sigma).pdf(X)

# Create the 3D plot
plt.rcParams.update({'font.size': 14})
fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X1, X2, Z.reshape(X1.shape).T, color='#F792EF', edgecolor='black', linewidth=.25)
ax.set_xlabel('${RR}$ [mm/h]',labelpad=20)
ax.set_ylabel('${IC}$ [kg/m$^2$]',labelpad=20)
ax.set_zlabel('$f(x_1,x_2)$',labelpad=20)
ax.set_title(f'Bivariate Normal pdf', fontsize=20)
ax.view_init(ax.elev, ax.azim+20)
plt.show()

../../_images/14a6b5a6a20b5f69c5f2fdbdcde84e04770fa54c84994886dfa7b893304f0d48.png

\(\text{Task 3.4:}\)

Using the defined bivariate Gaussian distribution, compute and plot the bivariate PDF as contours where densities are projected. This is, the x- and y-axis are the values of \(RR\) and \(IC\) and the contours in the plot represent values of densities.

\(\text{Tip:}\)

Hint 1: you can reuse the grid defined in the previous task.

Hint 2: use contour of matplotlib to draw the contours.

# Create contours plot
fig = plt.figure(figsize=(6,5))
ax = fig.add_subplot(111)
contours = ax.contour(X1, X2, Z.reshape(X1.shape).T, 25, cmap='YlGnBu', vmin=0)
ax.grid()
ax.set_xlim([0, 15])
ax.set_ylim([0, 1800])
ax.set_xlabel('${RR}$ [mm/h]')
ax.set_ylabel('${IC}$ [kg/m$^2$]')
fig.colorbar(contours, ax=ax, label='Probability Density')

<matplotlib.colorbar.Colorbar at 0x7f8acd07b650>

../../_images/e8146863d1f9ba701e34636cb831a9fd518ec05e547d95635aeabe7ef4c64944.png

End of solution.

By Tom van Woudenberg and Patricia Mares Nasarre, Delft University of Technology. CC BY 4.0, more info on the Credits page of Workbook.

scipy.stats and 3D plots

Contents

scipy.stats and 3D plots#

Part 1: Introduction and set up of the model#

Part 2. 3D plots#

`scipy.stats` and 3D plots#