2.1. Random vectors, covariance and correlation#
When dealing with multiple random variables we need to consider that these may not be independent. Instead of considering the individual random variables, we will then need to work with a random vector \(X= [\begin{array}{llll} X_1 & X_2 & \ldots &X_m \end{array}]^T\), which has a multivariate (or: joint) distribution. We will first introduce the covariance and correlation coefficient of two random variables, and then introduce the multivariate normal distribution.
Covariance and correlation#
The covariance \(Cov(X_1,X_2)\) is a measure of the joint variability of the two random variables \(X_1\) and \(X_2\). It gives us information about whether and how the two variables are correlated and it can be either positive or negative (hence, \(Cov(X_1,X_2) \lessgtr 0\)).
By definition

$$
Cov(X_1,X_2) = \mathop{{}\mathbb{E}}\left[(X_1 - \mathop{{}\mathbb{E}}(X_1))(X_2 - \mathop{{}\mathbb{E}}(X_2))\right]
$$

where \(\mathop{{}\mathbb{E}}(X_1)\) is the expected value of the first random variable, for instance.
Recall: the expected value of a random variable \(X\) is

$$
\mathop{{}\mathbb{E}}(X) = \int_{-\infty}^{\infty} x \, f_X(x) \, dx
$$

where \(f_X(x)\) is the probability density function of \(X\).
If \(Cov(X_1,X_2)>0\), high (low) values of \(X_1\) occur together with high (low) values of \(X_2\); the covariance is then said to be POSITIVE.
On the other hand, if \(Cov(X_1,X_2)<0\), high (low) values of \(X_1\) occur together with low (high) values of \(X_2\); in this case the covariance is said to be NEGATIVE.
Note that:

$$
Cov(X_1,X_1) = \mathop{{}\mathbb{E}}\left[(X_1 - \mathop{{}\mathbb{E}}(X_1))^2\right] = \sigma^2_{X_1}
$$

i.e., the covariance of a random variable with itself is simply its variance.
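As a small numerical illustration (an addition, not part of the original text), the covariance can be estimated from simulated data with NumPy; the coefficients and sample size below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate two correlated random variables: X2 partly follows X1, so
# high values of X1 tend to occur together with high values of X2.
x1 = rng.normal(loc=0.0, scale=1.0, size=10_000)
x2 = 0.8 * x1 + rng.normal(loc=0.0, scale=0.5, size=10_000)

# np.cov returns the 2x2 sample covariance matrix; the off-diagonal
# element estimates Cov(X1, X2) and is positive for this construction.
cov_matrix = np.cov(x1, x2)
print(cov_matrix[0, 1])                      # estimate of Cov(X1, X2), about 0.8
print(cov_matrix[0, 0], np.var(x1, ddof=1))  # Cov(X1, X1) equals Var(X1)
```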
In order to better understand the relation between the two random variables we can also compute the Pearson correlation coefficient

$$
\rho_{ij} = \frac{Cov(X_i,X_j)}{\sigma_{X_i}\sigma_{X_j}}
$$

which is in fact a measure of the strength of the linear relationship between the variables.
The correlation coefficient by definition takes a value between -1 and 1. If \(\rho_{ij} = 0\) the random variables are uncorrelated; this is for instance the case if the random variables are independent. If \(\rho_{ij} = \pm 1\) the variables are fully correlated: knowing the value of one variable means that the value of the other variable is also known, since the two variables have a linear relation. A positive correlation coefficient means that if one variable increases, the other one tends to increase as well; conversely, a negative correlation means that an increase of one variable is accompanied by a tendency of the other variable to decrease.
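The following short sketch (not part of the original text) shows how the correlation coefficient relates to the covariance and the standard deviations; the linear relation and noise level are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two variables with a noisy, negative linear relation (values are
# illustrative only).
x1 = rng.normal(size=5_000)
x2 = -1.5 * x1 + rng.normal(scale=2.0, size=5_000)

# Pearson correlation: Cov(X1, X2) divided by the product of the
# standard deviations; np.corrcoef computes the same quantity directly.
rho_manual = np.cov(x1, x2)[0, 1] / (np.std(x1, ddof=1) * np.std(x2, ddof=1))
rho_numpy = np.corrcoef(x1, x2)[0, 1]
print(rho_manual, rho_numpy)  # both negative and between -1 and 0
```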
Examples are shown in Fig. 2.3 for a large number of repeated measurements; in this example \( \sigma_{X_1}=1\) and \(\sigma_{X_2}=2\) (the mean values are indicated as \(\mu_1\) and \(\mu_2\)). The larger standard deviation of the second measurement results in a larger spread in the vertical direction. Obviously, the measurements fluctuate around the means.
In practice it is rare to encounter cases in which \(\rho=\pm 1\) or exactly \(\rho=0\). It is much more likely that \(\rho\) takes a value strictly between \(-1\) and \(1\), and based on this value we can assess the strength of the linear dependence.

Fig. 2.3 Scatterplots of outcomes of (\(X_1,X_2\)) with different correlation coefficients.#
The interactive element below allows you to play around with the correlation value yourself. Observe how the distribution’s density contours, or the scattered data, change when you adapt the correlation value.
Fig. 2.4 Interactively change the correlation coefficient to visualize the effect on density contours or samples of the bivariate Gaussian distribution.#
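If you want to reproduce such scatterplots yourself, the sketch below (an addition, not part of the original text) draws samples from a bivariate normal distribution using the standard deviations of Fig. 2.3; the means and the particular value of \(\rho\) are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard deviations as in Fig. 2.3; the means and the value of rho
# are chosen here for illustration.
mu = np.array([0.0, 0.0])
sigma1, sigma2, rho = 1.0, 2.0, 0.7

# Build the covariance matrix from the standard deviations and rho,
# then draw samples from the corresponding bivariate normal distribution.
Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2            ]])
samples = rng.multivariate_normal(mu, Sigma, size=2_000)

# A scatter plot of the two columns gives pictures like Fig. 2.3 / 2.4;
# the sample correlation should be close to the chosen rho.
print(np.corrcoef(samples[:, 0], samples[:, 1])[0, 1])
```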
Covariance matrix#
When considering a random vector \(X= [\begin{array}{llll} X_1 & X_2 & \ldots &X_m \end{array}]^T\), we can ‘collect’ all covariances in the so-called covariance matrix:

$$
\Sigma_X =
\begin{bmatrix}
\sigma^2_{X_1} & Cov(X_1,X_2) & \ldots & Cov(X_1,X_m) \\
Cov(X_2,X_1) & \sigma^2_{X_2} & \ldots & Cov(X_2,X_m) \\
\vdots & \vdots & \ddots & \vdots \\
Cov(X_m,X_1) & Cov(X_m,X_2) & \ldots & \sigma^2_{X_m}
\end{bmatrix}
$$
Note that the covariance matrix is symmetric, since \(Cov(X_i,X_j)= Cov(X_j,X_i)\).
If all measurements are independent, all covariances will be equal to zero, and the covariance matrix becomes a diagonal matrix with the variances on the diagonal.
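As a brief illustration (an addition, with arbitrary standard deviations), the covariance matrix of a random vector with independent components can be estimated from samples with NumPy; the estimate is symmetric and approximately diagonal, with the variances on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(3)

# Three independent components with different standard deviations; the
# scale values are arbitrary.
X = rng.normal(loc=0.0, scale=[1.0, 2.0, 0.5], size=(100_000, 3))

# With rows as observations (rowvar=False), np.cov estimates the m x m
# covariance matrix of the random vector X.
Sigma_hat = np.cov(X, rowvar=False)
print(np.round(Sigma_hat, 2))               # roughly diagonal: variances 1, 4, 0.25
print(np.allclose(Sigma_hat, Sigma_hat.T))  # the covariance matrix is symmetric
```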
Attribution
This chapter was written by Sandra Verhagen. Find out more here.