Homework 02
- Overall Info
  - Due date: 2024-04-22
  - Return a `.py` file with comments for the code
  - By groups of 5-6 students
1. Monte Carlo Convergence
We compute the following expectation, where \(K\) is a constant, \(F_X\) is the `cdf` of \(X\), and \(f_X = dF_X/dx\) is the `pdf` of \(X\).
We assume throughout that \(X\sim \mathcal{N}(\mu, \sigma^2)\) is a Gaussian distribution with mean \(\mu\) and standard deviation \(\sigma\).
Such a random variable is declared in `scipy` as follows:
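A minimal sketch of that declaration, assuming `scipy.stats.norm` with illustrative values for \(\mu\) and \(\sigma\):

```python
import numpy as np
from scipy.stats import norm

# X ~ N(mu, sigma^2); the parameter values are illustrative
mu, sigma = 0.2, 1.5
RV = norm(loc=mu, scale=sigma)
```

`RV` then exposes `pdf`, `cdf`, `ppf` and `rvs`, which the exercises below rely on.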
1.1 Define a function `expectation` that takes as input the random variable `RV` and the constant `K`, and returns the value of the integral using `quad`.
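A possible sketch, where the payoff \(g(x, K) = \max(x - K, 0)\) is only a placeholder assumption for the integrand of the expectation above:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def expectation(RV, K):
    # integrate g(x, K) * pdf(x) over the real line with quad;
    # g(x, K) = max(x - K, 0) is a placeholder integrand
    integrand = lambda x: max(x - K, 0.0) * RV.pdf(x)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

RV = norm(loc=0.0, scale=1.0)
print(expectation(RV, 0.0))  # 1/sqrt(2*pi) ≈ 0.3989 for this placeholder
```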
1.2 Define a function `mc_expectation` that takes as input the random variable `RV`, the constant `K`, and the natural number `N` (the number of samples), and returns a numpy vector `[I_0, ..., I_{N-1}]` of Monte Carlo estimates, where \(x = [x_0, \ldots, x_{N-1}]\) is a random sample from the distribution of \(X\).
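A sketch under the assumption that \(I_n\) is the running Monte Carlo average over the first \(n+1\) samples, with \(g(x, K) = \max(x - K, 0)\) as a placeholder for the function inside the expectation:

```python
import numpy as np
from scipy.stats import norm

def mc_expectation(RV, K, N):
    # I_n = (1/(n+1)) * sum_{k<=n} g(x_k, K): running Monte Carlo average;
    # g(x, K) = max(x - K, 0) is a placeholder for the actual integrand
    x = RV.rvs(N)
    g = np.maximum(x - K, 0.0)
    return np.cumsum(g) / np.arange(1, N + 1)

RV = norm(loc=0.0, scale=1.0)
I = mc_expectation(RV, 0.0, 100000)
```

`np.cumsum` divided by the running sample count yields all \(N\) estimates in one vectorized pass.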
1.3 For \(N = 100000\), with a plotly graph, plot
- the constant function `expectation(RV, K)` for n = 0, ..., N-1
- 10 realizations of `mc_expectation(RV, K, N)` (each already returns a vector `I = [I_0, ..., I_{N-1}]`)
Comment on the speed of convergence of the Monte Carlo method (you can vary \(N\) as well as the number of Monte Carlo samples you compute).
2. Quantile: exact vs MC methods
From the lecture we know
- Value at Risk: the value at risk is defined as \(V@R_{\alpha}(X) = q_X(1-\alpha) = F_X^{-1}(1-\alpha)\), where \(q_X\) is the quantile function of \(X\).
This leads to three methods to compute it:
- you already have a `ppf` function at hand for the quantile
- you compute the inverse of the `cdf` using `root`
- you only have a random sample of the distribution and compute the empirical quantile
2.1 Define a `quantile00` function that takes as input a known `RV` and a number \(0 < u < 1\), and returns the quantile using a root-finding method to invert the `cdf`.
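A minimal sketch using `scipy.optimize.root`, starting the search at the mean of `RV` (an arbitrary but convenient initial guess):

```python
from scipy.optimize import root
from scipy.stats import norm

def quantile00(RV, u):
    # solve F_X(q) - u = 0 for q with a root-finding method
    sol = root(lambda q: RV.cdf(q) - u, x0=RV.mean())
    return sol.x[0]

RV = norm(loc=0.0, scale=1.0)
print(quantile00(RV, 0.975))  # ≈ 1.96, matching RV.ppf(0.975)
```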
2.2 Compare the speed of computation of this function for different `RV` (normal, Student's t, normal inverse Gaussian from the `scipy.stats` library) with respect to the `ppf` function of those random variables.
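One way to sketch the comparison with `timeit` (the distribution parameters below are illustrative assumptions):

```python
import timeit
from scipy.optimize import root
from scipy.stats import norm, t, norminvgauss

def quantile00(RV, u):
    # invert the cdf with a root-finding method
    return root(lambda q: RV.cdf(q) - u, x0=RV.mean()).x[0]

cases = [("normal", norm()), ("student", t(df=4)), ("NIG", norminvgauss(a=1.0, b=0.0))]
for name, RV in cases:
    t_root = timeit.timeit(lambda: quantile00(RV, 0.975), number=20)
    t_ppf = timeit.timeit(lambda: RV.ppf(0.975), number=20)
    print(f"{name}: root {t_root:.4f}s vs ppf {t_ppf:.4f}s")
```

For distributions with a specialized inverse cdf (such as the normal), `ppf` is typically much faster; for the others, scipy itself falls back to numerical inversion of the `cdf`.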
2.3 Sometimes you do not have access to the `ppf` or `cdf` explicitly but only have \(N\) random samples \(x^N = (x_0^N, \ldots, x_{N-1}^N)\) of the random variable.
Numpy has a functionality to return the quantile of a random sample (there are many ways to interpolate it, see the documentation).
Given a random sample (as a numpy array) `x = [x_0, ..., x_{N-1}]`, the quantile is given by `np.quantile(x, u)`.
This empirical quantile converges to the theoretical quantile.
Experiment with different random samples from a normal random variable (`rvs`) and plot the convergence, as a function of \(N\), of the empirical quantile to the theoretical quantile.
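A sketch of the experiment (the plotting itself is left out; the seed and the grid of sample sizes are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
RV = norm(loc=0.0, scale=1.0)
u = 0.975
q_true = RV.ppf(u)  # theoretical quantile

Ns = [100, 1000, 10000, 100000]
errors = []
for N in Ns:
    x = RV.rvs(N, random_state=rng)
    errors.append(abs(np.quantile(x, u) - q_true))
# plot Ns against errors (e.g. on a log-log scale) with plotly:
# the error shrinks roughly like 1/sqrt(N)
```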
Empirical Quantile
Note that the general definition of the (right) quantile is given by
\[q_X(u) = \inf\{x \in \mathbb{R} : F_X(x) > u\}.\]
If you have a random sample \((x_0, \ldots, x_{N-1})\) of \(X\), denote by
\[F^N(x) = \frac{1}{N}\sum_{k=0}^{N-1} 1_{\{x_k \leq x\}}\]
the empirical cdf. It holds that \(F^N \to F_X\) if \(F_X\) is continuous (this is not easy to show; it holds in general but in a distributional sense).
Hence, the quantile \(q^N\) of \(F^N\) also converges to \(q_X\) as \(N\) gets large.
If you denote by \((y_0, \ldots, y_{N-1})\) the reordering of \((x_0, \ldots, x_{N-1})\) from the smallest to the largest value, then it holds that
\[q^N(u) = y_{k(u)}\]
where \(k(u) = \lfloor Nu \rfloor\).
3. Average Value at Risk
We discussed in the lecture that the value at risk (which is a quantile) is not appropriate to estimate the risk. It has been replaced by the average value at risk, which admits different representations:
\[AV@R_{\alpha}(X) = \frac{1}{\alpha}\int_{1-\alpha}^{1} q_X(u)\,du = \mathbb{E}\left[X \mid X \geq V@R_{\alpha}(X)\right] = \min_{t \in \mathbb{R}}\left\{t + \frac{1}{\alpha}\mathbb{E}\left[(X - t)^+\right]\right\}\]
3.1 Implement the `AVaR0`, `AVaR1` and `AVaR2` functions with inputs `RV` and `alpha`, and return with `quad` the result for the first, second and third representation, using `ppf` and/or `minimize` depending on the representation.
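A sketch under the assumption that the three representations are the integrated quantile, the tail conditional expectation, and the minimization formula (check against the representations from the lecture); `minimize_scalar` stands in for `minimize` since the objective is one-dimensional:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def AVaR0(RV, alpha):
    # (1/alpha) * integral of the quantile q_X(u) over (1 - alpha, 1)
    value, _ = quad(lambda u: RV.ppf(u), 1 - alpha, 1)
    return value / alpha

def AVaR1(RV, alpha):
    # tail expectation E[X | X >= V@R_alpha(X)]
    var = RV.ppf(1 - alpha)
    value, _ = quad(lambda x: x * RV.pdf(x), var, np.inf)
    return value / alpha

def AVaR2(RV, alpha):
    # min over t of t + (1/alpha) * E[(X - t)^+]
    def objective(t):
        tail, _ = quad(lambda x: (x - t) * RV.pdf(x), t, np.inf)
        return t + tail / alpha
    return minimize_scalar(objective).fun

RV = norm(loc=0.0, scale=1.0)
# the three values should agree, ≈ 2.0627 for alpha = 0.05
print(AVaR0(RV, 0.05), AVaR1(RV, 0.05), AVaR2(RV, 0.05))
```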
3.2 Implement the `mc_AVaR0`, `mc_AVaR1` and `mc_AVaR2` functions with inputs `x = [x_0, ..., x_{N-1}]` (a numpy random sample of \(X\)) and `alpha`, using Monte Carlo and the empirical quantile for each representation.
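A sketch of the Monte Carlo counterparts, assuming the representations are the integrated quantile, the tail mean, and the minimization formula; the empirical quantile plays the role of `ppf`:

```python
import numpy as np

def mc_AVaR0(x, alpha):
    # average the empirical quantile over a grid of levels in (1 - alpha, 1)
    us = np.linspace(1 - alpha, 1, 200, endpoint=False)
    return np.quantile(x, us).mean()

def mc_AVaR1(x, alpha):
    # mean of the samples above the empirical V@R
    var = np.quantile(x, 1 - alpha)
    return x[x >= var].mean()

def mc_AVaR2(x, alpha):
    # minimization formula, evaluated at the empirical V@R (its known minimizer)
    t = np.quantile(x, 1 - alpha)
    return t + np.maximum(x - t, 0.0).mean() / alpha
```

In `mc_AVaR2` the empirical quantile is the minimizer of the objective, so no numerical optimizer is needed.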
3.3 Compare numerically the speed and accuracy of each method.
4. Multidimensional
Usually in risk management, the random variable \(X\) is a combination of many random variables
\[X = \sum_{k=1}^{d} w^k X^k\]
where the vector \((X^1, \ldots, X^d)\) has a given `cdf` or random samples, and \(w^1, \ldots, w^d\) are numbers representing the contribution of each factor \(X^k\) to the total risk \(X\).
Throughout, we will consider that \((X^1, \ldots, X^d)\) is a \(d\)-dimensional Gaussian random vector.
Multivariate Gaussian distribution
The multivariate Gaussian distribution is the multidimensional extension of a Gaussian distribution for a vector of random variables \(X=(X^1, \ldots, X^d)\). The parameters are the mean \(\mathbf{\mu} = (\mu^1, \ldots, \mu^d)\) and the covariance matrix (positive semi definite matrix) \(\mathbf{\Sigma} \in \mathbb{R}^{d\times d}\).
The `pdf` of this multidimensional random variable is given by
\[f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d \det\mathbf{\Sigma}}}\exp\left(-\frac{1}{2}(\mathbf{x} - \mathbf{\mu})^{\top}\mathbf{\Sigma}^{-1}(\mathbf{x} - \mathbf{\mu})\right).\]
Denoting by \(\sigma^k = \sqrt{\mathbf{\Sigma}^{k,k}}\) the square root of the diagonal of \(\mathbf{\Sigma}\), each random variable \(X^k\) is normally distributed \(\mathcal{N}(\mu^k, (\sigma^k)^2)\).
However, the different components of the random vector might be dependent, with covariance \(\mathrm{Cov}(X^j, X^k) = \mathbf{\Sigma}^{j,k}\).
4.1 Extend the functions `mc_AVaR0`, `mc_AVaR1` and `mc_AVaR2` to multidimensional samples. Use the following two scenarios to compare accuracy and execution time, varying the number of samples \(N\):
```python
# case of 2 dimensions
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.2, 0.5])
Sigma = np.array(
    [
        [ 1.        , -0.26315789],
        [-0.26315789,  1.        ]
    ]
)
w = np.array([2, 5])
RV_2dim = multivariate_normal(mean = mu, cov = Sigma)

# generate N random samples (you get a numpy array Nx2)
N = 10
samples = RV_2dim.rvs(N)
```
```python
# case of 5 dimensions
mu = np.array([0.2, 0.5, -0.1, 0, 0.6])
Sigma = np.array(
    [
        [1.        , 0.2688825 , 0.401427  , 0.19473116, 0.66256879],
        [0.2688825 , 1.        , 0.3907619 , 0.43373298, 0.43199657],
        [0.401427  , 0.3907619 , 1.        , 0.27893741, 0.61330745],
        [0.19473116, 0.43373298, 0.27893741, 1.        , 0.46849892],
        [0.66256879, 0.43199657, 0.61330745, 0.46849892, 1.        ]
    ]
)
w = np.array([2, 5, -2, 3, 6])
RV_5dim = multivariate_normal(mean = mu, cov = Sigma)

# generate N random samples (you get a numpy array Nx5)
N = 10
samples = RV_5dim.rvs(N)
```
Hint: do not forget that numpy gives you access to the `dot` product to compute \(\sum_k w^k X^k\).
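For instance, the one-dimensional estimators can be reused after reducing each sample vector with the dot product; the sketch below inlines the minimization-formula estimator (an assumption on the representation):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mc_AVaR(samples, w, alpha):
    # reduce the Nxd sample to the one-dimensional risk X = sum_k w^k X^k
    x = samples @ w
    t = np.quantile(x, 1 - alpha)  # empirical V@R
    return t + np.maximum(x - t, 0.0).mean() / alpha

mu = np.array([0.2, 0.5])
Sigma = np.array([[1.0, -0.26315789], [-0.26315789, 1.0]])
w = np.array([2.0, 5.0])
RV_2dim = multivariate_normal(mean=mu, cov=Sigma)
samples = RV_2dim.rvs(100000, random_state=0)
# w·X is again Gaussian, with mean w·mu and variance w@Sigma@w
print(mc_AVaR(samples, w, 0.05))
```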
4.2 Optional: for the two-dimensional case, you can try to implement with `dblquad` the direct computation of `AVaR1` to compare speed and accuracy.
Multidimensional Gaussian distributions in any dimension
To create a multidimensional Gaussian vector, you need to provide the mean vector \(\mathbf{\mu} = (\mu^1, \ldots, \mu^d)\) as well as the covariance matrix \(\mathbf{\Sigma}\), which is usually calibrated to data.
In our case we can generate some of these distributions using `numpy` and `scipy`:
```python
# we import the multivariate normal as well as a function to generate Sigma
from scipy.stats import multivariate_normal, random_correlation
import numpy as np

# create a random vector of positive eigenvalues (random_correlation
# requires them to sum to d) and then a random covariance
d = 4
eigenvalues = np.random.rand(d)
eigenvalues = eigenvalues * d / eigenvalues.sum()
Sigma = random_correlation.rvs(eigenvalues)  # covariance matrix
mu = np.random.rand(d)  # vector of means
RV = multivariate_normal(mean = mu, cov = Sigma)

# generate random samples (N random vectors of dimension d each, so an Nxd numpy array)
N = 1000
samples = RV.rvs(N)
print(samples)
```