
Extreme value distributions

This version: 2024-05-29

First version: 2024-05-29

Summary

This article explores two essential extreme value distributions: the Generalized Extreme Value (GEV) distribution and the Generalized Pareto distribution (GPD). The GEV distribution is particularly suited to modeling the distribution of sample maxima, while the GPD captures the behavior of losses exceeding a predefined threshold. We explain each distribution and illustrate it with financial data (S&P 500 index returns).

1 Extreme value distributions

This section is based on (Alexander, 2008, Section I.3.3.9); related material can also be found in (Alexander, 2009, Section IV.3.4.2). Extreme value distributions are commonly used to quantify the probabilities of exceptional losses and to improve the accuracy of estimates of very low quantiles of the return distribution, such as the 0.001 quantile or even smaller.

In practice, we often encounter the following two extreme value distributions:

• the generalized extreme value (GEV) distribution, which can be fitted to any data but is theoretically justified for, and commonly used to fit, the distribution of sample maxima (or minima),
• the generalized Pareto distribution (GPD), which captures losses over a certain predefined threshold.

In the following sections, we look more closely at the properties and applications of both the GEV and the GPD, with examples of their use on real-world data.

1.1 Generalized Extreme Value (GEV) distribution

GEV is a distribution of extreme values, where each extreme is the maximum of a subsample (a block of observations). It can therefore be understood as a distribution of maxima.

Let’s assume we compute the maximum loss each week, i.e. many weekly maximum loss values \(X_k\), each defined as \({X_k} = \max [{x_{k}},{x_{k + 1}},{x_{k + 2}},{x_{k + 3}},{x_{k + 4}}]\) (each \(x_k\) being a 1-day loss¹), where \(k = 1, 6, 11, \ldots \) so that the 5-day windows do not overlap. Then \(X_k\) is approximately GEV distributed (the GEV arises as the limiting distribution of suitably normalized block maxima), so GEV is the distribution of maximum values. Note that GEV does not capture the distribution of 5-day losses, only that of the maximum 1-day loss in each week.
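The windows above are blocks of five consecutive trading days. As a sketch of an alternative (an illustration added here, using synthetic data and a hypothetical series name daily_losses), pandas can form one block per calendar week with resample('W'), so that a holiday shortens that week's block instead of shifting all later blocks:

```python
import numpy as np
import pandas as pd

# Hypothetical daily losses (positive = loss) on 10 business days, Mon 2024-01-01 to Fri 2024-01-12
dates = pd.bdate_range("2024-01-01", periods=10)
daily_losses = pd.Series(np.arange(10) / 100.0, index=dates)

# Option A (as in the text): non-overlapping blocks of 5 consecutive trading days
block_max = daily_losses.to_numpy().reshape(-1, 5).max(axis=1)

# Option B: one block per calendar week (pandas 'W' = weeks ending on Sunday)
weekly_max = daily_losses.resample("W").max()

print(block_max)
print(weekly_max)
```

With no holidays in this toy sample both options give the maxima 0.04 and 0.09; they start to differ once some week has fewer than five trading days.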

A good alternative example comes from weather forecasting, where GEV can be used to model the maximum 1-day precipitation in each week.

In Figure 1.1 we show the distribution of the maximum 1-day loss each week in the S&P 500 index together with the fitted GEV distribution (data from 01-2001 to 02-2024). Note that positive values are losses (i.e. each 'return' shows -PnL rather than PnL). On this real-world dataset, the fitted GEV distribution visually matches the empirical distribution closely.

The PDF of GEV is defined as

\begin{equation*} {f_X}(x) = \frac {1}{\sigma }t{(x)^{\xi + 1}}{e^{ - t(x)}}, \end{equation*}

where, on the support \(1 + \xi (x-\mu )/\sigma > 0\),

\begin{equation*} t(x) = \begin{cases} \left [ 1 + \xi \left ( \frac {x - \mu }{\sigma } \right ) \right ]^{-\frac {1}{\xi }} & \text {if } \xi \neq 0 \\ e^{-\frac {x-\mu }{\sigma }} & \text {if } \xi = 0 \end {cases}. \end{equation*}

The CDF is simply

\begin{equation*} {F_X}(x) = {e^{ - t(x)}}. \end{equation*}

Here,

• \(\mu \) is the location parameter,
• \(\sigma >0\) is the scale parameter,
• \(\xi \) is the shape parameter, also known as the tail index.
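As a consistency check on the PDF and CDF above, differentiating the CDF recovers the PDF. Applying the chain rule to \(t(x)\) on the support \(1+\xi (x-\mu )/\sigma >0\):

\begin{equation*} t'(x) = \begin{cases} -\frac {1}{\xi }\left [ 1 + \xi \left ( \frac {x - \mu }{\sigma } \right ) \right ]^{-\frac {1}{\xi } - 1} \cdot \frac {\xi }{\sigma } = -\frac {1}{\sigma }\, t(x)^{\xi + 1} & \text {if } \xi \neq 0 \\ -\frac {1}{\sigma }\, e^{-\frac {x-\mu }{\sigma }} = -\frac {1}{\sigma }\, t(x)^{\xi + 1} & \text {if } \xi = 0 \end {cases} \end{equation*}

so that

\begin{equation*} \frac {d}{dx} F_X(x) = -t'(x)\, e^{-t(x)} = \frac {1}{\sigma }\, t(x)^{\xi + 1} e^{-t(x)} = f_X(x). \end{equation*}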

Note: the shape parameter c of Python's scipy.stats.genextreme distribution has the opposite sign to \(\xi \), i.e. \(c=-\xi \).
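To make the sign convention concrete, here is a minimal sketch that codes \(t(x)\), the PDF and the CDF directly from the formulas above and compares them with scipy.stats.genextreme called with \(c=-\xi \). The helper names gev_t, gev_pdf and gev_cdf and the parameter values are illustrative assumptions, not fitted values:

```python
import numpy as np
from scipy.stats import genextreme

def gev_t(x, mu, sigma, xi):
    """t(x) from the GEV definition (valid where 1 + xi*(x-mu)/sigma > 0)."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return np.exp(-z)
    return (1.0 + xi * z) ** (-1.0 / xi)

def gev_pdf(x, mu, sigma, xi):
    t = gev_t(x, mu, sigma, xi)
    return t ** (xi + 1.0) * np.exp(-t) / sigma

def gev_cdf(x, mu, sigma, xi):
    return np.exp(-gev_t(x, mu, sigma, xi))

# Illustrative parameters (not fitted values): xi > 0, i.e. a heavy right tail
mu, sigma, xi = 0.01, 0.01, 0.2
x = np.linspace(0.0, 0.1, 11)   # all points satisfy 1 + xi*(x-mu)/sigma > 0

# scipy's genextreme uses the shape c = -xi
print(np.allclose(gev_cdf(x, mu, sigma, xi), genextreme.cdf(x, -xi, loc=mu, scale=sigma)))
print(np.allclose(gev_pdf(x, mu, sigma, xi), genextreme.pdf(x, -xi, loc=mu, scale=sigma)))
```

Passing \(+\xi \) to genextreme instead would describe a different distribution with a bounded right tail, which is an easy mistake when reading fitted parameters.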

¹ I.e. \(x>0\) is a loss and \(x<0\) is a profit.

1.2 Generalized Pareto distribution (GPD)

As explained above, this distribution captures the distribution of losses that exceed a certain loss threshold \(u>0\). As before, we use positive values for losses and denote the loss variable by \(X\).

If we know the underlying CDF \(F_X\) of \(X\), we can write the CDF of the excess losses \((X-u)\) as

\begin{equation*} G(x) = \Pr \left [ {X - u \le x|X > u} \right ] = \frac {{F_X(x + u) - F_X(u)}}{{1 - F_X(u)}}. \end{equation*}
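A small worked example of this formula (chosen here for illustration, not taken from the source): if losses are exponential, \(F_X(x)=1-e^{-x/\sigma }\), then \(G(x) = \left (e^{-u/\sigma } - e^{-(x+u)/\sigma }\right )/e^{-u/\sigma } = 1-e^{-x/\sigma }\), so the excesses are again exponential with the same scale. For a Pareto-type (Lomax) loss distribution \(F_X(x) = 1-(1+x/s)^{-\alpha }\), the same algebra gives \(G(x) = 1-\left (1+x/(s+u)\right )^{-\alpha }\). Both are members of the GPD family introduced next (with \(\mu =0\); \(\xi =0\) in the first case, \(\xi =1/\alpha \) and \(\sigma =(s+u)/\alpha \) in the second). The sketch below, with illustrative parameter values and a hypothetical helper excess_cdf, checks this numerically with scipy:

```python
import numpy as np
from scipy.stats import expon, lomax, genpareto

def excess_cdf(dist, x, u):
    """G(x) = [F(x+u) - F(u)] / [1 - F(u)] for a frozen scipy distribution `dist`."""
    F_u = dist.cdf(u)
    return (dist.cdf(x + u) - F_u) / (1.0 - F_u)

u = 0.02                                  # threshold (2% loss)
x = np.linspace(0.0, 0.05, 6)             # excess losses X - u

# Light tail: exponential losses -> GPD with xi = 0 and the same scale
sigma = 0.01
G_exp = excess_cdf(expon(scale=sigma), x, u)
print(np.allclose(G_exp, genpareto.cdf(x, 0.0, loc=0.0, scale=sigma)))

# Heavy tail: Lomax (Pareto II) losses -> GPD with xi = 1/alpha, sigma = (s+u)/alpha
alpha, s = 3.0, 0.01
G_lomax = excess_cdf(lomax(alpha, scale=s), x, u)
print(np.allclose(G_lomax, genpareto.cdf(x, 1.0 / alpha, loc=0.0, scale=(s + u) / alpha)))
```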

For a large class of underlying distributions \(F_X\) and a sufficiently high threshold \(u\), \(G\) is well approximated by a member of the Generalized Pareto distribution family, given by the PDF (1.1) and CDF (1.2)

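\begin{equation} \label {eq:GPD_pdf} g(x) = \begin{cases} \frac {1}{\sigma }{\left ( {1 + \xi \frac {{x - \mu }}{\sigma }} \right )^{\left ( { - \frac {1}{\xi } - 1} \right )}} & \text {if } \xi \ne 0 \\ \frac {1}{\sigma }\exp \left ( { - \frac {{x - \mu }}{\sigma }} \right ) & \text {if } \xi = 0 \end {cases}, \end{equation}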

\begin{equation} \label {eq:GPD_cdf} G(x) = \begin{cases} 1 - {\left ( {1 + \xi \frac {{x - \mu }}{\sigma }} \right )^{ - \frac {1}{\xi }}} & \text {if } \xi \ne 0 \\ 1 - \exp \left ( { - \frac {{x - \mu }}{\sigma }} \right ) & \text {if } \xi = 0 \end {cases}. \end{equation}

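The parameters \(\mu , \sigma , \xi \) play the same roles (location, scale and shape) as for the GEV distribution described in Section 1.1. For \(\xi \ge 0\) the support of the distribution is \([\mu , + \infty )\) (for \(\xi < 0\) it is the bounded interval \([\mu , \mu - \sigma /\xi ]\)), and we usually set \(\mu =0\) so that the excess loss \(X-u\) starts at \(0\), i.e. the modeled losses effectively start at \(u\).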
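Unlike genextreme, scipy.stats.genpareto uses a shape parameter with the same sign as \(\xi \). A minimal sketch checking the PDF (1.1) and CDF (1.2) against scipy, and the bounded support for \(\xi <0\), with illustrative (not fitted) parameters and hypothetical helper names gpd_pdf and gpd_cdf:

```python
import numpy as np
from scipy.stats import genpareto

def gpd_cdf(x, mu, sigma, xi):
    """GPD CDF (1.2); valid on the support of the distribution."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return 1.0 - np.exp(-z)
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)

def gpd_pdf(x, mu, sigma, xi):
    """GPD PDF (1.1); for xi == 0 the exponential density."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return np.exp(-z) / sigma
    return (1.0 + xi * z) ** (-1.0 / xi - 1.0) / sigma

mu, sigma, xi = 0.0, 0.01, 0.3          # illustrative values, not fitted ones
x = np.linspace(0.0, 0.1, 11)

# scipy's genpareto shape c has the SAME sign as xi (no flip, unlike genextreme)
print(np.allclose(gpd_cdf(x, mu, sigma, xi), genpareto.cdf(x, xi, loc=mu, scale=sigma)))
print(np.allclose(gpd_pdf(x, mu, sigma, xi), genpareto.pdf(x, xi, loc=mu, scale=sigma)))

# For xi < 0 the support is bounded: [mu, mu - sigma/xi] = [0, 0.02] here
lower, upper = genpareto.support(-0.5, loc=0.0, scale=0.01)
print(lower, upper)
```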

As an example, we fit the GPD to the 1-day S&P 500 return series (01-2001 to 02-2024) with threshold \(u=0.02\), i.e. to losses exceeding 2%. The resulting fit looks good: Figure 1.2 compares the histogram of the excess losses with the fitted GPD density, and Figure 1.3 plots the empirical cumulative distribution function (ECDF) against the fitted GPD CDF.
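Since the GPD describes only the exceedances, turning such a fit into an extreme quantile of the full loss distribution (the use case mentioned at the start of Section 1) also requires the fraction of observations above the threshold. A minimal sketch of the standard peaks-over-threshold estimator: with \(n\) observations, \(N_u\) of them above \(u\), it approximates \(1-F_X(x) \approx \frac {N_u}{n}\left (1+\xi \frac {x-u}{\sigma }\right )^{-1/\xi }\) for \(x>u\) and inverts this to \(x_q = u + \frac {\sigma }{\xi }\left [\left (\frac {n}{N_u}(1-q)\right )^{-\xi } - 1\right ]\). The function name pot_quantile and all inputs are hypothetical, not results from the S&P 500 fit:

```python
def pot_quantile(q, u, sigma, xi, n, n_u):
    """Peaks-over-threshold estimate of the q-quantile of the loss distribution.

    Inverts 1 - F(x) ~= (n_u / n) * (1 + xi * (x - u) / sigma) ** (-1 / xi), x > u.
    Assumes xi != 0 and q >= 1 - n_u / n (a quantile at or beyond the threshold).
    """
    tail_prob = (1.0 - q) * n / n_u          # (n / N_u) * (1 - q)
    return u + sigma / xi * (tail_prob ** (-xi) - 1.0)

# Hypothetical inputs, for illustration only (not the S&P 500 estimates)
n, n_u = 5000, 250                           # 5% of the days exceed the threshold
u, sigma, xi = 0.02, 0.01, 0.2

print(pot_quantile(1 - n_u / n, u, sigma, xi, n, n_u))   # the threshold u itself (up to rounding)
print(pot_quantile(0.999, u, sigma, xi, n, n_u))         # about 0.079
```

In the code below, n and \(N_u\) would correspond to len(losses) and len(pareto_excess_losses), and \(\sigma \), \(\xi \) to the fitted scale and shape.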

2 Code

# imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from scipy.stats import genpareto, genextreme, ecdf
import yfinance as yf # Yahoo finance data downloader (pip install yfinance)

The next step is to download the S&P 500 data, compute daily returns and convert them to losses:

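# use yfinance to download daily S&P 500 data
data = yf.download('^GSPC', start='2001-01-01', end='2024-02-28')
print(data.head())

# recent yfinance versions may return 'Close' as a one-column DataFrame; squeeze() gives a Series
prices = data['Close'].squeeze()

# compute daily losses (1d loss = 1d return with opposite sign); drop the leading NaN
returns = prices.pct_change().dropna()
losses = -returns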

2.1 GEV

We start by computing the maximum 1-day loss in each non-overlapping 5-day window and then fit the GEV distribution:

# GEV distribution
# create samples of the max 1d loss in each non-overlapping 5d window

window = 5 # 5d window of daily losses

# padding needed to make the length a multiple of window (0 if it already is;
# the original window - len % window added a full all-NaN row in that case)
padding_needed = (-len(losses)) % window

# pad with NaN at the end; np.nanmax ignores the padding in the last (shorter) window
padded_daily_losses = np.pad(losses.values, (0, padding_needed), 'constant', constant_values=np.nan)

losses_samples = np.reshape(padded_daily_losses, (-1, window))
losses_5d_window = np.nanmax(losses_samples, axis = 1)

# scipy returns (c, loc, scale) with c = -xi (see the note in Section 1.1)
shape, loc, scale = genextreme.fit(losses_5d_window)
print(shape, loc, scale)

We can then plot the empirical distribution and the fitted one:

x = np.arange(-0.02, 0.12, 0.001)

fig, ax = plt.subplots(ncols = 2, figsize = (12,3))
pd.Series(genextreme.pdf(x, shape, loc, scale), index = x).plot(ax = ax[1], linestyle = '-')
pd.Series(losses_5d_window).hist(bins = 50, ax = ax[0])
ax[0].set_title('Histogram of 1-day max losses in 5 day window')
ax[0].grid(linestyle = ':')
ax[0].set_xlabel('loss (return)')
ax[1].set_title('Fitted GEV')
ax[1].grid(linestyle = ':')
ax[1].set_xlabel('loss (return)')
plt.tight_layout()
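Since the fit above relies on downloaded market data, it can be useful to sanity-check the fitting step, and the sign of scipy's shape parameter, on a synthetic sample with known parameters. A minimal sketch, with illustrative 'true' values and seed:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
xi_true, mu_true, sigma_true = 0.2, 0.01, 0.01    # illustrative 'true' parameters

# scipy's genextreme shape is c = -xi, so simulate with c = -xi_true
sample = genextreme.rvs(-xi_true, loc=mu_true, scale=sigma_true, size=5000, random_state=rng)

c_hat, mu_hat, sigma_hat = genextreme.fit(sample)
xi_hat = -c_hat                                    # back to the xi convention of Section 1.1
print(f"xi={xi_hat:.3f}, mu={mu_hat:.4f}, sigma={sigma_hat:.4f}")
```

The estimates should land close to the true values; a recovered \(\xi \) with the wrong sign would indicate that the \(c=-\xi \) convention was not applied.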

2.2 GPD

Let’s first compute the excess losses over the threshold \(u=2\%\):

loss_thr = 0.02
pareto_losses = losses[losses>loss_thr]
pareto_excess_losses = pareto_losses-loss_thr
pareto_excess_losses

Next, let’s fit the GPD to these excess losses. We fix \(\mu =0\) so that the excess loss distribution starts at \(0\), i.e. the losses start at \(2\%\).

# fit with the location fixed at mu = 0; scipy's genpareto shape c equals xi (same sign)
shape, loc, scale = genpareto.fit(pareto_excess_losses, floc=0.0)
print(shape, loc, scale)
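To see what fixing the location does, a similar sketch on simulated excess losses (illustrative parameters and seed): with floc=0.0 scipy returns the location exactly as 0 and estimates only the shape and scale, and the returned shape estimates \(\xi \) directly, with no sign flip:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
xi_true, sigma_true = 0.25, 0.01                  # illustrative 'true' parameters

# Simulated excess losses X - u (mu = 0, so they start at 0)
excess = genpareto.rvs(xi_true, loc=0.0, scale=sigma_true, size=2000, random_state=rng)

# floc=0.0 fixes mu = 0; only the shape and scale are estimated
c_hat, loc_hat, scale_hat = genpareto.fit(excess, floc=0.0)
print(c_hat, loc_hat, scale_hat)                  # c_hat estimates xi directly
```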

Then we can plot the histogram of the excess losses and the fitted density, as in Figure 1.2:

x = np.arange(0.0000, 0.1, 0.0001)

fig, ax = plt.subplots(ncols = 2, figsize = (12,3))
pd.Series(pareto_excess_losses).plot.hist(bins = 40, ax = ax[0])
pd.Series(genpareto.pdf(x, shape, loc, scale), index = x).plot(ax = ax[1], linestyle = '-', color = 'r')
ax[0].set_title('Histogram excess losses $X-u$')
ax[0].grid(linestyle = ':')
ax[0].set_xlabel('excess loss ($X-u$)')
ax[0].set_ylabel(None)
ax[1].set_title('pdf of GPD fitted to excess losses $X-u$')
ax[1].grid(linestyle = ':')
ax[1].set_xlabel('excess loss ($X-u$)')
plt.tight_layout()

and compare the ECDF with the fitted GPD CDF, as in Figure 1.3:

fig, ax = plt.subplots(figsize = (5,3))
res = ecdf(pareto_excess_losses)  # scipy.stats.ecdf requires scipy >= 1.11
pd.Series(res.cdf.probabilities, index = res.cdf.quantiles).plot(ax = ax)
pd.Series(genpareto.cdf(x, shape, loc, scale), index = x).plot(ax = ax, color = 'r')
ax.set_title('Excess losses $X-u$: ECDF and fitted GPD CDF')
ax.grid(linestyle = ':')
ax.set_xlabel('excess loss ($X-u$)')
ax.set_ylabel(None)
ax.legend(['ECDF', 'fitted GPD CDF'])
plt.tight_layout()
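The plots give a visual comparison only. A minimal sketch of a numerical complement, assuming scipy.stats.kstest is an acceptable summary: it measures the largest vertical distance between the ECDF and the fitted CDF. It runs here on a synthetic sample; on real data one would pass pareto_excess_losses instead. Because the GPD parameters are estimated from the same sample, the standard KS p-value is only approximate (it tends to be too favorable to the fitted model).

```python
import numpy as np
from scipy.stats import genpareto, kstest

# Synthetic excess losses standing in for pareto_excess_losses (illustrative only)
rng = np.random.default_rng(2)
excess = genpareto.rvs(0.2, loc=0.0, scale=0.01, size=1000, random_state=rng)

# Fit as in the text: location fixed at mu = 0
shape, loc, scale = genpareto.fit(excess, floc=0.0)

# KS statistic = largest vertical gap between the ECDF and the fitted GPD CDF
result = kstest(excess, genpareto.cdf, args=(shape, loc, scale))
print(result.statistic, result.pvalue)
```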

References

Alexander, Carol (Apr. 2008). Market risk analysis I. The Wiley Finance Series. Chichester, England: John Wiley & Sons.
Alexander, Carol (Jan. 2009). Market risk analysis IV. The Wiley Finance Series. Chichester, England: John Wiley & Sons.