Week2 - Probability Review

  • A random variable (rv) X takes values in a sample space \(S_X\).
  • Its values are distributed according to a probability distribution function (pdf) \(p(x) = Pr(X = x)\).
  • A discrete rv can only take a finite (or countable) number of values, and its pdf satisfies:
  • \(\forall x \in S_X : 0 \le p(x) \le 1\)
  • \(\forall x \notin S_X: p(x) = 0\)
  • \(\sum\limits_{x \in S_X} p(x) = 1\)

Bernoulli distribution

  • We note X=1 on success and X=0 on failure.
  • \(Pr(X=1)=\pi\) and \(Pr(X=0)=1-\pi\)
  • Then we have the pdf: \(p(x)=Pr(X=x)=\pi^x(1-\pi)^{1-x}\) for \(x \in \{0,1\}\)
  • For a continuous rv, which can take any value in an interval of the real line, we instead have a probability curve (pdf) \(f(x)\).
  • We can then measure the probability of an interval A: \(Pr(X \in A) = \int_A f(x) dx\)
  • \(\forall x: f(x) \ge 0\) and \(\int_{-\infty}^\infty f(x) dx = 1\)

Uniform distribution over [a,b]

  • assuming b>a here.
  • We note: \(X \sim U[a,b]\) and we have the pdf: \(f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise}\end{cases}\)
  • The cumulative distribution function (CDF) F of a rv X is: \(F(x) = Pr(X \le x)\)
  • \(x_1 \lt x_2 \Rightarrow F(x_1) \le F(x_2)\)
  • \(F(-\infty) = 0\) and \(F(\infty) = 1\)
  • \(Pr(X \gt x) = 1 - F(x)\)
  • \(Pr(x_1 \lt X \le x_2) = F(x_2) - F(x_1)\)
  • \(\frac{d}{dx} F(x) = f(x)\) if X is a continuous rv.
  • Also note that for a continuous rv: \(Pr(X\le x) = Pr(X\lt x)\) and \(Pr(X=x)=0\)
  • Given a rv X with continuous CDF \(F_X(x) = Pr(X \le x)\): the \(\alpha \cdot 100\%\) quantile of \(F_X\) for \(\alpha \in [0,1]\) is the value \(q_\alpha\) such that \(F_X(q_\alpha) = Pr(X \le q_\alpha) = \alpha\).
  • The area to the left of \(q_\alpha\) is \(\alpha\) under the probability curve.
  • If the inverse CDF function exists, then: \(q_\alpha = F_X^{-1}(\alpha)\)
  • The 50% quantile is also called the median
  • For instance, if \(X \sim U[0,1]\) we have \(F(x)=x\) for \(0 \le x \le 1\), hence \(q_\alpha=\alpha\).
  • If X is a rv such that \(X \sim N(0,1)\) (standard normal distribution), then: \(f(x) = \phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left( - \frac12 x^2 \right)\) for \(-\infty \lt x \lt \infty\), and its CDF is:

\[\Phi(x) = Pr(X \le x) = \int_{-\infty}^x \phi(z)dz\]

  • We have the important ranges:

\[Pr(-1 \le X \le 1) \approx 0.68\] \[Pr(-2 \le X \le 2) \approx 0.95\] \[Pr(-3 \le X \le 3) \approx 0.997\]
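
These ranges can be checked numerically. The following is a minimal sketch in Python (not one of the tools used in these notes), assuming the standard library class `statistics.NormalDist`; the helper name `central_prob` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal rv


def central_prob(k: float) -> float:
    """Return Pr(-k <= Z <= k) = Phi(k) - Phi(-k) for Z ~ N(0,1)."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"Pr(-{k} <= Z <= {k}) = {central_prob(k):.4f}")
```

The three printed probabilities are approximately 0.6827, 0.9545 and 0.9973.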

  • In Excel:
    • we can use the function NORMSDIST to get the \(\Phi(z)\) value (the newer NORM.S.DIST(z, cumulative) returns \(\Phi(z)\) if cumulative is TRUE and \(\phi(z)\) if it is FALSE).
    • we can use the function NORMSINV to get the \(\Phi^{-1}(\alpha)\) value.
  • In R:
    • We use pnorm to compute \(\Phi(z)\)
    • We use qnorm to compute \(\Phi^{-1}(\alpha)\)
    • We use dnorm to compute \(\phi(z)\)
  • Other notable relations for the standard normal distribution:

\[Pr(X\le z) = 1 - Pr(X \ge z)\] \[Pr(X\ge z) = Pr(X \le -z)\] \[Pr(X\ge 0) = Pr(X \le 0) = 0.5\]
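
For readers without Excel or R, a rough Python counterpart of pnorm, qnorm and dnorm is sketched below, assuming the standard library class `statistics.NormalDist`. It also checks the symmetry relations above at the point z = 1.5, which is an arbitrary illustrative choice.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

z = 1.5                   # arbitrary illustrative point
Phi_z = Z.cdf(z)          # plays the role of pnorm(z) in R
phi_z = Z.pdf(z)          # plays the role of dnorm(z) in R
q_95 = Z.inv_cdf(0.95)    # plays the role of qnorm(0.95) in R

# Pr(X >= z) = 1 - Pr(X <= z), and by symmetry about 0 it equals Pr(X <= -z)
upper_tail = 1.0 - Phi_z
mirrored_lower_tail = Z.cdf(-z)

print(f"Phi({z}) = {Phi_z:.4f}, phi({z}) = {phi_z:.4f}")
print(f"95% quantile = {q_95:.4f}")
print(f"Pr(X >= z) = {upper_tail:.4f}, Pr(X <= -z) = {mirrored_lower_tail:.4f}")
```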

  • Expected Value or Mean: Center of mass
  • Variance and standard deviation: spread about mean
  • Skewness: symmetry about mean
  • Kurtosis: Tail thickness
  • For discrete rv: \(E[X] = \mu_X = \sum\limits_{x \in S_X} x \cdot p(x)\)
  • For continuous rv: \(E[X] = \mu_X = \int_{-\infty}^\infty x \cdot f(x) dx\)
  • If \(X \sim N(0,1)\) then \(\mu_X = \int_{-\infty}^\infty x \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac 12 x^2} dx = 0\)
  • Let g(X) be some function of the rv X. Then
    • For discrete rv: \(E[g(X)] = \sum\limits_{x \in S_X} g(x) \cdot p(x)\)
    • For continuous rv: \(E[g(X)] = \int_{-\infty}^\infty g(x) \cdot f(x) dx\)
  • The variance is the expected value of \(g(X) = (X - E[X])^2 = (X - \mu_X)^2\):
  • \(Var(X) = \sigma_X^2 = E[g(X)] = E[(X-\mu_X)^2] = E[X^2] - \mu_X^2\)
  • \(SD(X) = \sigma_X = \sqrt{Var(X)}\)
  • Note that Var(X) is in squared units of X, whereas SD(X) is in the same unit as X.
  • Concretely:
    • For discrete rv: \(\sigma_X^2 = \sum\limits_{x \in S_X} (x - \mu_X)^2 \cdot p(x)\)
    • For continuous rv: \(\sigma_X^2 = \int_{-\infty}^\infty (x - \mu_X)^2 \cdot f(x) dx\)
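
As an illustration of the discrete formulas above, here is a small Python sketch that computes \(E[g(X)]\), the variance and the standard deviation for a Bernoulli rv. The success probability 0.3 and the helper name `expectation` are assumptions made for the example only.

```python
import math
from typing import Callable, Mapping


def expectation(pmf: Mapping[float, float],
                g: Callable[[float], float] = lambda x: x) -> float:
    """E[g(X)] = sum over x in S_X of g(x) * p(x) for a discrete rv."""
    return sum(g(x) * p for x, p in pmf.items())


pi = 0.3                        # hypothetical success probability
bernoulli = {0: 1 - pi, 1: pi}  # pmf: value -> probability

mu = expectation(bernoulli)                                   # E[X]
var = expectation(bernoulli, lambda x: (x - mu) ** 2)         # E[(X - mu)^2]
var_shortcut = expectation(bernoulli, lambda x: x ** 2) - mu ** 2  # E[X^2] - mu^2
sd = math.sqrt(var)

print(f"mean = {mu:.4f}, variance = {var:.4f}, sd = {sd:.4f}")
```

For a Bernoulli rv this reproduces the known closed forms \(\mu_X = \pi\) and \(\sigma_X^2 = \pi(1-\pi)\).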
  • If \(X \sim N(\mu_X,\sigma_X^2)\), then:

\[f(x) = \frac{1}{\sqrt{2\pi \sigma_X^2}} exp\left( - \frac 12 \left(\frac{x-\mu_X}{\sigma_X} \right)^2\right)\]

  • Note that we still have about 68% of the probability in the range \([\mu_X - \sigma_X, \mu_X + \sigma_X]\).
  • For this general normal distribution, we also have the relation with the standard normal distribution quantile function: \(q_\alpha = \mu_X + \sigma_X \cdot \Phi^{-1}(\alpha) = \mu_X + \sigma_X \cdot z_\alpha\)
  • In Excel:
    • NORMDIST(x, mu_X, sigma_X, cumulative): if cumulative is TRUE, computes \(Pr(X \le x)\); otherwise computes \(f(x) = \frac{1}{\sqrt{2\pi \sigma_X^2}} \exp\left( - \frac 12 \left(\frac{x-\mu_X}{\sigma_X} \right)^2\right)\)
    • NORMINV(alpha, mu, sigma) computes \(q_\alpha = \mu_X + \sigma_X \cdot z_\alpha\)
  • In R:
    • simulate data: rnorm(n,mean,sd)
    • compute CDF: pnorm(q, mean, sd)
    • compute quantiles: qnorm(p,mean,sd)
    • compute density: dnorm(x,mean, sd)
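
The quantile relation \(q_\alpha = \mu_X + \sigma_X \cdot z_\alpha\) can be verified with a short Python sketch, again assuming the standard library class `statistics.NormalDist` as a stand-in for the Excel and R functions. The parameter values are illustrative assumptions.

```python
from statistics import NormalDist

mu_X, sigma_X = 0.05, 0.10   # illustrative parameters (assumed)
alpha = 0.05

X = NormalDist(mu_X, sigma_X)

z_alpha = NormalDist().inv_cdf(alpha)   # standard normal quantile
q_direct = X.inv_cdf(alpha)             # like qnorm(alpha, mean, sd)
q_via_z = mu_X + sigma_X * z_alpha      # relation from the notes

# Probability mass within one standard deviation of the mean
within_one_sd = X.cdf(mu_X + sigma_X) - X.cdf(mu_X - sigma_X)

# X.samples(n, seed=...) would play the role of rnorm(n, mean, sd)
print(f"q_alpha direct = {q_direct:.6f}, via z_alpha = {q_via_z:.6f}")
print(f"Pr(mu - sigma <= X <= mu + sigma) = {within_one_sd:.4f}")
```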
  • When modelling return rates with \(R_A \sim N(\mu_A,\sigma_A^2)\) and \(R_B \sim N(\mu_B,\sigma_B^2)\), we typically find that if \(\mu_A \gt \mu_B\) then also \(\sigma_A \gt \sigma_B\) (a higher expected return comes with a higher risk).
  • The normal distribution is not a good fit for simple returns: a simple return always satisfies \(R_t \ge -1\), yet if we model \(R_t \sim N(0.05,(0.50)^2)\) we compute \(Pr(R_t \lt -1) = 0.018\), a positive probability for an impossible event.
  • The normal distribution is more appropriate for continuously compounded (cc) returns:
    • \(r_t = ln(1+R_t)\)
    • \(r_t\) can take on values less than -1.
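
The 0.018 figure and the behaviour of cc returns can be reproduced with the following Python sketch (standard library only). The simple return of -70% used at the end is an arbitrary illustrative value.

```python
import math
from statistics import NormalDist

# Simple return modelled as R_t ~ N(0.05, 0.50^2), as in the notes
R = NormalDist(mu=0.05, sigma=0.50)
p_below_minus_one = R.cdf(-1.0)   # probability the model gives to R_t < -1


def cc_return(simple_return: float) -> float:
    """Continuously compounded return r = ln(1 + R)."""
    return math.log1p(simple_return)


# A simple return of -70% (illustrative) maps to a cc return below -1,
# which is a perfectly valid value for r_t.
r = cc_return(-0.70)

print(f"Pr(R_t < -1) = {p_below_minus_one:.4f}")
print(f"cc return for R = -0.70: {r:.4f}")
```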
  • \(X \sim N(\mu_X,\sigma_X^2), -\infty \lt X \lt \infty\)
  • Then we can define \(Y = exp(X) \sim lognormal(\mu_X,\sigma_X^2), 0 \lt Y \lt \infty\)
  • \(E[Y] = \mu_Y = exp(\mu_X + \frac{\sigma_X^2}{2})\)
  • \(Var[Y] = \sigma_Y^2 = exp(2\mu_X + \sigma_X^2)(exp(\sigma_X^2)-1)\)
  • The lognormal distribution has positive skew: it has a long “right tail”, i.e. the main “blob” is on the left.
  • In R we have: rlnorm, plnorm, qlnorm and dlnorm.
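
The lognormal mean and variance formulas can be compared against a simulation. The sketch below uses only the Python standard library. The parameters, the seed, the sample size and the helper names are assumptions chosen for illustration.

```python
import math
import random


def lognormal_mean(mu_X: float, sigma_X: float) -> float:
    """E[Y] = exp(mu_X + sigma_X^2 / 2) for Y = exp(X), X ~ N(mu_X, sigma_X^2)."""
    return math.exp(mu_X + sigma_X ** 2 / 2)


def lognormal_var(mu_X: float, sigma_X: float) -> float:
    """Var[Y] = exp(2 mu_X + sigma_X^2) * (exp(sigma_X^2) - 1)."""
    return math.exp(2 * mu_X + sigma_X ** 2) * (math.exp(sigma_X ** 2) - 1)


mu_X, sigma_X = 0.05, 0.50   # illustrative parameters (assumed)
rng = random.Random(42)      # fixed seed so the run is reproducible
n = 100_000

# Simulate Y = exp(X) with X ~ N(mu_X, sigma_X^2)
ys = [math.exp(rng.gauss(mu_X, sigma_X)) for _ in range(n)]
sample_mean = sum(ys) / n
sample_var = sum((y - sample_mean) ** 2 for y in ys) / (n - 1)

print(f"E[Y]:   formula {lognormal_mean(mu_X, sigma_X):.4f}, simulated {sample_mean:.4f}")
print(f"Var[Y]: formula {lognormal_var(mu_X, sigma_X):.4f}, simulated {sample_var:.4f}")
```

The simulated values should be close to the closed-form ones but will not match exactly, since they depend on the random draws.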
  • \(g(X) = ((X - \mu_X)/\sigma_X)^3\)
  • \(Skew(X) = E\left[ \left(\frac{X - \mu_X}{\sigma_X} \right)^3 \right]\)
  • Skew(X)>0 means we have a long “right tail”, i.e. the main “blob” is on the left.
  • Skew(X)<0 means we have a long “left tail”, i.e. the main “blob” is on the right.
  • For symmetric distributions Skew(X)=0.
  • For log normal distribution: \(Y \sim lognormal(\mu_X,\sigma_X^2)\) we have:

\[Skew(Y) = (exp(\sigma_X^2) +2) \sqrt{exp(\sigma_X^2) -1} \gt 0\]

  • \(g(X) = ((X-\mu_X)/\sigma_X)^4\)
  • \(Kurt(X) = E\left[ \left( \frac{X-\mu_X}{\sigma_X}\right)^4 \right]\)
  • For a general normal distribution \(X \sim N(\mu_X,\sigma_X^2)\) we get \(Kurt(X)=3\)
  • We then define the Excess kurtosis = Kurt(X) - 3.
    • If Excess kurtosis(X) > 0 ⇒ X has fatter tails than normal distribution
    • If Excess kurtosis(X) < 0 ⇒ X has thinner tails than normal distribution
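
Skewness and kurtosis are both standardized moments, so one helper covers them. The Python sketch below applies the definitions to two discrete distributions. The helper names and the example distributions are hypothetical.

```python
import math
from typing import Mapping


def standardized_moment(pmf: Mapping[float, float], k: int) -> float:
    """E[((X - mu) / sigma)^k] for a discrete rv given as value -> probability."""
    mu = sum(x * p for x, p in pmf.items())
    sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    return sum(((x - mu) / sigma) ** k * p for x, p in pmf.items())


def skew(pmf: Mapping[float, float]) -> float:
    return standardized_moment(pmf, 3)


def excess_kurtosis(pmf: Mapping[float, float]) -> float:
    return standardized_moment(pmf, 4) - 3.0


fair_coin = {0: 0.5, 1: 0.5}   # symmetric about its mean
rare_win = {0: 0.9, 1: 0.1}    # most mass on the left, long right tail

print(f"fair coin: skew = {skew(fair_coin):.4f}, "
      f"excess kurtosis = {excess_kurtosis(fair_coin):.4f}")
print(f"rare win:  skew = {skew(rare_win):.4f}")
```

The symmetric coin has zero skew and thinner tails than the normal distribution (negative excess kurtosis), while the rare-win distribution has positive skew.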
  • The Student's t distribution is similar to the normal distribution but with fatter tails (i.e. larger kurtosis).
  • It has an additional parameter called the degrees of freedom, “v”.
  • We note \(X \sim t_v\), and the pdf is:

\[f(x) = \frac{\Gamma(\frac{v+1}{2})}{\sqrt{v\pi}\,\Gamma(\frac v2)} \left( 1 + \frac{x^2}{v}\right)^{- \frac{v+1}{2}}, ~~ -\infty \lt x \lt \infty, ~~ v > 0 \]

  • With \(\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}dt\) denoting the gamma function.
  • When \(v \rightarrow \infty\), the Student's t distribution converges to the standard normal distribution.
  • The smaller the degree of freedom parameter, the fatter are the tails of the distribution.
  • Properties of this distribution are:
    • \(E[X] = 0, ~~ v>1\)
    • \(Var(X) = \frac{v}{v-2}, ~~ v > 2\)
    • \(Skew(X) = 0, ~~ v > 3\)
    • \(\text{excess kurt}(X) = Kurt(X) - 3 = \frac{6}{v-4}, ~~ v > 4\)
  • in R we have the functions: rt, pt, qt and dt related to this distribution.
  • In practice, for \(v \ge 60\) the Student's t distribution is already very close to the standard normal distribution.
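
To see the fatter tails and the convergence to the normal distribution, the Student's t density can be evaluated directly. This Python sketch uses the standard normalizing constant \(\Gamma(\frac{v+1}{2}) / (\sqrt{v\pi}\,\Gamma(\frac v2))\) and works with `math.lgamma` to stay stable for large v. The function name `t_pdf` and the evaluation points are assumptions for the example.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, v: float) -> float:
    """Density of the Student's t distribution with v > 0 degrees of freedom."""
    log_const = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                 - 0.5 * math.log(v * math.pi))
    return math.exp(log_const - (v + 1) / 2 * math.log1p(x * x / v))


phi = NormalDist().pdf  # standard normal density for comparison

for v in (2, 5, 60):
    print(f"v = {v:>2}: f(0) = {t_pdf(0.0, v):.4f}, f(3) = {t_pdf(3.0, v):.5f}")
print(f"normal: phi(0) = {phi(0.0):.4f}, phi(3) = {phi(3.0):.5f}")
```

For small v the density at x = 3 is well above the normal density, and for v = 60 the two are nearly indistinguishable.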
  • Let X be a discrete or continuous rv with \(\mu_X = E[X]\) and \(\sigma_X^2 = Var(X)\)
  • We define a new rv Y such that: \(Y = g(X) = a \cdot X + b\)
  • Then we have: \(\mu_Y = a \cdot \mu_X + b\), \(\sigma_Y^2 = a^2 \cdot \sigma_X^2\) and \(\sigma_Y = |a| \cdot \sigma_X\)
  • Let \(X \sim N(\mu_X,\sigma_X^2)\) and define \(Y = a \cdot X + b\). Then: \(Y \sim N(\mu_Y,\sigma_Y^2)\) with:

\[\mu_Y = a \cdot \mu_X + b\] \[\sigma_Y^2 = a^2 \cdot \sigma_X^2\]

  • Let \(X \sim N(\mu_X,\sigma_X^2)\). The standardized rv Z is created using:

\[\begin{align} Z & = \frac{X - \mu_X}{\sigma_X} = \frac{1}{\sigma_X} \cdot X - \frac{\mu_X}{\sigma_X} \\ & = a \cdot X + b \\ a & = \frac{1}{\sigma_X}, ~ b = -\frac{\mu_X}{\sigma_X} \end{align}\]

  • Thus we get: \(Z \sim N(0,1)\).
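
Linear functions of a normal rv and the standardization step can be sketched in Python with `statistics.NormalDist`, which supports adding, subtracting, multiplying and dividing by constants. The values of \(\mu_X\), \(\sigma_X\), a and b below are illustrative assumptions.

```python
from statistics import NormalDist

X = NormalDist(mu=0.05, sigma=0.10)   # illustrative parameters (assumed)

# Linear function Y = a * X + b, with a negative a to show the role of |a|
a, b = -2.0, 1.0
Y = X * a + b
print(f"mu_Y = {Y.mean:.4f} (a * mu_X + b = {a * X.mean + b:.4f})")
print(f"sigma_Y = {Y.stdev:.4f} (|a| * sigma_X = {abs(a) * X.stdev:.4f})")

# Standardization: Z = (X - mu_X) / sigma_X, i.e. a = 1/sigma_X, b = -mu_X/sigma_X
Z = (X - X.mean) / X.stdev
print(f"Z has mean {Z.mean:.4f} and standard deviation {Z.stdev:.4f}")
```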
  • Value-at-Risk (VaR): e.g. compute how much money we could lose with a specified probability \(\alpha\).
  • Assume R = simple monthly return. \(R \sim N(0.05, (0.10)^2)\)
  • \(\alpha\) is usually 5% or 1%.
  • End of month wealth \(W_1 = \$10000 \cdot (1+R)\)
  • What is \(Pr(W_1 \lt \$9000)\)?
  • What value of R produces \(W_1 = \$9000\)?
  • In general, the \(\alpha \times 100\%\) Value-at-Risk \((VaR_\alpha)\) for an initial investment of \(\$W_0\) is computed as: \(VaR_\alpha = \$W_0 \times q_\alpha\) where \(q_\alpha\) is the \(\alpha\) quantile of the simple return distribution.
  • Note that the VaR is often reported as a positive number instead of a negative value.
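
The questions above can be worked through numerically. The Python sketch below uses the values from the notes (\(W_0 = \$10000\), \(R \sim N(0.05, (0.10)^2)\)) and the standard library class `statistics.NormalDist`. The function name `value_at_risk` is hypothetical.

```python
from statistics import NormalDist

W0 = 10_000.0
R = NormalDist(mu=0.05, sigma=0.10)   # simple monthly return

# W1 = W0 * (1 + R) = 9000  <=>  R = 9000 / W0 - 1
R_star = 9_000.0 / W0 - 1.0
p_loss = R.cdf(R_star)                # Pr(W1 < 9000) = Pr(R < R_star)


def value_at_risk(W0: float, alpha: float, returns: NormalDist) -> float:
    """VaR_alpha = W0 * q_alpha, with q_alpha the alpha quantile of the return."""
    return W0 * returns.inv_cdf(alpha)


var_5 = value_at_risk(W0, 0.05, R)
var_1 = value_at_risk(W0, 0.01, R)

print(f"R producing W1 = 9000: {R_star:.2f}")
print(f"Pr(W1 < 9000) = {p_loss:.4f}")
print(f"5% VaR = {var_5:.2f}, 1% VaR = {var_1:.2f}")
```

The VaR values come out negative here, so reporting them as positive numbers means taking the absolute value.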
  • For continuously compounded returns: \(r = \ln(1+R)\)
  • We assume \(r \sim N(\mu_r,\sigma_r^2)\)
  • We then:
    • Compute the alpha quantile of the normal dist for r: \(q_\alpha^r = \mu_r + \sigma_r z_\alpha\)
    • Convert the alpha quantile for r into an alpha quantile for R: \(q_\alpha^R = e^{q_\alpha^r} - 1\)
    • We compute the \(VaR_\alpha\) using \(q_\alpha^R\): \(VaR_\alpha = \$W_0 \cdot q_\alpha^R\)
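
The three steps above can be sketched as a small Python function. The notes do not give values for \(\mu_r\) and \(\sigma_r\), so the parameters used here (0.05 and 0.10) and the function name `var_cc` are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def var_cc(W0: float, alpha: float, mu_r: float, sigma_r: float) -> float:
    """VaR_alpha for an investment W0 when the cc return r ~ N(mu_r, sigma_r^2)."""
    z_alpha = NormalDist().inv_cdf(alpha)   # standard normal quantile
    q_r = mu_r + sigma_r * z_alpha          # alpha quantile of the cc return r
    q_R = math.exp(q_r) - 1.0               # alpha quantile of the simple return R
    return W0 * q_R


# Assumed parameters, not taken from the notes
var_5 = var_cc(10_000.0, 0.05, mu_r=0.05, sigma_r=0.10)
print(f"5% VaR with cc returns = {var_5:.2f}")
```

Because \(q_\alpha^R = e^{q_\alpha^r} - 1 \gt -1\), the computed loss can never exceed the initial investment, unlike with normally distributed simple returns.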
  • Last modified: 2020/07/10 12:11