====== Week 3 - Probability Review Continued ======

===== 3.1 - Location-scale Model =====

  * Reverse of standardization. We build: \(X = \mu_X + \sigma_X \cdot Z, ~~ Z \sim N(0,1)\)
  * The location is the mean \(\mu_X\) and the scale is the standard deviation \(\sigma_X\).

==== Quantiles of the normal distribution ====

  * \(Z \sim N(0,1)\), \(Pr(Z \le z_\alpha) = \alpha\)
  * For a general normal distribution, we get: \(q_\alpha^X = \mu_X + \sigma_X \cdot z_\alpha \)

===== 3.2 - Bivariate Discrete Distributions =====

  * We have 2 discrete rv's X and Y.
  * Joint pdf: \(p(x,y) = Pr(X = x, Y = y)\)
  * The sample space is denoted \(S_{XY}\)

==== Marginal pdfs ====

  * \(p(x) = Pr(X = x) = \sum\limits_{y \in S_Y} p(x,y)\) and similarly:
  * \(p(y) = Pr(Y = y) = \sum\limits_{x \in S_X} p(x,y)\)

==== Conditional probability ====

  * Suppose we know Y = 0; how does this affect the probabilities of X? \[\begin{align} Pr(X=0 | Y=0) & = \frac{Pr(X=0, Y=0)}{Pr(Y=0)} \\ & = \frac{\text{joint probability}}{\text{marginal probability}}\end{align}\] => Here X depends on Y, so \(Pr(X=0|Y=0) \neq Pr(X=0)\)
  * In general, \(p(x|y) = Pr(X=x | Y=y) = \frac{Pr(X=x, Y=y)}{Pr(Y=y)} \)
  * And \(p(y|x) = Pr(Y=y | X=x) = \frac{Pr(X=x, Y=y)}{Pr(X=x)} \)

==== Conditional Mean and Variance ====

  * \(\mu_{X|Y=y} = E[X|Y=y] = \sum\limits_{x \in S_X} x \cdot Pr(X=x |Y=y)\)
  * \(\mu_{Y|X=x} = E[Y|X=x] = \sum\limits_{y \in S_Y} y \cdot Pr(Y=y |X=x)\)
  * \(\sigma_{X|Y=y}^2 = Var(X|Y=y) = \sum\limits_{x \in S_X} (x-\mu_{X|Y=y})^2 \cdot Pr(X=x |Y=y)\)
  * and similarly for \(\sigma_{Y|X=x}^2\)
  * Most of the time, the conditional variances will be smaller than the unconditional variances.
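The marginal and conditional formulas can be checked numerically on a small joint table. A minimal Python sketch with NumPy, using a hypothetical 2x2 joint pmf (the numbers are illustrative, not from the notes):

```python
import numpy as np

# Hypothetical joint pmf p(x, y) for X, Y in {0, 1}; rows index x, columns index y.
p = np.array([[0.3, 0.2],
              [0.1, 0.4]])
x_vals = np.array([0, 1])

# Marginals: p(x) sums over y, p(y) sums over x.
p_x = p.sum(axis=1)   # [0.5, 0.5]
p_y = p.sum(axis=0)   # [0.4, 0.6]

# Conditional pmf of X given Y = 0: joint / marginal.
p_x_given_y0 = p[:, 0] / p_y[0]   # [0.75, 0.25] != p_x, so X depends on Y

# Conditional mean and variance of X given Y = 0.
mu = np.sum(x_vals * p_x_given_y0)                # 0.25
var = np.sum((x_vals - mu) ** 2 * p_x_given_y0)   # 0.1875
print(mu, var)
```

Note that here \(Pr(X=0|Y=0) = 0.75 \neq Pr(X=0) = 0.5\), and the conditional variance 0.1875 is smaller than the unconditional variance 0.25, matching the remark above.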
==== Independence ====

  * rv's X and Y are independent if and only if: \(p(x,y) = p(x) \cdot p(y) ~~ \forall x \in S_X, ~ \forall y \in S_Y\)
  * If X and Y are independent, then: \[p(x|y) = p(x) ~~ \forall x \in S_X, ~ \forall y \in S_Y\] \[p(y|x) = p(y) ~~ \forall x \in S_X, ~ \forall y \in S_Y\]

===== 3.3 - Bivariate Continuous Distributions =====

  * The joint pdf of X and Y is a non-negative function f(x,y) such that: \[ \int_{-\infty}^\infty \int_{-\infty}^\infty f(x,y) \, dx \, dy = 1\]
  * Let \([x_1,x_2]\) and \([y_1,y_2]\) be intervals on the real line. Then: \[Pr(x_1 \le X \le x_2, y_1 \le Y \le y_2) = \int_{x_1}^{x_2} \int_{y_1}^{y_2} f(x,y) \, dy \, dx\]

==== Marginal and conditional distributions ====

  * Given continuous rv's X, Y, we have the marginal pdfs:
  * \(f(x) = \int_{-\infty}^\infty f(x,y) \, dy \)
  * \(f(y) = \int_{-\infty}^\infty f(x,y) \, dx \)
  * The conditional pdf of X given Y=y is: \(f(x|y) = \frac{f(x,y)}{f(y)}\)
  * The conditional pdf of Y given X=x is: \(f(y|x) = \frac{f(x,y)}{f(x)}\)
  * Conditional means are computed as: \[\mu_{X|Y=y} = E[X|Y=y] = \int x \cdot f(x|y) \, dx\]
  * Conditional variances are computed as: \[\sigma_{X|Y=y}^2 = Var(X|Y=y) = \int (x-\mu_{X|Y=y})^2 f(x|y) \, dx\]

==== Independence ====

  * Let X and Y be continuous rv's. X and Y are independent if and only if: \[f(x,y) = f(x)f(y)\]
  * Or equivalently: \[f(x|y) = f(x) ~~ \text{for } -\infty \lt x, y \lt \infty\] \[f(y|x) = f(y) ~~ \text{for } -\infty \lt x, y \lt \infty\]
  * Example: if \(X \sim N(0,1)\), \(Y \sim N(0,1)\), and X, Y are independent, then \[f(x,y) = f(x)f(y) = \frac{1}{2\pi} e^{- \frac 12 (x^2+y^2)}\]
  * In R we use the **mvtnorm** package to compute these multivariate normal probabilities (integrals of the joint pdf).
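The notes use R's **mvtnorm** for these computations; an analogous check can be sketched in Python with SciPy (assumed available here). For independent standard normals, both the joint density and a rectangle probability factor into univariate pieces:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Independent standard normals: the covariance matrix is the identity (rho = 0).
rho = 0.0
cov = np.array([[1.0, rho], [rho, 1.0]])
bvn = multivariate_normal(mean=[0.0, 0.0], cov=cov)

# Joint density factors: f(x, y) = f(x) * f(y).
x, y = 0.5, -1.2
joint = bvn.pdf([x, y])
product = norm.pdf(x) * norm.pdf(y)

# Pr(-1 <= X <= 1, -1 <= Y <= 1) via inclusion-exclusion on the joint cdf
# (the analogue of mvtnorm::pmvnorm in R); equals the product of the
# univariate probabilities under independence.
rect = (bvn.cdf([1, 1]) - bvn.cdf([-1, 1])
        - bvn.cdf([1, -1]) + bvn.cdf([-1, -1]))
uni = norm.cdf(1) - norm.cdf(-1)
print(joint, product, rect, uni ** 2)
```

The joint cdf is evaluated by numerical integration, so the rectangle probability matches the product only up to the integrator's tolerance.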
===== 3.4 - Covariance =====

  * Covariance: measures the direction, but not the strength, of the linear relationship between 2 rv's: \[\begin{align} \sigma_{XY} & = E[(X-\mu_X)(Y - \mu_Y)] \\ & = \sum\limits_{x,y \in S_{XY}} (x-\mu_X)(y-\mu_Y) \cdot p(x,y) ~~ \text{(for discrete rv's)} \\ & = \int_{-\infty}^\infty \int_{-\infty}^\infty (x-\mu_X)(y-\mu_Y) f(x,y) \, dx \, dy ~~ \text{(for continuous rv's)}\end{align}\]
  * Correlation: measures both the direction and the strength of the linear relationship between 2 rv's: \[\begin{align} \rho_{XY} & = Cor(X,Y) = \frac{Cov(X,Y)}{SD(X) \cdot SD(Y)} \\ & = \frac{\sigma_{XY}}{\sigma_X \cdot \sigma_Y} = \text{scaled covariance}\end{align}\] => This is sometimes called the **Pearson correlation**

==== Properties of Covariance ====

  * \(Cov(X,Y) = Cov(Y,X)\)
  * \(Cov(aX, bY) = a \cdot b \cdot Cov(X,Y)\)
  * \(Cov(X,X) = Var(X)\)
  * X and Y independent => \(Cov(X,Y) = 0\)
  * But \(Cov(X,Y) = 0\) does **not** imply that X and Y are independent.
  * \(Cov(X,Y) = E[XY] - E[X]E[Y]\)

===== 3.5 - Correlation and the Bivariate Normal Distribution =====

  * Correlation is always bounded between -1 and 1 (it is unit-free).

==== Properties of Correlation ====

  * \(-1 \le \rho_{XY} \le 1\)
  * \(\rho_{XY} = 1\) if Y = aX + b with a > 0
  * \(\rho_{XY} = -1\) if Y = aX + b with a < 0
  * \(\rho_{XY} = 0\) if and only if \(\sigma_{XY} = 0\)
  * \(\rho_{XY} = 0\) does **not** imply that X and Y are independent in general.
  * \(\rho_{XY} = 0\) does imply that X and Y are independent if they are bivariate normal.

==== Bivariate normal distribution ====

  * Let X and Y be distributed bivariate normal. The joint pdf is given by: \[f(x,y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \times \\ \exp \left[ - \frac{1}{2(1-\rho^2)} \left[ \left( \frac{x - \mu_X}{\sigma_X} \right)^2 + \left( \frac{y - \mu_Y}{\sigma_Y} \right)^2 - \frac{2 \rho(x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y} \right] \right] \]

===== 3.6 - Linear Combination of 2 Random Variables =====

  * Let X and Y be rv's. We define Z as: Z = aX + bY.
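These covariance and correlation properties are easy to verify by simulation. A short Python sketch (the sample size, seed, and linear coefficients are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 2.0 * x + 1.0              # exact linear relation Y = aX + b with a > 0
z = rng.normal(size=n)         # drawn independently of x

# Shortcut formula: Cov(X, Y) = E[XY] - E[X]E[Y]; here it should be
# close to 2 * Var(X) ~ 2, since Cov(X, 2X + 1) = 2 Cov(X, X).
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)

# Correlation is the scaled covariance (np.std defaults to the E[]-style
# divisor n, matching the population formulas above).
rho_xy = cov_xy / (np.std(x) * np.std(y))   # ~ 1.0: exact positive linear relation
rho_xz = np.corrcoef(x, z)[0, 1]            # ~ 0.0: independent rv's are uncorrelated
print(cov_xy, rho_xy, rho_xz)
```

The sample correlation of x with z is only approximately zero: independence implies zero correlation in the population, and the estimate shrinks toward 0 as n grows.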
Then:

  * \(\mu_Z = a \cdot \mu_X + b \cdot \mu_Y\)
  * and \(\sigma_Z^2 = a^2 \sigma_X^2 + b^2 \sigma_Y^2 + 2 a \cdot b \cdot \sigma_{XY}\)
  * If \(X \sim N(\mu_X,\sigma_X^2)\) and \(Y \sim N(\mu_Y,\sigma_Y^2)\), then \(Z \sim N(\mu_Z,\sigma_Z^2)\)

===== 3.7 - Portfolio Example =====

  * Portfolio return: \(R_P = x_A \cdot R_A + x_B \cdot R_B\)
  * \(x_A + x_B = 1\)
  * \(Cov(R_A,R_B) = \sigma_{AB}\) and \(Cor(R_A,R_B) = \frac {\sigma_{AB}}{\sigma_A \sigma_B}\)
  * \(E[R_P] = x_A \mu_A + x_B \mu_B\)
  * \(Var(R_P) = x_A^2 \sigma_A^2 + x_B^2 \sigma_B^2 + 2 x_A x_B \sigma_{AB}\)

==== Linear combination of N rv's ====

  * Let \(Z = \sum\limits_{i=1}^N a_i X_i\)
  * Then: \(\mu_Z = \sum\limits_{i=1}^N a_i \mu_i\) and \(\sigma_Z^2 = \sum\limits_{i=1}^N \sum\limits_{j=1}^N a_i a_j \sigma_{ij}\), where \(\sigma_{ij} = Cov(X_i, X_j)\)
  * If all the \(X_i\) are normally distributed, then Z is also normally distributed.
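A quick numerical instance of the portfolio formulas in Python; the expected returns, volatilities, and correlation below are made-up inputs, and the standard matrix form \(Var(Z) = a' \Sigma a\) of the variance of a linear combination is used as a cross-check:

```python
import numpy as np

# Hypothetical inputs: expected returns, standard deviations, correlation.
mu_A, mu_B = 0.08, 0.05
sd_A, sd_B = 0.20, 0.10
rho_AB = 0.3
cov_AB = rho_AB * sd_A * sd_B    # sigma_AB = rho * sigma_A * sigma_B

x_A = 0.6
x_B = 1.0 - x_A                  # portfolio weights sum to 1

# Two-asset formulas for portfolio mean and variance.
mu_P = x_A * mu_A + x_B * mu_B
var_P = x_A**2 * sd_A**2 + x_B**2 * sd_B**2 + 2 * x_A * x_B * cov_AB
sd_P = np.sqrt(var_P)

# Same variance via the matrix form a' Sigma a (generalizes to N assets).
a = np.array([x_A, x_B])
Sigma = np.array([[sd_A**2, cov_AB], [cov_AB, sd_B**2]])
var_P_matrix = a @ Sigma @ a

print(mu_P, var_P, sd_P)   # mu_P = 0.068, var_P = 0.01888, sd_P ~ 0.1374
```

Note that sd_P (about 13.7%) is below the weighted average of the two volatilities (16%): with correlation below 1, diversification reduces portfolio risk.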