
XV - Anomaly Detection

  • Given a dataset of non-anomalous samples, we want to decide whether a new example is an anomaly or not.
  • To achieve this, we build a model p(x). Then, if \(p(x_{test}) \le \epsilon\), we flag the example as an anomaly; otherwise the example is considered OK.
  • Anomaly detection is often used for fraud detection.
  • If X is a Gaussian-distributed random variable with mean \(\mu\) and variance \(\sigma^2\), then: \(X \sim N(\mu,\sigma^2) \)
  • \(\sigma\) is also called the standard deviation.
  • The Gaussian density is \(p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(- \frac{(x-\mu)^2}{2\sigma^2}\right) \).
  • How do we estimate the parameters \(\mu\) and \(\sigma\) given a dataset that we suspect follows a Gaussian distribution?
  • \(\mu = \frac{1}{m} \sum\limits_{i=1}^m x^{(i)} \)
  • Then \(\sigma^2 = \frac{1}{m} \sum\limits_{i=1}^m (x^{(i)} - \mu)^2 \)
  • The values computed above are the maximum likelihood estimates of the parameters.
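  • As a sketch in Python (the course itself uses Octave), the two estimates above and the Gaussian density can be computed as follows; the function names are illustrative, not from the course:

```python
import numpy as np

def estimate_gaussian(x):
    """Maximum likelihood estimates of mean and variance (1/m, not 1/(m-1))."""
    mu = x.mean()
    sigma2 = ((x - mu) ** 2).mean()
    return mu, sigma2

def gaussian_pdf(x, mu, sigma2):
    """p(x; mu, sigma^2) for the univariate Gaussian."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mu, sigma2 = estimate_gaussian(x)
# mu = 5.0, sigma2 = 4.0 (so sigma = 2.0)
```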
  • We start with a training set \(\{x^{(1)},\dots,x^{(m)}\}\)
  • For each example \(x \in \mathbb{R}^n\), we then compute the probability: \(p(x)=p(x_1;\mu_1, \sigma_1^2) \cdot p(x_2;\mu_2, \sigma_2^2) \cdots p(x_n;\mu_n, \sigma_n^2) \).
  • We assume here that each feature follows a Gaussian distribution with its own mean and variance.
  • We can also write: \( p(x) = \prod\limits_{j=1}^n p(x_j;\mu_j, \sigma_j^2) \)
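  • The per-feature estimation and the product over features can be sketched in Python (the course uses Octave; the function names here are illustrative):

```python
import numpy as np

def estimate_params(X):
    """Per-feature ML estimates for an m x n matrix of examples."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # np.var defaults to 1/m, matching the ML estimate
    return mu, sigma2

def p(X, mu, sigma2):
    """Product over features of the univariate Gaussian densities."""
    dens = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return dens.prod(axis=1)
```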
  • We now assume we have some labeled data (y=0 if normal, y=1 if anomalous).
  • Typically we will have between 2 and 50 anomalous examples.
  • From the training set we compute \(\mu_1,\sigma_1,\mu_2,\sigma_2,\dots\mu_n,\sigma_n\)
  • We fit p(x) on the training set.
  • On the cross-validation set we predict y given x (y=1 if p(x) < \(\epsilon\), y=0 otherwise).
  • Possible evaluation metrics:
    • True positives, false positives, false negatives, true negatives
    • Precision/Recall
    • F1-score
  • We can use the cross validation set to choose parameter \(\epsilon\).
  • Use Anomaly detection if:
    • Very small number of positive examples (y=1), for instance 0-20, together with a very large number of negative examples (y=0).
    • Many different “types” of anomalies (future anomalies may look completely different)
  • Use Supervised learning if:
    • Large number of both positive and negative examples.
    • Future positive examples are likely to look like the current positives.
  • If a feature is not Gaussian-distributed, we can try to transform it to make it more Gaussian (for instance by replacing x with log(x), log(x+c), or x^c).
  • To test this on the fly in Octave, we can plot histograms of the transformed feature:
    hist(x.^0.1, 50)
    hist(log(x),50)
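  • As a numeric stand-in for eyeballing the histogram, here is a Python sketch checking that a log transform makes a right-skewed feature closer to Gaussian (the lognormal sample and the `skewness` helper are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # heavily right-skewed

def skewness(v):
    """Sample skewness: zero for a symmetric (e.g. Gaussian) distribution."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

# log(x) is exactly normal here, so its skewness is much smaller
assert abs(skewness(np.log(x))) < abs(skewness(x))
```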
  • The multivariate Gaussian (also called the Multivariate Normal Distribution) models all the features jointly.
  • The parameters are: \(\mu \in \mathbb{R}^n, \Sigma \in \mathbb{R}^{n\times n}\)
  • The probability formula is then: \(p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac 12 (x-\mu)^T \Sigma^{-1} (x-\mu)\right)\)
  • \(|\Sigma|\) is the determinant of the matrix. It can be computed in Octave with:
    det(Sigma)
  • We can compute the mean and covariance as: \(\mu = \frac 1m \sum\limits_{i=1}^m x^{(i)} \in \mathbb{R}^n \) and \(\Sigma = \frac 1m \sum\limits_{i=1}^m (x^{(i)} - \mu) \cdot (x^{(i)} - \mu)^T \in \mathbb{R}^{n \times n}\)
  • Note that the original model is a special case of the multivariate case with \(\Sigma\) being a diagonal matrix with the diagonal containing the values \(\sigma_1^2,\sigma_2^2,\dots,\sigma_n^2\).
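  • A Python sketch of the multivariate estimates and density (the function names are illustrative); with a diagonal \(\Sigma\) it reproduces the product of univariate Gaussians, as the note above states:

```python
import numpy as np

def estimate_multivariate(X):
    """ML estimates of mu and Sigma for an m x n matrix of examples."""
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]
    return mu, Sigma

def multivariate_gaussian(X, mu, Sigma):
    """p(x; mu, Sigma) evaluated row-wise for X of shape (m, n)."""
    n = mu.shape[0]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    expo = -0.5 * np.sum(diff @ inv * diff, axis=1)
    return np.exp(expo) / norm

# Diagonal Sigma: equals N(1; 0, 1) * N(2; 0, 4) = exp(-1) / (4*pi)
mu = np.array([0.0, 0.0])
Sigma = np.diag([1.0, 4.0])
X = np.array([[1.0, 2.0]])
```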
  • The original model is used when we manually create extra features to capture anomalies due to unusual combinations of existing features. It's cheaper and scales better (e.g. n = 10,000 or n = 100,000).
  • The multivariate Gaussian automatically captures correlations between features, but it's more computationally expensive, and we must have m > n or \(\Sigma\) is non-invertible. A rule of thumb is m > 10n. (\(\Sigma\) can also be non-invertible if we have redundant features.)
Last modified: 2020/07/10 12:11