
Dragon Notes


\( \newcommand{bvec}[1]{\overrightarrow{\boldsymbol{#1}}} \newcommand{bnvec}[1]{\overrightarrow{\boldsymbol{\mathrm{#1}}}} \newcommand{uvec}[1]{\widehat{\boldsymbol{#1}}} \newcommand{vec}[1]{\overrightarrow{#1}} \newcommand{\parallelsum}{\mathbin{\|}} \) \( \newcommand{s}[1]{\small{#1}} \newcommand{t}[1]{\text{#1}} \newcommand{tb}[1]{\textbf{#1}} \newcommand{ns}[1]{\normalsize{#1}} \newcommand{ss}[1]{\scriptsize{#1}} \newcommand{vpl}[]{\vphantom{\large{\int^{\int}}}} \newcommand{vplup}[]{\vphantom{A^{A^{A^A}}}} \newcommand{vplLup}[]{\vphantom{A^{A^{A^{A{^A{^A}}}}}}} \newcommand{vpLup}[]{\vphantom{A^{A^{A^{A^{A^{A^{A^A}}}}}}}} \newcommand{up}[]{\vplup} \newcommand{Up}[]{\vplLup} \newcommand{Uup}[]{\vpLup} \newcommand{vpL}[]{\vphantom{\Large{\int^{\int}}}} \newcommand{lrg}[1]{\class{lrg}{#1}} \newcommand{sml}[1]{\class{sml}{#1}} \newcommand{qq}[2]{{#1}_{\t{#2}}} \newcommand{ts}[2]{\t{#1}_{\t{#2}}} \) \( \newcommand{ds}[]{\displaystyle} \newcommand{dsup}[]{\displaystyle\vplup} \newcommand{u}[1]{\underline{#1}} \newcommand{tu}[1]{\underline{\text{#1}}} \newcommand{tbu}[1]{\underline{\bf{\text{#1}}}} \newcommand{bxred}[1]{\class{bxred}{#1}} \newcommand{Bxred}[1]{\class{bxred2}{#1}} \newcommand{lrpar}[1]{\left({#1}\right)} \newcommand{lrbra}[1]{\left[{#1}\right]} \newcommand{lrabs}[1]{\left|{#1}\right|} \newcommand{bnlr}[2]{\bn{#1}\left(\bn{#2}\right)} \newcommand{nblr}[2]{\bn{#1}(\bn{#2})} \newcommand{real}[1]{\Ree\{{#1}\}} \newcommand{Real}[1]{\Ree\left\{{#1}\right\}} \newcommand{abss}[1]{\|{#1}\|} \newcommand{umin}[1]{\underset{{#1}}{\t{min}}} \newcommand{umax}[1]{\underset{{#1}}{\t{max}}} \newcommand{und}[2]{\underset{{#1}}{{#2}}} \) \( \newcommand{bn}[1]{\boldsymbol{\mathrm{#1}}} \newcommand{bns}[2]{\bn{#1}_{\t{#2}}} \newcommand{b}[1]{\boldsymbol{#1}} \newcommand{bb}[1]{[\bn{#1}]} \) \( \newcommand{abs}[1]{\left|{#1}\right|} \newcommand{ra}[]{\rightarrow} \newcommand{Ra}[]{\Rightarrow} \newcommand{Lra}[]{\Leftrightarrow} \newcommand{rai}[]{\rightarrow\infty} \newcommand{ub}[2]{\underbrace{{#1}}_{#2}} \newcommand{ob}[2]{\overbrace{{#1}}^{#2}} \newcommand{lfrac}[2]{\large{\frac{#1}{#2}}\normalsize{}} \newcommand{sfrac}[2]{\small{\frac{#1}{#2}}\normalsize{}} \newcommand{Cos}[1]{\cos{\left({#1}\right)}} \newcommand{Sin}[1]{\sin{\left({#1}\right)}} \newcommand{Frac}[2]{\left({\frac{#1}{#2}}\right)} \newcommand{LFrac}[2]{\large{{\left({\frac{#1}{#2}}\right)}}\normalsize{}} \newcommand{Sinf}[2]{\sin{\left(\frac{#1}{#2}\right)}} \newcommand{Cosf}[2]{\cos{\left(\frac{#1}{#2}\right)}} \newcommand{atan}[1]{\tan^{-1}({#1})} \newcommand{Atan}[1]{\tan^{-1}\left({#1}\right)} \newcommand{intlim}[2]{\int\limits_{#1}^{#2}} \newcommand{lmt}[2]{\lim_{{#1}\rightarrow{#2}}} \newcommand{ilim}[1]{\lim_{{#1}\rightarrow\infty}} \newcommand{zlim}[1]{\lim_{{#1}\rightarrow 0}} \newcommand{Pr}[]{\t{Pr}} \newcommand{prop}[]{\propto} \newcommand{ln}[1]{\t{ln}({#1})} \newcommand{Ln}[1]{\t{ln}\left({#1}\right)} \newcommand{min}[2]{\t{min}({#1},{#2})} \newcommand{Min}[2]{\t{min}\left({#1},{#2}\right)} \newcommand{max}[2]{\t{max}({#1},{#2})} \newcommand{Max}[2]{\t{max}\left({#1},{#2}\right)} \newcommand{pfrac}[2]{\frac{\partial{#1}}{\partial{#2}}} \newcommand{pd}[]{\partial} \newcommand{zisum}[1]{\sum_{{#1}=0}^{\infty}} \newcommand{iisum}[1]{\sum_{{#1}=-\infty}^{\infty}} \newcommand{var}[1]{\t{var}({#1})} \newcommand{exp}[1]{\t{exp}\left({#1}\right)} \newcommand{mtx}[2]{\left[\begin{matrix}{#1}\\{#2}\end{matrix}\right]} \newcommand{nmtx}[2]{\begin{matrix}{#1}\\{#2}\end{matrix}} 
\newcommand{nmttx}[3]{\begin{matrix}\begin{align} {#1}& \\ {#2}& \\ {#3}& \\ \end{align}\end{matrix}} \newcommand{amttx}[3]{\begin{matrix} {#1} \\ {#2} \\ {#3} \\ \end{matrix}} \newcommand{nmtttx}[4]{\begin{matrix}{#1}\\{#2}\\{#3}\\{#4}\end{matrix}} \newcommand{mtxx}[4]{\left[\begin{matrix}\begin{align}&{#1}&\hspace{-20px}{#2}\\&{#3}&\hspace{-20px}{#4}\end{align}\end{matrix}\right]} \newcommand{mtxxx}[9]{\begin{matrix}\begin{align} &{#1}&\hspace{-20px}{#2}&&\hspace{-20px}{#3}\\ &{#4}&\hspace{-20px}{#5}&&\hspace{-20px}{#6}\\ &{#7}&\hspace{-20px}{#8}&&\hspace{-20px}{#9} \end{align}\end{matrix}} \newcommand{amtxxx}[9]{ \amttx{#1}{#4}{#7}\hspace{10px} \amttx{#2}{#5}{#8}\hspace{10px} \amttx{#3}{#6}{#9}} \) \( \newcommand{ph}[1]{\phantom{#1}} \newcommand{vph}[1]{\vphantom{#1}} \newcommand{mtxxxx}[8]{\begin{matrix}\begin{align} & {#1}&\hspace{-17px}{#2} &&\hspace{-20px}{#3} &&\hspace{-20px}{#4} \\ & {#5}&\hspace{-17px}{#6} &&\hspace{-20px}{#7} &&\hspace{-20px}{#8} \\ \mtxxxxCont} \newcommand{\mtxxxxCont}[8]{ & {#1}&\hspace{-17px}{#2} &&\hspace{-20px}{#3} &&\hspace{-20px}{#4}\\ & {#5}&\hspace{-17px}{#6} &&\hspace{-20px}{#7} &&\hspace{-20px}{#8} \end{align}\end{matrix}} \newcommand{mtXxxx}[4]{\begin{matrix}{#1}\\{#2}\\{#3}\\{#4}\end{matrix}} \newcommand{cov}[1]{\t{cov}({#1})} \newcommand{Cov}[1]{\t{cov}\left({#1}\right)} \newcommand{var}[1]{\t{var}({#1})} \newcommand{Var}[1]{\t{var}\left({#1}\right)} \newcommand{pnint}[]{\int_{-\infty}^{\infty}} \newcommand{floor}[1]{\left\lfloor {#1} \right\rfloor} \) \( \newcommand{adeg}[1]{\angle{({#1}^{\t{o}})}} \newcommand{Ree}[]{\mathcal{Re}} \newcommand{Im}[]{\mathcal{Im}} \newcommand{deg}[1]{{#1}^{\t{o}}} \newcommand{adegg}[1]{\angle{{#1}^{\t{o}}}} \newcommand{ang}[1]{\angle{\left({#1}\right)}} \newcommand{bkt}[1]{\langle{#1}\rangle} \) \( \newcommand{\hs}[1]{\hspace{#1}} \)

  UNDER CONSTRUCTION

Machine Learning:
Anomaly Detection



Anomaly Detection

Identifying datapoints that differ significantly from the majority and are indicative of unusual observations

    Algorithm:\(\up\)
  • [1] Choose features \(x_i\) that may indicate anomalous examples
  • [2] Fit parameters \((\bn{\mu},\bn{\sigma}^2)=([\mu_1,...,\mu_n],[\sigma_1^2,...,\sigma_n^2])\):
    \(\ds \mu_j = \frac{1}{m}\sum_{i=1}^m x_j^{(i)},\ \ \sigma_j^2 = \frac{1}{m}\sum_{i=1}^m (x_j^{(i)}-\mu_j)^2\)
  • [3] Given a new example \(x\), compute \(p(x)\):
    \(\ds p(x) = \prod_{j=1}^n p(x_j;\mu_j,\sigma_j^2) = \prod_{j=1}^n \frac{1}{\sqrt{2\pi}\sigma_j}\exp{-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}}\)

  • Semi-supervised learning algorithm; trained on unlabeled data, tested on labeled data
  • Useful in discovering hidden patterns, extracting features, detecting outliers
  • Applied in: detecting fraud, device defects; signal noise filtering; medical diagnosis; ...
[Figure: anomaly detection illustration]
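A minimal Octave sketch of the algorithm above; the names X, Xnew, and epsilon are illustrative: X is an \((m\times n)\) matrix of training examples, rows of Xnew are new examples to score, and epsilon is a pre-chosen threshold:

    mu     = mean(X)                       % [1 x n] per-feature means
    sigma2 = mean((X - mu).^2)             % [1 x n] per-feature variances (MLE)
    p = prod(exp(-(Xnew - mu).^2 ./ (2*sigma2)) ./ sqrt(2*pi*sigma2), 2)
    anomalies = (p < epsilon)              % flag y = 1 where p(x) < epsilon

The row-wise prod implements \(\prod_j p(x_j;\mu_j,\sigma_j^2)\); in practice, summing log-densities instead avoids numerical underflow for large \(n\).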

Evaluating Detection Systems

Anomaly detection systems trained per semi-supervised methods learn from unlabeled data (assumed 'normal') and are evaluated on labeled data:

  • \(\sim 60\%\) 'normal': Training set (unlabeled)
  • \(\sim 20\%\) 'normal' \(+\ 50\%\) anomalous: Cross-validation set (labeled)
  • \(\sim 20\%\) 'normal' \(+\) other \(50\%\) anomalous: Testing set (labeled)
    Assessing accuracy (on CV/test sets):
  • \(\up p(x) < \epsilon \ra y = 1\) (anomaly)
  • \(p(x) \geq \epsilon \ra y = 0\) (normal)
  • Measures: Precision/Recall; \(\Up\t{F1}\)-score
  • (use CV set to choose the best \(\epsilon\))
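A sketch of choosing the best \(\epsilon\) by F1-score on the CV set (Octave); pval holding \(p(x)\) for the CV examples and yval holding their labels (1 = anomaly) are assumed names:

    bestF1 = 0; bestEps = 0;
    for epsilon = linspace(min(pval), max(pval), 1000)
      pred = (pval < epsilon);                    % predict anomaly below threshold
      tp = sum((pred == 1) & (yval == 1));        % true positives
      fp = sum((pred == 1) & (yval == 0));        % false positives
      fn = sum((pred == 0) & (yval == 1));        % false negatives
      prec = tp/(tp + fp);  rec = tp/(tp + fn);
      F1 = 2*prec*rec/(prec + rec);               % harmonic mean of P and R
      if F1 > bestF1, bestF1 = F1; bestEps = epsilon; end
    end

F1 is preferred over raw accuracy here because anomalies are rare, so a classifier predicting 'normal' everywhere would already score high accuracy.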

Choosing Features

Features used in anomaly detection determine the types of unusual behavior/properties the final algorithm can identify

  • (1) Handpicking: relating features in a meaningful manner
  • (2) Transforming: normalizing features for workability with Gaussian
  • (3) MVG: automatically extracting features via Multivariate Gaussian
    Examples
  • (1) Let \(X_1 = \) network traffic, \(X_2 = \) CPU load. On computers dedicated to server hosting, higher network traffic \(\ra\) higher CPU load; having one much greater than the other would thus be anomalous, which can be captured with a feature \(X_2/X_1\)
  • (2)
    \(\begin{align} & \t{log}(X) && \sqrt{X} \\ & \t{log}(X+c) && X^{1/3} \end{align}\)

    Using the Gaussian PDF to detect anomalies works best with (approximately) Gaussian features; if a feature isn't normally distributed, it can often be made so via a transformation
  • (3) MVG captures correlations between features via the covariance matrix - automatically detecting linear relationships
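A short Octave sketch of examples (1) and (2) above; net_traffic, cpu_load, and x (assumed positive and right-skewed) are illustrative example vectors:

    x_ratio = cpu_load ./ net_traffic    % (1) relational feature X2/X1
    x_log   = log(x + 1)                 % (2) log(X + c) with c = 1
    x_cbrt  = x.^(1/3)                   % (2) cube root; compare hist(x_log), hist(x_cbrt)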

Collaborative Filtering

A recommender-system algorithm that uses existing user preferences to predict unknown ones; the algorithm cost function is

\(\ds J(\bn{x},\bn{\theta}) = \frac{1}{2}\sum_{(i,j):r(i,j)=1}[(\bn{\theta}^{(j)})^T \bn{x}^{(i)} - y^{(i,j)}]^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n (x_k^{(i)})^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}(\theta_k^{(j)})^2\)
    Algorithm:\(\up\)
  • [1] Initialize \((\bn{x},\bn{\theta})=([x^{(1)},...,x^{(n_m)}],[\theta^{(1)},...,\theta^{(n_u)}])\) to small random values (symmetry-breaking)
  • [2] Minimize \(J(\bn{x},\bn{\theta})\) over \((\bn{x},\bn{\theta})\):
    (note: no pre-set intercept term \(\theta_0,x_0\))

    \(\ds x_k^{(i)} \t{:= } x^{(i)}_k - \alpha\lrpar{\sum_{j:r(i,j)=1}[(\bn{\theta}^{(j)})^T\bn{x}^{(i)} - y^{(i,j)}]\theta_k^{(j)} + \lambda x_k^{(i)}}\)
    \(\ds \theta_k^{(j)} \t{:= } \theta^{(j)}_k - \alpha\lrpar{\sum_{i:r(i,j)=1}[(\bn{\theta}^{(j)})^T\bn{x}^{(i)} - y^{(i,j)}]x_k^{(i)} + \lambda\theta_k^{(j)}}\)
  • [3] For a user w/ parameters \(\bn{\theta}\) and a movie w/ (learned) features \(\bn{x}\), predict a preference of \(\bn{\theta}^T\bn{x}\)

- \(n_u =\) # of users
- \(n_m =\) # of movies
- \(r(i,j) = 1\) if user \(j\) has rated movie \(i\)
- \(y^{(i,j)} = \) rating given by user \(j\) to movie \(i\)
Mean Normalization:
\(\ds \bn{y} = \begin{bmatrix} 5 & 4 & 0 & ? \\ 4 & ? & 1 & ? \\ ? & 4 & 0 & ? \\ 0 & 0 & 3 & ? \end{bmatrix}\hspace{-7px},\ \bn{\mu} = \begin{bmatrix} 3 \\ 2.5 \\ 2 \\ 1 \end{bmatrix}\)
\(\ds \Ra \bn{y} - \bn{\mu} = \begin{bmatrix} 2 & 1 & -3 & ? \\ 1.5 & ? & -1.5 & ? \\ ? & 2 & -2 & ? \\ -1 & -1 & 2 & ? \end{bmatrix}\)

\(\ra\) learn \(\bn{\theta}^{(j)},\bn{x}^{(i)}\) \(\ra\)
\(\require{cancel}\ds\bn{\theta}^{(4)} = \bn{0},\ y^{(i,4)} = \cancelto{0}{(\bn{\theta}^{(4)})^T(\bn{x}^{(i)})}+\mu_i\)
\(\ds \Rightarrow \bn{y} = \begin{bmatrix} 2 & 1 & -3 & 3 \\ 1.5 & (\bn{\theta}^{(2)})^T\bn{x}^{(2)} & -1.5 & 2.5 \\ (\bn{\theta}^{(1)})^T\bn{x}^{(3)} & 2 & -2 & 2 \\ -1 & -1 & 2 & 1 \end{bmatrix} \)
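A sketch of mean normalization in Octave, using the Y and R names from the vectorized code below; means are taken over rated entries only:

    mu    = sum(Y.*R, 2) ./ sum(R, 2)    % [n_m x 1] per-movie mean over rated entries
    Ynorm = (Y - mu).*R                  % subtract means; unrated entries stay 0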
Vectorization:

         J = sum(sum(((X*Theta' - Y).*R).^2))/2 + lambda/2*(sum(sum(X.^2)) + sum(sum(Theta.^2)))
    X_grad = ((X*Theta' - Y).*R)*Theta + lambda*X
Theta_grad = ((X*Theta' - Y).*R)'*X + lambda*Theta
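A minimal batch gradient-descent loop built on these gradients (alpha, lambda, and n_iter are assumed hyperparameters):

    for it = 1:n_iter
      E          = (X*Theta' - Y).*R          % errors on rated entries only
      X_grad     = E*Theta + lambda*X
      Theta_grad = E'*X + lambda*Theta
      X     = X - alpha*X_grad;               % simultaneous update of both
      Theta = Theta - alpha*Theta_grad;
    end

In practice, an off-the-shelf minimizer (e.g. Octave's fminunc) applied to \(J\) with these gradients typically converges faster than a fixed-\(\alpha\) loop.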

Multivariate Gaussian

A Gaussian random vector, \(\bn{X}\sim\mathcal{N}(\bn{\mu},\bn{\Sigma})\), is an \((N\times 1)\) random vector \([X_1\ X_2\ ...\ X_N]^T\) with a joint PDF given by the multivariate Gaussian PDF
\(p_{\bn{X}}(\bn{x})=\lfrac{1}{(2\pi)^{N/2}\t{det}^{1/2}(\bn{\Sigma})}\t{exp}\lrbra{-\frac{1}{2}(\bn{x}-\bn{\mu})^T\bn{\Sigma}^{-1}(\bn{x}-\bn{\mu})},\)

\(\ds\bn{\Sigma}=\left[\mtxxxx{\var{X_1}}{\cov{X_1,X_2}}{...}{\cov{X_1,X_N}}{\cov{X_2,X_1}}{\var{X_2}}{...}{\cov{X_2,X_N}}{\vdots}{\vdots} {\ddots}{\vdots}{\cov{X_N,X_1}}{\cov{X_N,X_2}}{...}{\cov{X_N,X_N}}\right]\hspace{-7px},\) \(\ds\ \bn{\mu}=\lrbra{\nmtttx{\mu_1}{\mu_2}{\vdots}{\mu_N}}=\lrbra{\nmtttx{E_{X_1}[X_1]}{E_{X_2}[X_2]}{\vdots}{E_{X_N}[X_N]}}\)


[1]: Only the first two moments, \(\bn{\mu}\) and \(\bn{\Sigma}\), are required to specify the entire PDF
[2]: Uncorrelated \(\ra\) independent (holds for jointly Gaussian random variables; not for random variables in general)
[3]: A linear transformation of \(\bn{X}\) produces another Gaussian vector.
If \(\bn{Y}=\bn{GX}\), then \(\bn{Y}\sim\mathcal{N}(\bn{G\mu}, \bn{G\Sigma G}^T)\) -- (\(\bn{G} = M\times N\) matrix, \(M\leq N\))
[Figure: multivariate Gaussian PDF]
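A sketch of fitting and evaluating the multivariate Gaussian in Octave (X = training data; rows of Xnew are scored; all names illustrative):

    [m, n] = size(X);
    mu    = mean(X)                          % [1 x n] sample mean
    Sigma = (X - mu)'*(X - mu)/m             % [n x n] covariance (MLE)
    D = Xnew - mu;                           % D/Sigma below right-divides, i.e. D*inv(Sigma)
    p = exp(-0.5*sum((D/Sigma).*D, 2)) / ((2*pi)^(n/2)*sqrt(det(Sigma)))

Unlike the per-feature product model above, the off-diagonal terms of \(\bn{\Sigma}\) let this flag combinations of feature values that are individually unremarkable.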





By OverLordGoldDragon