# Dragon Notes


# Machine Learning: Clustering

K-means Clustering

Separating data into groups by iteratively assigning each datapoint to its nearest of $$K$$ self-adjusting cluster centroids:

Inputs:
$$K =$$ # of clusters
$$\{x^{(1)}, x^{(2)}, ..., x^{(m)}\} =$$ training set, $$x\in \mathbb{R}^n$$
Algorithm:
- Randomly initialize $$K$$ cluster centroids $$\mu_1, \mu_2, ...,\mu_K \in \mathbb{R}^n$$
- Repeat {
  for $$i = 1\t{:}m$$
    $$c^{(i)} \t{ := }$$ index ($$1$$ to $$K$$) of cluster centroid closest to $$x^{(i)}$$
  for $$k = 1\t{:}K$$
    $$\mu_k \t{ := }$$ mean of points assigned to cluster $$k$$
  }

- Unsupervised learning algorithm; no data labels $$y$$
- Useful for discovering hidden patterns, grouping & organizing data
- Applied in market segmentation, social network & astronomical data analysis, organizing computing clusters, improving learning algorithms, etc.
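The two alternating steps above (cluster assignment, then centroid move) can be sketched in NumPy; function and variable names here are illustrative, not from the notes:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means sketch: X is an (m, n) array, K the number of clusters."""
    rng = np.random.default_rng(seed)
    # Randomly initialize the K centroids as K distinct training examples
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Cluster assignment step: c[i] = index of centroid closest to x^(i)
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, K)
        c = dists.argmin(axis=1)
        # Move-centroid step: mu_k = mean of the points assigned to cluster k
        for k in range(K):
            if np.any(c == k):  # guard against an empty cluster
                mu[k] = X[c == k].mean(axis=0)
    return c, mu
```

A fixed iteration count stands in for a convergence check; in practice one would also re-run with several random initializations and keep the lowest-cost result.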

Cost Function (K-means)

The optimization objective is to minimize the cost function over $$(\bn{c},\bn{\mu})$$:

$$\ds J(\bn{c},\bn{\mu}) = \sfrac{1}{m}\sum_{i=1}^m \abss{x^{(i)}-\mu_{c^{(i)}}}^2,$$
$$\ds \begin{align} &\bn{c} = c^{(1)},...,c^{(m)}\\ &\bn{\mu} = \mu_1,...,\mu_K \end{align}$$
- $$c^{(i)}=$$ index of cluster to which example $$x^{(i)}$$ is currently assigned
- $$\mu_k =$$ cluster centroid $$k$$ ($$\mu_k \in \mathbb{R}^n$$)
- $$\mu_{c^{(i)}} =$$ cluster centroid of the cluster to which example $$x^{(i)}$$ has been assigned
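The cost (distortion) above can be evaluated directly from the assignments and centroids; a minimal sketch, assuming NumPy arrays and an illustrative function name:

```python
import numpy as np

def kmeans_cost(X, c, mu):
    """J(c, mu) = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2 (the distortion)."""
    diffs = X - mu[c]  # mu[c] selects centroid mu_{c^(i)} for each example
    return np.mean(np.sum(diffs**2, axis=1))
```

Both K-means steps can only decrease $$J$$, so plotting this value per iteration is a standard sanity check that the implementation is converging.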

Principal Component Analysis

Reducing dimensions by mapping data onto a set of principal components:

Algorithm: reduce data from $$n$$ to $$k$$ dimensions:
- $$\ds\s{\Sigma}=\sfrac{1}{m}\sum_{i=1}^m(x^{(i)})(x^{(i)})^T$$
- $$\t{[U, S, V]} = \t{svd(Sigma)},\ \ \bn{U}=\lrbra{\bn{u}^{(1)}\ \bn{u}^{(2)}\ \cdots\ \bn{u}^{(n)}} \in \mathbb{R}^{n\times n}$$
- $$\bn{U}_{\t{reduce}} = \lrbra{\bn{u}^{(1)}\ \bn{u}^{(2)}\ \cdots\ \bn{u}^{(k)}} \in \mathbb{R}^{n\times k}$$
- $$\bn{z}^{(i)}=\bn{U}_{\t{reduce }}^T \bn{x}^{(i)},\ \ \bn{z}^{(i)}\in \mathbb{R}^k \ \ \ (=\lrbra{k\times n}*\lrbra{n\times1})$$
- $$\u{\smash{\t{Choosing }k}}\t{:}$$ choose the smallest $$k$$ such that
  $$\ds\frac{\frac{1}{m}\sum_{i=1}^m\abss{\bn{x}^{(i)}-\bn{x}^{(i)}_{\t{approx}}}^2}{\frac{1}{m}\sum_{i=1}^m\abss{\bn{x}^{(i)}}^2}\leq 0.01 \quad\Leftrightarrow \quad \frac{\sum_{i=1}^k S_{ii}}{\sum_{i=1}^n S_{ii}}\geq 0.99$$
  $$\ra$$ 99% of variance is retained (the threshold can also be 95%, 90%, etc.)

- Is a method of dimensionality reduction
- Works by collapsing correlated components onto principal (uncorrelated) components
- $$\t{svd}$$ returns the diagonal entries of $$\t{S}$$ in order of descending 'importance', such that the last (bottom) component bears the least influence
- Reconstruction from compression, $$\bn{x}^{(i)}_{\t{approx}} = \bn{U}_{\t{reduce}}\,\bn{z}^{(i)}$$, maps data from the compressed representation back to the original space
- $$\bn{U}^T=\bn{U}^{-1}$$, since $$\bn{U}$$ is orthogonal (unitary and real)
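The PCA steps above (form $$\s{\Sigma}$$, take its SVD, keep the top $$k$$ columns of $$\bn{U}$$, project) can be sketched in NumPy; this assumes the data matrix stacks the $$x^{(i)}$$ as rows and has already been mean-normalized, and the function name is illustrative:

```python
import numpy as np

def pca_reduce(X, k):
    """Project (m, n) data onto its top-k principal components via SVD."""
    m = len(X)
    Sigma = (X.T @ X) / m           # (1/m) * sum_i x^(i) x^(i)^T, assuming centered X
    U, S, _ = np.linalg.svd(Sigma)  # columns of U are the principal directions u^(i)
    U_reduce = U[:, :k]             # keep the first k columns: (n, k)
    Z = X @ U_reduce                # row i is z^(i) = U_reduce^T x^(i)
    variance_retained = S[:k].sum() / S.sum()  # sum_{i<=k} S_ii / sum_{i<=n} S_ii
    return Z, U_reduce, variance_retained
```

Reconstruction from compression is then `X_approx = Z @ U_reduce.T`, which is exact when the data lies entirely in the span of the kept components.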