# Dragon Notes


# Deep Learning: Classification

Softmax Regression

For $$K$$ classes $$y=\{1,2,...,K\}$$, estimating the probability of every class at once, with the probabilities normalized to sum to $$1$$:

\ds \begin{align} h_{\bn{\theta}}(\bn{x}) = \begin{bmatrix} P(y = 1 | \bn{x}; \bn{\theta}) \\ P(y = 2 | \bn{x}; \bn{\theta}) \\ \vdots \\ P(y = K | \bn{x}; \bn{\theta}) \end{bmatrix} = \frac{1}{ \sum_{j=1}^{K}{\t{exp}(\theta^{(j)T} \bn{x}) }} \begin{bmatrix} \t{exp}(\theta^{(1)T} \bn{x}) \\ \t{exp}(\theta^{(2)T} \bn{x}) \\ \vdots \\ \t{exp}(\theta^{(K)T} \bn{x}) \end{bmatrix} \end{align}
\begin{align} J(\bn{\theta}) = -\sum_{i=1}^{m} \sum_{k=1}^{K} 1\{y^{(i)} = k\}\ \t{log}\hspace{-4px}\lrpar{\frac{\t{exp}(\theta^{(k)T} x^{(i)})}{\sum_{j=1}^K \t{exp}(\theta^{(j)T} x^{(i)})}} \end{align}
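
The hypothesis and cost above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `softmax_probs` and `softmax_cost` are made up for this example, labels are assumed to run from 1 to K as in the formula, and the max-score shift is a standard numerical-stability trick that does not appear in the notes.

```python
import math

def softmax_probs(thetas, x):
    """Return [P(y=1|x), ..., P(y=K|x)] given one parameter vector per class."""
    # Score for class j is the dot product theta^(j) . x
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    # Subtracting the max score leaves the ratios unchanged but avoids overflow
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cost(thetas, xs, ys):
    """J(theta): summed negative log-probability of each example's true class."""
    cost = 0.0
    for x, y in zip(xs, ys):
        probs = softmax_probs(thetas, x)
        # The indicator 1{y == k} keeps only the true-class term (labels are 1..K)
        cost -= math.log(probs[y - 1])
    return cost
```

With all parameters at zero, every class gets probability 1/K, so a single example contributes log K to the cost.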

One-vs-All

For $$(n+1)$$ classes $$y=\{0,1,...,n\}$$, dividing the problem into $$(n+1)$$ binary classification problems:

\ds \begin{align} & h_{\theta}^{(0)}(x) = P(y=0 | x; \theta) \\ & h_{\theta}^{(1)}(x) = P(y=1 | x;\theta) \\ & \vdots \\ & h_{\theta}^{(n)}(x)=P(y=n | x;\theta) \end{align} $$\ds \quad \t{prediction} = \underset{i}{\t{arg max}}(h_{\theta}^{(i)}(x))$$

[1] Choosing one class and lumping all the others into a single second class
[2] In each classification problem, predicting the probability that $$y$$ is a member of the chosen class
[3] Given an input $$x$$, outputting the class $$i$$ with the greatest $$h_{\theta}^{(i)}(x)$$
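
Steps [1] to [3] can be sketched as follows, assuming each binary classifier is a logistic-regression model that has already been trained. The name `one_vs_all_predict` and the layout of `thetas` (one parameter vector per class, index = class label) are assumptions made for this illustration.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def one_vs_all_predict(thetas, x):
    """Return the class index i whose classifier h^(i)(x) is largest.

    thetas[i] holds the parameters of the 'class i vs. the rest' classifier.
    """
    probs = [sigmoid(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas]
    # arg max over classes: the index of the largest probability, not its value
    return max(range(len(probs)), key=lambda i: probs[i])
```

Note that the per-class probabilities come from separate classifiers, so they need not sum to 1; only their ranking matters for the prediction.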

Hypothesis Function (Logistic Regression)

The hypothesis function for logistic regression is

$$\ds h_{\theta}(x)=g(\bn{\theta}^T\bn{x}),$$
$$\ds \t{where } g(\bn{z})= \frac{1}{1+e^{-\bn{z}}}$$

- Used in classification problems
- Output is bounded between $$0$$ and $$1$$ (contrasting w/ lin-reg's $$\pm\infty$$)
- $$h_{\theta}$$ gives the probability that the output is positive (=1).
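
A minimal sketch of the hypothesis function, using only the standard library. The name `hypothesis` and the convention that `x` already includes the bias feature are assumptions for this example; the two-branch sigmoid is an implementation detail to avoid overflow and is mathematically the same $$g(z)$$.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): estimated probability that y = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)
```

When $$\bn{\theta}^T\bn{x}=0$$ the output is exactly $$0.5$$, and it moves toward $$1$$ or $$0$$ as $$\bn{\theta}^T\bn{x}$$ grows positive or negative.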

Decision Boundary

A line (or, more generally, a bounding manifold) separating regions of positive and negative classification ($$y=1$$ and $$y=0$$), defined as the set of all points for which $$\underline{\bn{\theta}^T\bn{x}=0}$$.
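
As an illustration, assuming the usual $$0.5$$ threshold on $$h_{\theta}(x)$$, classifying a point only requires the sign of $$\bn{\theta}^T\bn{x}$$, since $$g(z)\geq 0.5$$ exactly when $$z\geq 0$$. The parameter values below are hypothetical: with $$\bn{\theta}=(-3,1,1)$$ and $$\bn{x}=(1,x_1,x_2)$$, the boundary is the line $$x_1+x_2=3$$.

```python
def predict(theta, x):
    """Return 1 on the positive side of the boundary theta^T x = 0, else 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

# Hypothetical parameters: boundary is the line x1 + x2 = 3
theta = [-3.0, 1.0, 1.0]
above = predict(theta, [1.0, 2.0, 2.0])   # x1 + x2 = 4, positive side
below = predict(theta, [1.0, 1.0, 1.0])   # x1 + x2 = 2, negative side
```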

Cost Function (Logistic Regression)

The cost function for logistic regression is

\ds \begin{align} J(\theta) &= \sfrac{1}{m}\sum_{i=1}^{m}\t{Cost}(h_{\theta}(x^{(i)}),y^{(i)}) \\ &= -\sfrac{1}{m}\sum_{i=1}^{m}[y^{(i)}\t{log}(h_{\theta}(x^{(i)}))+(1-y^{(i)})\t{log}(1-h_{\theta}(x^{(i)}))] \end{align}
\ds \t{Cost}(h_{\theta}(x),y)=\left\{ \begin{align} & -\t{log}(h_{\theta}(x)) && \t{if } y = 1 \\ & -\t{log}(1-h_{\theta}(x)) && \t{if } y = 0 \end{align} \right.
$$\dsup \hspace{160px}= -y\ \t{log}(h_{\theta}(x))-(1-y)\t{log}(1-h_{\theta}(x))$$

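- Makes the cost function convex for the logistic regression hypothesis function (squared error with a sigmoid hypothesis would be non-convex)
- Imposes a steep penalty on confident false positives and false negatives: the cost grows without bound as $$h_{\theta}(x)$$ approaches the wrong label
- Is convex $$\ra$$ any local minimum is a global minimum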
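
The cost function can be sketched directly from the formula. The name `logistic_cost` and the `eps` clamp are assumptions added for this example; the clamp only guards against `log(0)` and is not part of the definition.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_cost(theta, xs, ys, eps=1e-12):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ] over m examples."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        # Clamp h away from 0 and 1 so the logarithms stay finite
        h = min(max(h, eps), 1.0 - eps)
        total += -y * math.log(h) - (1 - y) * math.log(1.0 - h)
    return total / m
```

With $$\bn{\theta}=\bn{0}$$ every prediction is $$0.5$$, so the cost is $$\t{log}(2)$$ regardless of the labels; confident wrong predictions push it far higher.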

Logistic Regression GD

Same update rule as linear regression's, but with a different hypothesis and cost function:

$$\ds h_{\theta}(x)=g(\bn{\theta}^T\bn{x}),\ \ g(\bn{z})= \frac{1}{1+e^{-\bn{z}}}$$

\ds \begin{align} J(\bn{\theta})= \sfrac{1}{m}[&-\bn{y}^T\t{log}(\bn{h})\\&-(1-\bn{y})^T\t{log}(1-\bn{h})] \end{align}

$$\dsup \bn{\theta} := \bn{\theta}-\sfrac{\alpha}{m}\bn{x}^T(g(\bn{x}\bn{\theta})-\bn{y})$$
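
The update rule can be sketched as a batch gradient-descent loop. The function name `gradient_descent`, the zero initialization, and the default `alpha` and `iterations` values are assumptions for this illustration; each row of `xs` is assumed to start with a bias feature of 1.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n = len(xs), len(xs[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # Residuals g(X theta) - y, one per training example
        errors = [sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
                  for x, y in zip(xs, ys)]
        # Component j of X^T (g(X theta) - y), scaled by 1/m
        grad = [sum(e * x[j] for e, x in zip(errors, xs)) / m for j in range(n)]
        # Simultaneous update of every parameter
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy data (made up for this example): label is 1 when the feature is positive
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
ys = [0, 0, 1, 1]
theta = gradient_descent(xs, ys)
```

On separable toy data like this the parameters keep growing with more iterations, so the loop stops after a fixed count rather than at convergence.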