# Dragon Notes


# Deep Learning: Recurrent Neural Networks

RNN Base Model (I)

An ANN architecture that uses temporal memory to process sequential data:

- Excels at sequential and time-series data
- Applied in natural language processing, signal anomaly detection, speech synthesis, music composition, ...

RNN Base Model (II)

The RNN model is represented via NN time-steps $$< t >$$, feeding the current input $$\bn{x}^{ < t > }$$ and the past activation $$\bn{a}^{ < t - 1 > }$$ into each step to predict:

\ds \begin{align} \bn{a}^{ < t > } & = g (\bn{W}_{aa}\bn{a}^{ < t-1 > } + \bn{W}_{ax}\bn{x}^{ < t > } + \bn{b}_a) \\ \widehat{\bn{y}}^{ < t > } & = g (\bn{W}_{ya}\bn{a}^{ < t > } + \bn{b}_y) \end{align}
$$\Leftrightarrow$$

\ds \begin{align} \bn{a}^{ < t > } & = g (\bn{W}_a [\bn{a}^{< t - 1 >}, \bn{x}^{ < t > }] + \bn{b}_a) \\ \widehat{\bn{y}}^{ < t > } & = g (\bn{W}_y \bn{a}^{ < t > } + \bn{b}_y) \end{align}
$$\bn{a}^{[l](i)< j >} =$$ activation at timestep $$j$$ of training example $$i$$ in layer $$l$$
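A minimal NumPy sketch of one forward time-step under the compact formulation above. The toy dimensions, random weights, and the choices of $$g$$ (tanh for the activation, softmax for the output) are illustrative assumptions, not part of the notes:

```python
import numpy as np

def rnn_step(a_prev, x_t, Wa, ba, Wy, by):
    """One RNN time-step:
    a<t>     = tanh(Wa @ [a<t-1>; x<t>] + ba)
    y_hat<t> = softmax(Wy @ a<t> + by)"""
    concat = np.concatenate([a_prev, x_t])   # stacked [a<t-1>, x<t>]
    a_t = np.tanh(Wa @ concat + ba)          # new hidden activation
    z = Wy @ a_t + by
    y_hat = np.exp(z - z.max()) / np.exp(z - z.max()).sum()  # stable softmax
    return a_t, y_hat

# toy sizes: n_a hidden units, n_x input features, n_y output classes
rng = np.random.default_rng(0)
n_a, n_x, n_y = 4, 3, 2
Wa = rng.standard_normal((n_a, n_a + n_x)); ba = np.zeros(n_a)
Wy = rng.standard_normal((n_y, n_a));       by = np.zeros(n_y)

a = np.zeros(n_a)                            # a<0>
x = rng.standard_normal(n_x)                 # x<1>
a, y_hat = rnn_step(a, x, Wa, ba, Wy, by)
```

Note that $$\bn{W}_a$$ is simply $$[\bn{W}_{aa} \,|\, \bn{W}_{ax}]$$ stacked column-wise, which is why the two formulations are equivalent.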

RNN Architectures

RNNs can be built to handle different input-output relations: one-to-one, one-to-many (e.g. music generation), many-to-one (e.g. sentiment classification), and many-to-many (e.g. machine translation).
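As a sketch of the many-to-one case (toy sizes and random weights are assumptions): the recurrence runs over the whole input sequence, and a single prediction is read off the final activation.

```python
import numpy as np

rng = np.random.default_rng(2)
n_a, n_x, n_y, T = 4, 3, 2, 5                 # hidden, input, output, seq length
Wa = rng.standard_normal((n_a, n_a + n_x)); ba = np.zeros(n_a)
Wy = rng.standard_normal((n_y, n_a));       by = np.zeros(n_y)

xs = rng.standard_normal((T, n_x))            # input sequence x<1..T>
a = np.zeros(n_a)                             # a<0>
for x_t in xs:                                # a<t> = tanh(Wa [a<t-1>, x<t>] + ba)
    a = np.tanh(Wa @ np.concatenate([a, x_t]) + ba)

z = Wy @ a + by                               # single output, only at t = T
y_hat = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
```

A one-to-many model would instead emit $$\widehat{\bn{y}}^{ < t > }$$ at every step and feed each output back in as the next input.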

LSTM

Long Short-Term Memory enables RNN to remember key information over many timesteps:

\ds \begin{align} \tilde{\bn{c}}^{ < t > } & = \t{tanh}(\bn{W}_c[\bn{a}^{ < t - 1 > },\bn{x}^{ < t > }] + \bn{b}_c) \\ \bn{c}^{ < t > } & = \bn{\Gamma}_u \cdot \tilde{\bn{c}}^{ < t > } + \bn{\Gamma}_f \cdot \bn{c}^{ < t - 1> } \\ \bn{a}^{ < t > } & = \bn{\Gamma}_o \cdot \t{tanh}(\bn{c}^{ < t > }) \\ \widehat{\bn{y}}^{ < t > } & = \t{softmax}(\bn{W}_y\bn{a}^{ < t > } + \bn{b}_y) \\ \bn{\Gamma}_u & = \sigma(\bn{W}_u[\bn{a}^{ < t - 1> },\bn{x}^{ < t > }] + \bn{b}_u)\\ \bn{\Gamma}_f & = \sigma(\bn{W}_f[\bn{a}^{ < t - 1> },\bn{x}^{ < t > }] + \bn{b}_f)\\ \bn{\Gamma}_o & = \sigma(\bn{W}_o[\bn{a}^{ < t - 1> },\bn{x}^{ < t > }] + \bn{b}_o)\\ \end{align}
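The gate equations above can be sketched as one LSTM step in NumPy. The dictionary layout of the four weight/bias sets and the toy dimensions are assumptions for illustration; the arithmetic follows the equations term by term (all gate products are elementwise):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, W, b):
    """One LSTM time-step. W and b hold the four parameter sets,
    keyed 'c' (candidate), 'u' (update), 'f' (forget), 'o' (output)."""
    concat = np.concatenate([a_prev, x_t])          # [a<t-1>, x<t>]
    c_tilde = np.tanh(W['c'] @ concat + b['c'])     # candidate memory c~<t>
    g_u = sigmoid(W['u'] @ concat + b['u'])         # update gate  Γu
    g_f = sigmoid(W['f'] @ concat + b['f'])         # forget gate  Γf
    g_o = sigmoid(W['o'] @ concat + b['o'])         # output gate  Γo
    c_t = g_u * c_tilde + g_f * c_prev              # c<t>: gated memory blend
    a_t = g_o * np.tanh(c_t)                        # a<t>: gated activation
    return a_t, c_t

# toy sizes and random parameters
rng = np.random.default_rng(1)
n_a, n_x = 4, 3
W = {k: rng.standard_normal((n_a, n_a + n_x)) for k in 'cufo'}
b = {k: np.zeros(n_a) for k in 'cufo'}

a, c = np.zeros(n_a), np.zeros(n_a)                 # a<0>, c<0>
a, c = lstm_step(a, c, rng.standard_normal(n_x), W, b)
```

The forget-gate path $$\bn{\Gamma}_f \cdot \bn{c}^{ < t - 1 > }$$ is what lets the cell state carry information across many timesteps largely unchanged.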
- Mitigates the vanishing/exploding gradient problem
- More robust to memory decay over long gaps than the GRU