# Dragon Notes


# Randomness & Probability: Random Processes

Random Process Types

- [DTDV] Discrete-time, Discrete-valued. Bernoulli random process: $$X[n]\sim\t{Ber}(p)$$
- [DTCV] Discrete-time, Continuous-valued. Gaussian random process: $$Y[n]\sim N(0,1)$$
- [CTDV] Continuous-time, Discrete-valued. Binomial random process: $$\small{}W(t)=\sum_{n=0}^{\floor{t}}X[n]$$
- [CTCV] Continuous-time, Continuous-valued. Gaussian random process: $$Z(t)\sim N(0,1)$$
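The four types can be sketched numerically; this is an illustrative simulation, with $$p=0.5$$, the sample size, and the evaluation time $$t$$ chosen arbitrarily rather than taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
p = 0.5  # illustrative Bernoulli parameter

X = rng.binomial(1, p, size=N)        # DTDV: Bernoulli process X[n], values in {0, 1}
Y = rng.normal(0.0, 1.0, size=N)      # DTCV: Gaussian process Y[n]

# CTDV: binomial counting process W(t) = sum_{n=0}^{floor(t)} X[n];
# defined for real-valued t but taking only integer values
t = 12.7
W_t = int(X[: int(np.floor(t)) + 1].sum())

# CTCV: Gaussian process, continuous-valued at any real time t
Z_t = rng.normal(0.0, 1.0)
```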

Strict Stationarity (SSS)

In a strict-sense stationary random process, samples shifted in time retain their joint PDF/PMF:
$$p_{X[n_1+n_0],X[n_2+n_0],...,X[n_N+n_0]}=p_{X[n_1],X[n_2],...,X[n_N]}$$
That is, the PDF/PMF of the samples $$\{X[n_1],X[n_2],...,X[n_N]\}$$ is the same as that of $$\{X[n_1+n_0],X[n_2+n_0],...,X[n_N+n_0]\}$$, for any integer $$n_0$$. The relevant discrete-time (DT) and continuous-time (CT) random process parameters are defined as follows:
Discrete-time ($$-\infty < n < \infty$$, $$-\infty < n_{1,2} < \infty$$):
\begin{align} \mu_X[n] &= E[X[n]], \\ \sigma_X^2[n] &= \var{X[n]}, \\ \t{cov}_X[n_1,n_2] &= E[(X[n_1]-\mu_X[n_1])(X[n_2]-\mu_X[n_2])] \\ &= \t{cov}(X[n_1],X[n_2]) \end{align}
Continuous-time ($$-\infty < t < \infty$$, $$-\infty < t_{1,2} < \infty$$):
\begin{align} \mu_X(t) &= E[X(t)], \\ \sigma_X^2(t) &= \var{X(t)}, \\ \t{cov}_X(t_1,t_2) &= E[(X(t_1)-\mu_X(t_1))(X(t_2)-\mu_X(t_2))] \\ &= \t{cov}(X(t_1),X(t_2)) \end{align}

Wide-Sense Stationarity (WSS)

A random process is defined to be wide-sense stationary (WSS) if
\begin{align} \mu_X[n] &= \mu\ \ \t{(const.)} && -\infty < n < \infty \\ E[X[n_1]X[n_2]] &= g(|n_2 - n_1|) && -\infty < n_{1,2} < \infty \end{align}
that is, the mean is time-independent and the covariance depends only on the time interval between samples, $$|n_2-n_1|$$.
Application: We can predict a random process sample $$X[n_2]$$ from an observation $$X[n_1]=x[n_1]$$ as
$$\hat{X}[n_2]=\mu_X[n_2]+\lfrac{\t{cov}_X[n_1,n_2]}{\t{cov}_X[n_1,n_1]}(x[n_1]-\mu_X[n_1])$$
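The predictor can be sketched numerically. The AR(1) model below and its covariance $$a^{|k|}\sigma_X^2$$ are illustrative assumptions (a standard result, not derived in these notes):

```python
import numpy as np

def predict(x_n1, mu_n1, mu_n2, cov_12, cov_11):
    """Linear (MMSE) predictor of X[n2] given the observation X[n1] = x_n1."""
    return mu_n2 + (cov_12 / cov_11) * (x_n1 - mu_n1)

# Assumed example: zero-mean AR(1) process X[n] = a*X[n-1] + U[n] with
# variance 1/(1 - a^2) and covariance cov_X[n, n+k] = a^|k| * variance.
a, k = 0.9, 3
var = 1.0 / (1 - a**2)
x_hat = predict(x_n1=2.0, mu_n1=0.0, mu_n2=0.0,
                cov_12=a**k * var, cov_11=var)
# x_hat = a^k * x_n1 = 0.9**3 * 2.0 = 1.458
```

Note that for this zero-mean model the predictor reduces to $$\hat{X}[n_2]=a^{|k|}x[n_1]$$: the prediction decays toward the mean as the lag grows.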
Why WSS? Since in general the mean and covariance change with time, estimating them can prove very difficult in practice. To extend utility, we wish the mean to not depend on time and covariance to only depend on separation between samples (or $$|n_2-n_1|$$).
A WSS rp whose mean can be extracted from a single infinite-length realization is termed ergodic in the mean; that is, $$\boxed{\t{temporal average }=\t{ ensemble average}}$$. This occurs when the variance of the sample mean converges to zero as $$N\ra\infty$$:
$$\ds \boxed{\ilim{N}\t{var}(\hat{\mu}_N)=\ilim{N}\frac{1}{N}\sum_{k=-(N-1)}^{N-1}\lrpar{1-\frac{\abs{k}}{N}}(r_X[k]-\mu^2)=0}$$
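A quick sketch of the distinction, using two assumed processes: an i.i.d. Gaussian process (ergodic in the mean) and the classic counterexample $$X[n]=A$$ for all $$n$$, whose sample-mean variance never decays:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
mu = 3.0  # illustrative ensemble mean

# Ergodic case: i.i.d. N(mu, 1) samples; the time average of one
# realization converges to the ensemble mean mu.
x = rng.normal(mu, 1.0, size=N)

# Non-ergodic counterexample: X[n] = A for all n, with A ~ N(mu, 1).
# Each realization's time average equals its own draw A, not mu,
# so var(mu_hat_N) = 1 for every N and never converges to zero.
A = rng.normal(mu, 1.0)
y = np.full(N, A)
```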

Autocorrelation Sequence (ACS)

$$\up$$Given a WSS random process $$X[n]$$, its autocorrelation sequence (ACS) is defined as
$$r_X[k]=E[X[n]X[n+k]]$$
which is a joint moment for $$n_1=n$$ and $$n_2=n+k$$.
Autocorrelation is the correlation of a signal with a delayed copy of itself as a function of the delay. Autocorrelation can be used to
- detect repeated patterns, such as the presence of a periodic signal obscured by noise, or
- identify a missing fundamental frequency implied by its harmonic frequencies
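The pattern-detection use can be sketched as follows; the period, noise level, and sample size are illustrative choices. A sinusoid buried in noise reappears as a peak in the estimated (biased) autocorrelation near its period:

```python
import numpy as np

rng = np.random.default_rng(2)
N, period = 4000, 25
n = np.arange(N)

# periodic signal obscured by noise
x = np.sin(2 * np.pi * n / period) + rng.normal(0.0, 1.0, size=N)
x = x - x.mean()

# biased ACS estimate: r[k] = (1/N) * sum_n x[n] x[n+k]
max_lag = 40
r = np.array([x[: N - k] @ x[k:] / N for k in range(max_lag + 1)])

# excluding lag 0 (always the maximum), the strongest peak lies near the period
k_peak = 1 + int(np.argmax(r[1:]))
```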

ACS properties follow.
- [$$n$$-independence] ACS depends only on the time difference between samples, $$|n_2-n_1|=|n+k-n|=|k|$$, so the value of $$n$$ used in the definition is arbitrary.
- [PDF-independence] ACS relies on only the first two moments of $$X[n]$$, regardless of its PDF.
- [Zero-lag positivity] $$\boxed{r_X[0]>0}$$ (ACS is positive for zero lag).
- [Average power $$r_X[0]$$] $$\boxed{r_X[0]=P_{\t{av}}}$$ (zero-lag ACS is the average power of the rp at every sample time $$n$$).
- [Evenness] $$\boxed{r_X[-k]=r_X[k]}$$ (ACS is an even sequence).
- [Max at $$k=0$$] $$\boxed{|r_X[k]|\leq r_X[0]}$$ (ACS takes on its maximum value at $$k=0$$: $$X$$ is most correlated with itself).
- [Measures predictability] $$\ds \boxed{\rho_{X[n],X[n+k]}=r_X[k]/r_X[0]}$$, assuming $$\mu=0$$ (ACS measures rp predictability; adjustable for $$\mu\neq 0$$).
- [Covariance relation] $$\boxed{\t{cov}_X[n_1,n_2]=r_X[n_2-n_1]-\mu^2}\quad (=E[X[n_1]X[n_2]]-\mu_X[n_1]\mu_X[n_2])$$
- [$$r_X\ra \mu^2$$ as $$k\rai$$] Holds assuming $$c_X[n,n+k]\ra 0$$ as $$k\rai$$ (samples decorrelate for large lags, usually the case).
- [LSI Systems] $$\ds\boxed{r_X[k]=\sigma_U^2\s{}\iisum{m}\ns{}h[m]h[m+k]}$$, where $$h[m]=$$ system impulse response and $$U,X=$$ input, output.
- [Positive semidefinite] $$r_X[k]$$ satisfies $$\bn{a}^T\bn{R}_X\bn{a}\geq 0$$ for all $$\bn{a}=[a_0\ a_1\ ...\ a_{N-1}]^T$$, where $$\bn{R}_X=$$ autocorrelation matrix:
$\ds \bn{R}_X=\lrbra{\mtxxxx {r_X[0]}{r_X[1]}{...}{r_X[N-1]} {r_X[1]}{r_X[0]}{...}{r_X[N-2]} {\vdots}{\vdots}{\ddots}{\vdots} {r_X[N-1]}{r_X[N-2]}{...}{r_X[0]} }$
$$\bn{R}_X$$ is the covariance matrix for a zero-mean random vector $$\bn{X}=\lrbra{X[0]\ X[1]\ ...\ X[N-1]}^T$$.
- The definition implies all principal minors must be non-negative; for a $$2\times 2\up$$ matrix,
\begin{align} r_X[0] &\geq 0 \\ r_X^2[0]-r_X^2[1] &\geq 0\end{align}\hspace{50px}
- Equality ($$\bn{a}^T\bn{R}_X\bn{a}=0$$ for some $$\bn{a}\neq\bn{0}$$) can hold only if $$X$$ can be perfectly predicted. (Else, $$\bn{R}_X$$ is positive definite: $$> 0$$ instead of $$\geq 0$$.)
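Positive semidefiniteness can be verified numerically. The ACS values below come from the LSI-system formula above with an assumed impulse response $$h=[1,\ 0.5]$$ and $$\sigma_U^2=1$$, giving $$r_X[0]=1.25$$, $$r_X[1]=0.5$$, and zero otherwise:

```python
import numpy as np

def acs_matrix(r):
    """Toeplitz autocorrelation matrix: R_X[i, j] = r_X[|i - j|] (real WSS rp)."""
    idx = np.abs(np.subtract.outer(np.arange(len(r)), np.arange(len(r))))
    return np.asarray(r)[idx]

# ACS of white noise (sigma_U^2 = 1) filtered by assumed h = [1, 0.5]:
# r_X[0] = 1 + 0.25 = 1.25, r_X[1] = 1 * 0.5 = 0.5, r_X[k] = 0 otherwise.
R = acs_matrix([1.25, 0.5, 0.0, 0.0])

# a^T R a >= 0 for all a  <=>  all eigenvalues are non-negative
eigvals = np.linalg.eigvalsh(R)

# 2x2 principal-minor conditions: r_X[0] >= 0 and r_X[0]^2 - r_X[1]^2 >= 0
minors_ok = R[0, 0] >= 0 and R[0, 0] ** 2 - R[0, 1] ** 2 >= 0
```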

Power Spectral Density (PSD)

Power Spectral Density measures how a signal's average power is distributed over frequency, and is defined by
$$\ds P_X(f)=\ilim{M}\frac{1}{2M+1}E\lrbra{\lrabs{\sum_{n=-M}^{M}X[n]\t{exp}(-j2\pi fn)}^2}\ \ \bb{1^*}$$
When integrated over a band of frequencies, it provides a measure of the average power within that band. PSD properties follow.
- [ACS DTFT] PSD can be obtained via the discrete-time Fourier transform of the ACS: $$\boxed{P_X(f)=\mathcal{F}[r_X[k]]}$$, i.e. $$\boxed{P_X(f)=\s{}\iisum{k}\ns{}r_X[k]\t{exp}(-j2\pi fk)}$$
- [Real] PSD is a real function, and is also given by $$\boxed{P_X(f)=\s{}\iisum{k}\ns{}r_X[k]\cos(2\pi fk)}\ \ \ \bb{2^*}$$
- [Non-negative] $$\boxed{P_X(f)\geq 0}$$: PSD is non-negative. (Follows from $$\s{}\bb{1^*}$$)
- [Evenness] $$\boxed{P_X(-f)=P_X(f)}$$: PSD is symmetric about $$f=0$$. (Follows from $$\s{}\bb{2^*}$$)
- [Periodicity, $$T=1$$] $$\boxed{P_X(f+1)=P_X(f)}$$: PSD is periodic with period one. (Follows from $$\s{}\bb{2^*}$$)
- [PSD$$\ra$$ACS] ACS can be recovered from PSD using the inverse Fourier transform: $$\ds\boxed{r_X[k]=\mathcal{F}^{-1}[P_X(f)]}$$, i.e. $$\ds \boxed{r_X[k]=\s{}\int_{-1/2}^{1/2}\ns{}P_X(f)\t{exp}(j2\pi fk)df}$$
- [$$\Delta f$$ average power] PSD yields the average power over a band of frequencies: $$\ds\boxed{\t{Average physical power in }[f_1,f_2]=2\s{}\int_{\ns{}f_1}^{\ns{}f_2}\ns{}P_X(f)df}$$
- [Total average power] PSD yields the total average power when integrated over all frequencies: $$\ds \boxed{\s{}\int_{-1/2}^{1/2}\ns{}P_X(f)df=r_X[0]}$$
- [LSI Systems] $$\ds\boxed{P_X(f)=\abs{H(f)}^2P_U(f)}$$: output PSD $$=$$ input PSD $$\times$$ magnitude-squared frequency response.
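Several of these properties can be checked on a concrete PSD. The short ACS below is an assumed example ($$r_X[0]=1.25$$, $$r_X[\pm 1]=0.5$$, zero elsewhere), for which the DTFT sum reduces to two terms:

```python
import numpy as np

# Assumed example ACS; the DTFT sum P_X(f) = sum_k r_X[k] cos(2 pi f k)
# reduces to r_X[0] + 2 r_X[1] cos(2 pi f).
r0, r1 = 1.25, 0.5
f = np.linspace(-0.5, 0.5, 2001)
P = r0 + 2 * r1 * np.cos(2 * np.pi * f)

nonneg = np.all(P >= 0)                  # P_X(f) >= 0
even = np.allclose(P, P[::-1])           # P_X(-f) = P_X(f)

# total average power: integrating the PSD over one period recovers r_X[0]
df = f[1] - f[0]
total = np.sum(P[:-1]) * df              # left Riemann sum over a full period
```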
[PSD definitions are not consistent] Some fields, especially those using the PSD for physical measurements, define it slightly differently: $$G_X(f)=2P_X(f)$$, called the one-sided PSD. Its advantage is that integrating it over a band directly yields the average power in that band.