
Dragon Notes


\( \newcommand{bvec}[1]{\overrightarrow{\boldsymbol{#1}}} \newcommand{bnvec}[1]{\overrightarrow{\boldsymbol{\mathrm{#1}}}} \newcommand{uvec}[1]{\widehat{\boldsymbol{#1}}} \newcommand{vec}[1]{\overrightarrow{#1}} \newcommand{\parallelsum}{\mathbin{\|}} \) \( \newcommand{s}[1]{\small{#1}} \newcommand{t}[1]{\text{#1}} \newcommand{tb}[1]{\textbf{#1}} \newcommand{ns}[1]{\normalsize{#1}} \newcommand{ss}[1]{\scriptsize{#1}} \newcommand{vpl}[]{\vphantom{\large{\int^{\int}}}} \newcommand{vplup}[]{\vphantom{A^{A^{A^A}}}} \newcommand{vplLup}[]{\vphantom{A^{A^{A^{A{^A{^A}}}}}}} \newcommand{vpLup}[]{\vphantom{A^{A^{A^{A^{A^{A^{A^A}}}}}}}} \newcommand{up}[]{\vplup} \newcommand{Up}[]{\vplLup} \newcommand{Uup}[]{\vpLup} \newcommand{vpL}[]{\vphantom{\Large{\int^{\int}}}} \newcommand{lrg}[1]{\class{lrg}{#1}} \newcommand{sml}[1]{\class{sml}{#1}} \newcommand{qq}[2]{{#1}_{\t{#2}}} \newcommand{ts}[2]{\t{#1}_{\t{#2}}} \) \( \newcommand{ds}[]{\displaystyle} \newcommand{dsup}[]{\displaystyle\vplup} \newcommand{u}[1]{\underline{#1}} \newcommand{tu}[1]{\underline{\text{#1}}} \newcommand{tbu}[1]{\underline{\bf{\text{#1}}}} \newcommand{bxred}[1]{\class{bxred}{#1}} \newcommand{Bxred}[1]{\class{bxred2}{#1}} \newcommand{lrpar}[1]{\left({#1}\right)} \newcommand{lrbra}[1]{\left[{#1}\right]} \newcommand{lrabs}[1]{\left|{#1}\right|} \newcommand{bnlr}[2]{\bn{#1}\left(\bn{#2}\right)} \newcommand{nblr}[2]{\bn{#1}(\bn{#2})} \newcommand{real}[1]{\Ree\{{#1}\}} \newcommand{Real}[1]{\Ree\left\{{#1}\right\}} \newcommand{abss}[1]{\|{#1}\|} \newcommand{umin}[1]{\underset{{#1}}{\t{min}}} \newcommand{umax}[1]{\underset{{#1}}{\t{max}}} \newcommand{und}[2]{\underset{{#1}}{{#2}}} \) \( \newcommand{bn}[1]{\boldsymbol{\mathrm{#1}}} \newcommand{bns}[2]{\bn{#1}_{\t{#2}}} \newcommand{b}[1]{\boldsymbol{#1}} \newcommand{bb}[1]{[\bn{#1}]} \) \( \newcommand{abs}[1]{\left|{#1}\right|} \newcommand{ra}[]{\rightarrow} \newcommand{Ra}[]{\Rightarrow} \newcommand{Lra}[]{\Leftrightarrow} \newcommand{rai}[]{\rightarrow\infty} \newcommand{ub}[2]{\underbrace{{#1}}_{#2}} \newcommand{ob}[2]{\overbrace{{#1}}^{#2}} \newcommand{lfrac}[2]{\large{\frac{#1}{#2}}\normalsize{}} \newcommand{sfrac}[2]{\small{\frac{#1}{#2}}\normalsize{}} \newcommand{Cos}[1]{\cos{\left({#1}\right)}} \newcommand{Sin}[1]{\sin{\left({#1}\right)}} \newcommand{Frac}[2]{\left({\frac{#1}{#2}}\right)} \newcommand{LFrac}[2]{\large{{\left({\frac{#1}{#2}}\right)}}\normalsize{}} \newcommand{Sinf}[2]{\sin{\left(\frac{#1}{#2}\right)}} \newcommand{Cosf}[2]{\cos{\left(\frac{#1}{#2}\right)}} \newcommand{atan}[1]{\tan^{-1}({#1})} \newcommand{Atan}[1]{\tan^{-1}\left({#1}\right)} \newcommand{intlim}[2]{\int\limits_{#1}^{#2}} \newcommand{lmt}[2]{\lim_{{#1}\rightarrow{#2}}} \newcommand{ilim}[1]{\lim_{{#1}\rightarrow\infty}} \newcommand{zlim}[1]{\lim_{{#1}\rightarrow 0}} \newcommand{Pr}[]{\t{Pr}} \newcommand{prop}[]{\propto} \newcommand{ln}[1]{\t{ln}({#1})} \newcommand{Ln}[1]{\t{ln}\left({#1}\right)} \newcommand{min}[2]{\t{min}({#1},{#2})} \newcommand{Min}[2]{\t{min}\left({#1},{#2}\right)} \newcommand{max}[2]{\t{max}({#1},{#2})} \newcommand{Max}[2]{\t{max}\left({#1},{#2}\right)} \newcommand{pfrac}[2]{\frac{\partial{#1}}{\partial{#2}}} \newcommand{pd}[]{\partial} \newcommand{zisum}[1]{\sum_{{#1}=0}^{\infty}} \newcommand{iisum}[1]{\sum_{{#1}=-\infty}^{\infty}} \newcommand{var}[1]{\t{var}({#1})} \newcommand{exp}[1]{\t{exp}\left({#1}\right)} \newcommand{mtx}[2]{\left[\begin{matrix}{#1}\\{#2}\end{matrix}\right]} \newcommand{nmtx}[2]{\begin{matrix}{#1}\\{#2}\end{matrix}} 
\newcommand{nmttx}[3]{\begin{matrix}\begin{align} {#1}& \\ {#2}& \\ {#3}& \\ \end{align}\end{matrix}} \newcommand{amttx}[3]{\begin{matrix} {#1} \\ {#2} \\ {#3} \\ \end{matrix}} \newcommand{nmtttx}[4]{\begin{matrix}{#1}\\{#2}\\{#3}\\{#4}\end{matrix}} \newcommand{mtxx}[4]{\left[\begin{matrix}\begin{align}&{#1}&\hspace{-20px}{#2}\\&{#3}&\hspace{-20px}{#4}\end{align}\end{matrix}\right]} \newcommand{mtxxx}[9]{\begin{matrix}\begin{align} &{#1}&\hspace{-20px}{#2}&&\hspace{-20px}{#3}\\ &{#4}&\hspace{-20px}{#5}&&\hspace{-20px}{#6}\\ &{#7}&\hspace{-20px}{#8}&&\hspace{-20px}{#9} \end{align}\end{matrix}} \newcommand{amtxxx}[9]{ \amttx{#1}{#4}{#7}\hspace{10px} \amttx{#2}{#5}{#8}\hspace{10px} \amttx{#3}{#6}{#9}} \) \( \newcommand{ph}[1]{\phantom{#1}} \newcommand{vph}[1]{\vphantom{#1}} \newcommand{mtxxxx}[8]{\begin{matrix}\begin{align} & {#1}&\hspace{-17px}{#2} &&\hspace{-20px}{#3} &&\hspace{-20px}{#4} \\ & {#5}&\hspace{-17px}{#6} &&\hspace{-20px}{#7} &&\hspace{-20px}{#8} \\ \mtxxxxCont} \newcommand{\mtxxxxCont}[8]{ & {#1}&\hspace{-17px}{#2} &&\hspace{-20px}{#3} &&\hspace{-20px}{#4}\\ & {#5}&\hspace{-17px}{#6} &&\hspace{-20px}{#7} &&\hspace{-20px}{#8} \end{align}\end{matrix}} \newcommand{mtXxxx}[4]{\begin{matrix}{#1}\\{#2}\\{#3}\\{#4}\end{matrix}} \newcommand{cov}[1]{\t{cov}({#1})} \newcommand{Cov}[1]{\t{cov}\left({#1}\right)} \newcommand{var}[1]{\t{var}({#1})} \newcommand{Var}[1]{\t{var}\left({#1}\right)} \newcommand{pnint}[]{\int_{-\infty}^{\infty}} \newcommand{floor}[1]{\left\lfloor {#1} \right\rfloor} \) \( \newcommand{adeg}[1]{\angle{({#1}^{\t{o}})}} \newcommand{Ree}[]{\mathcal{Re}} \newcommand{Im}[]{\mathcal{Im}} \newcommand{deg}[1]{{#1}^{\t{o}}} \newcommand{adegg}[1]{\angle{{#1}^{\t{o}}}} \newcommand{ang}[1]{\angle{\left({#1}\right)}} \newcommand{bkt}[1]{\langle{#1}\rangle} \) \( \newcommand{\hs}[1]{\hspace{#1}} \)

  UNDER CONSTRUCTION

Data Science:
Regression



Simple Linear Regression

Estimates response based on single predictor:
\(\ds \boxed{y\approx \beta_0 + \beta_1 x = \hat{y}}\)
\(\ds\hat{\beta_1}=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2},\ \hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x}\)

  • - 'hats' on variables denote estimates (estimators)
  • - Residual Sum of Squares (RSS) is a measure of predictive error
  • - \(i^{\t{th}}\) residual \(=\) diff. between \(i^{\t{th}}\) observed & predicted responses: \(e_i = y_i - \hat{y}_i\)
\(\t{RSS} = e_1^2 + e_2^2 + ... +e_n^2\)
\(\t{RSS} = (y_1 -\hat{\beta}_0 - \hat{\beta_1}x_1)^2 + ... + (y_n -\hat{\beta}_0 - \hat{\beta_1}x_n)^2\ \t{ [Simp Lin Reg]}\) (see sketch below)
[Figure: simple linear regression fit]
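A minimal NumPy sketch of the closed-form estimates and RSS above; the \(x\), \(y\) arrays are illustrative stand-ins for any paired predictor/response data.

```python
import numpy as np

# Illustrative data (stand-in for any paired predictor/response arrays)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Least-squares estimates from the formulas above
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Predictions and residual sum of squares
y_hat = beta0_hat + beta1_hat * x
rss = np.sum((y - y_hat) ** 2)
```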

Assessing Regression Accuracy

  • \(\bb{1}\) Standard Error (SE) of \(\hat{\mu}\) measures how well the sample mean \(\hat{\mu}\) estimates \(\mu\)  
    Assumption: the \(n\) observations are uncorrelated
  • \(\bb{2}\) Residual Standard Error (RSE) estimates the standard deviation of \(\epsilon\); measures 'lack of fit' of model to data
    Ex: \(\s{}\t{RSE}=50 \ra\) actual responses differ from true reg. line predictions by \(50\), on average
  • \(\bb{3}\) \(\b{R}^2 =\) proportion of variance explained by linear regression model
    TSS \(\s{}\ra\) total variance in the response \(\s{Y}\) (variability inherent in \(\s{Y}\));
    RSS \(\s{}\ra\) amount of variability left unexplained w/ regression;
    \(\s{}(\t{TSS}-\t{RSS})\ra\) amount of variability in the response that is explained (or removed) by performing regression;
    \(\s{}R^2\ra\) proportion of variability in \(\s{}Y\) that can be explained using \(\s{}X\)
  • \(\bb{4}\) Confidence Interval indicates how well the mean is captured
  • Interpretation: Suppose CI = 95%. Then, if the distribution is sampled many times, and a CI is computed for each sample, we'd expect about 95% of those intervals to include the true population mean.
  • \(\bb{5}\) Prediction Interval indicates where to expect the next sampled datapoint
  • Interpretation: Suppose PI = 95%. We sample the distribution, compute a PI for the sample, and sample another datapoint. If we do this many times, we'd expect the next datapoint to lie within the PI for 95% of the samples.
  • \(\ds \bb{1}\ \t{SE}(\hat{\mu})=\frac{\sigma}{\sqrt{n}} = \sqrt{\t{var}(\hat{\mu})}\)
  • \(\ds \bb{2}\ \t{RSE} = \sqrt{\sfrac{1}{n-2}\t{RSS}},\ \t{RSS} = \sum_{i=1}^n (y_i - \hat{y}_i)^2\)
  • \(\ds \bb{3}\ R^2 = \frac{\t{TSS}-\t{RSS}}{\t{TSS}} = 1 - \frac{\t{RSS}}{\t{TSS}},\ \t{TSS}=\sum_{i=1}^n (y_i - \bar{y})^2\) (see sketch after this list)
  • \(\ds \bb{4}\ \bar{x}\pm z_{\alpha/2}\sfrac{\sigma}{\sqrt{n}}\)
    \(\bar{x}=\) sample mean
    \(\alpha=\) significance level (confidence level \(=1-\alpha\))
    \(\sigma=\) population std.
    \(n=\) sample size
  • \(\ds \bb{5}\ \mu \pm z\sigma\)
    Assumption: \(\mu\) & \(\sigma\) of the distribution are known
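A short NumPy/SciPy sketch of quantities \(\bb{2}\)–\(\bb{5}\), continuing from the simple-regression sketch above (reusing its x, y, y_hat, rss); the sigma, mu, and sample used for the intervals are illustrative assumptions.

```python
import numpy as np
from scipy import stats

n = len(y)

# [2] Residual standard error and [3] R^2 (x, y, y_hat, rss from the earlier sketch)
rse = np.sqrt(rss / (n - 2))
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss

# [4] 95% confidence interval for a population mean (sigma and sample are illustrative)
sigma = 2.0
sample = np.random.default_rng(0).normal(loc=10.0, scale=sigma, size=50)
z = stats.norm.ppf(0.975)  # z_{alpha/2} for alpha = 0.05
half_width = z * sigma / np.sqrt(len(sample))
ci = (sample.mean() - half_width, sample.mean() + half_width)

# [5] 95% prediction interval when mu and sigma of the distribution are known
mu = 10.0
pi = (mu - z * sigma, mu + z * sigma)
```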

Null Hypothesis

Tests whether there exists a relationship between variables or association between groups:
Null hypothesis, \(H_0\): There is no relationship between \(X\) and \(Y\)

Alt. Hypothesis, \(H_a\): There is some relationship between \(X\) and \(Y\)
For the coefficient \(\beta_1\) in the regression of \(Y\) onto \(X\), this corresponds to
\(H_0\t{: } \beta_1 = 0,\ \ H_a\t{: } \beta_1 \neq 0\)

\(\ds t = \sfrac{\hat{\beta_1}-0}{\t{SE}(\hat{\beta_1})}\)
\(t\)-statistic \(=\) # of standard errors \(\hat{\beta_1}\) lies from zero; used to test the null hypothesis by indicating whether \(\s{}\hat{\beta_1}\) is 'sufficiently far' from zero for us to be confident that \(\beta_1\) is non-zero.
\(p\)-value \(=\) probability of observing a \(t\)-value at least this extreme by random chance alone (i.e., if \(H_0\) were true); by convention, \(p<0.05\) or \(p<0.01 \ra\) reject null hypothesis (see sketch at the end of this section)
Interpretation: small \(p\)-value indicates that it is unlikely to observe a substantial association between predictor & response due to chance, in absence of any actual association between the predictor & the response
[Figure: null hypothesis diagram]
NOTE:
\(\ds \begin{align} & \t{Pr}(\t{observation | hypothesis}) \neq \\ & \t{Pr}(\t{hypothesis | observation}) \end{align}\)
Probability of observing a result given that some hypothesis is true is not equivalent to the probability that a hypothesis is true given that some result has been observed.

(\(p\)-values indicate the 'incompatibility' of a given dataset with the null hypothesis; they do not validate the research hypothesis)
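A sketch of the \(t\)-test for \(\hat{\beta_1}\), continuing from the simple-regression variables above; the standard-error expression \(\t{RSE}/\sqrt{\sum_i(x_i-\bar{x})^2}\) is the usual simple-regression formula, stated here as an assumption rather than derived above.

```python
import numpy as np
from scipy import stats

# Continuing from the simple-regression sketch (x, y, beta1_hat, rss)
n = len(y)
rse = np.sqrt(rss / (n - 2))

# Standard error of beta1_hat for simple linear regression
se_beta1 = rse / np.sqrt(np.sum((x - x.mean()) ** 2))

# t-statistic: number of standard errors beta1_hat lies from zero
t_stat = (beta1_hat - 0) / se_beta1

# Two-sided p-value under H0: beta1 = 0 (t-distribution, n - 2 degrees of freedom)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
```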


Null Hypothesis [Multivariate]

Is there a relationship between the response and any of the (\(p\)) predictors?

\(H_0\t{: }\beta_1 = \beta_2 = ... = \beta_p = 0\)

\(H_a\): at least one of \(\beta_j\) is non-zero

\(\ra \ds F = \frac{(\t{TSS} - \t{RSS})/p}{\t{RSS}/(n-p-1)}\)
Is there a relationship between the response and a subset of (\(q\)) predictors?

\(H_0\t{: }\beta_{p-q+1} = \beta_{p-q+2} = ... = \beta_p = 0\)

\(H_a\): at least one of the selected \(\beta_j\) is non-zero

\(\ra \ds F = \frac{(\t{RSS}_0 - \t{RSS})/q}{\t{RSS}/(n-p-1)}\)

  • \(F\)-statistic \(=\) measure of the overall (multivariate) strength of association (see sketch at the end of this section)
  • - Higher \(F \ra\) stronger evidence against \(H_0\)
  • - Need \(F>1\) to reject \(H_0\); 'by how much' depends on \(n\) & \(p\)
  • - Larger \(n \ra\) an \(F\) only slightly greater than \(1\) may suffice
  • - Small \(p\)-value associated w/ \(F\) \(\ra\) \(F\) is large enough to reject \(H_0\)
  • - \(\t{RSS}_0\) is computed on a model using all but the selected \(q\) predictors
NOTE: small individual \(p\)-values \(\neq\) grounds to reject \(H_0\)
It seems likely that if any one of the \(p\)-values for predictors is very small, then at least one of the predictors is related to the response - so why use the \(F\)-statistic? This logic, however, may not hold when # of predictors \(p\) is large. Ex: Suppose \(p=100\) and \(H_0\t{: }\beta_1 = \beta_2 =...=\beta_p = 0\) is true, so no predictor is truly associated with the response. Then, about 5% of the \(p\)-values associated with each variable will be below \(0.05\) – simply by chance
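A minimal NumPy/SciPy sketch of the overall \(F\)-test; the simulated predictors, coefficients, and noise are purely illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Simulated data: n observations, p predictors (all values illustrative)
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, 0.0, -0.3]) + rng.normal(size=n)

# Least-squares fit of the full model (intercept column prepended)
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
rss = np.sum((y - X1 @ beta_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

# Overall F-statistic for H0: beta_1 = ... = beta_p = 0, and its p-value
F = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(F, dfn=p, dfd=n - p - 1)
```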







By OverLordGoldDragon