# Dragon Notes


# Data Science: Solved Problems

MATH 478/678 HW Solutions:

Flexibility vs. Predictors, Sample size
(Learning Models)
For each of (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Explain your answer.

• (a) The sample size $$n$$ is extremely large, and the number of predictors $$p$$ is small.
• (b) The number of predictors $$p$$ is extremely large, and the number of observations is small.
• (c) The relationship between the predictors and the response is highly non-linear.
• (d) The variance of the error terms ($$\sigma^2 = \t{var}(\epsilon)$$) is extremely high.

$$\tb{Solution}\t{:}$$
• (a) $$n = \t{very large},\ p = \t{small}$$
• - Model fit on a large sample has low variance, so overfitting is less likely
• - Model w/ fewer predictors $$=$$ less combined variance
• - Model w/ higher flexibility has lower bias
• Thus, a higher flexibility model generally performs better.
• (c) Predictor-response relation $$=$$ highly non-linear
• - Model w/ lower flexibility will underfit a non-linear predictor-response relation (high bias)
• Thus, a higher flexibility model generally performs better.
• (b) $$n = \t{small},\ p = \t{very large}$$
• - Model fit on a small sample is sensitive to variance, so overfitting is likely
• - Model w/ more predictors $$=$$ greater combined variance (amplifying MSE due to variance)
• Thus, a lower flexibility model generally performs better.
• (d) $$\t{var}(\epsilon) = \t{very high}$$
• - Model w/ higher flexibility will fit the noise in the error terms rather than the underlying relation (overfitting)
• Thus, a lower flexibility model generally performs better.