Figure 1. The machine learning pipeline (with a focus on model development and evaluation).
The bias-variance trade-off helps data scientists understand why a model behaves the way it does, and which corrective actions to apply. In general, the trade-off works as follows: increasing bias tends to decrease variance, and increasing variance tends to decrease bias. The trick is to find a balance between the two error types.
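To make the trade-off concrete, the sketch below estimates bias and variance by simulation. It is not taken from the course material: the sine-shaped "true" function, the noise level, the polynomial degrees, and the function name `bias_variance` are all assumptions chosen for illustration. It also simplifies matters by keeping the x-values fixed and redrawing only the noise for each new training set.

```python
import numpy as np

def bias_variance(degree, n_sets=200, n_train=30, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit by simulation.

    Assumed setup (illustrative only): the true function is sin(2*pi*x),
    the x-values are fixed, and each training set gets fresh Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    x_test = np.linspace(0.05, 0.95, 50)
    f_test = np.sin(2 * np.pi * x_test)  # true values at the test points

    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        # Draw a new noisy training set and refit the model.
        y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise_sd, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[i] = np.polyval(coefs, x_test)

    # Squared bias: how far the average prediction is from the truth.
    bias_sq = np.mean((preds.mean(axis=0) - f_test) ** 2)
    # Variance: how much predictions change from one training set to the next.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

if __name__ == "__main__":
    for degree in (1, 3, 9):
        b, v = bias_variance(degree)
        print(f"degree={degree}: bias^2={b:.4f}, variance={v:.4f}")
```

Under these assumptions, the straight line (degree 1) should show the largest squared bias and the smallest variance, while the degree-9 polynomial should show the opposite pattern. Try changing `n_train` or `noise_sd` to see how the balance shifts.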
3a Read Fortmann-Roe's essay on the bias-variance trade-off. You can find the material here.
3b List and describe the four cases that combine high or low bias with high or low variance (see Figure 1 in Fortmann-Roe's essay).
3c Watch the video Machine Learning Fundamentals: Bias and Variance by StatQuest with Josh Starmer.
Video 1. Machine Learning Fundamentals: Bias and Variance by StatQuest with Josh Starmer
The following exercises are optional. They are included in the independent study material to help you understand the bias-variance trade-off.
3d The prediction error of a model is composed of three elements. List the three elements. Write your answer down.
3e In the article, Fortmann-Roe presents a KNN analysis of voter party registration to explain the bias-variance trade-off. 1) What are the features? 2) What is the response (i.e., the y-variable)? 3) Is this a regression or a classification task? Elaborate on your answers.
3f Select a small value of K, and click the button ‘Generate New Training Data’ several times. Does the graph depict: 1) low variance or high variance? 2) low bias or high bias? Explain your answers.
3g Does a large value for K cause a) overfitting or b) underfitting? Explain your answer.
Figure 2. A case of overfitting…
In the coming Datalab, we will reflect on classification algorithms again and give you an opportunity to ask any questions you might have.
Then we will apply our newly learned techniques to the Yelp dataset again by performing a logistic regression! Subsequently, we will perform a logistic regression on our Oosterhout dataset, in line with our research problem!