Machine Learning Students Overfit to Overfitting

05/27/2024

Introduction

The fundamentals of machine learning have not changed, and the basic concept behind any successful learning system is generalization: to learn from a limited training set and still perform well on test samples that might come from different distributions. Teaching the concept of overfitting is not easy, essentially because declaring that a model overfits is a judgment call based on the relative values of the training and validation losses.

The contributions of this paper are a conceptual framework for understanding why students hold misconceptions about overfitting, together with examples of these misconceptions: about the concept of overfitting itself, about how overfitting can be prevented, and about implementation errors that are often confused with overfitting.

Concept of Overfitting

Overfitting is the lack of generalization in a machine learning model. It is usually assessed through losses computed on the training and validation splits of the data, from which the generalization gap can be estimated:

\[ L_{gap} = L_{validation} - L_{train} \]

In general, if \( L_{gap} \gg 0 \), the model is said to be overfitting. But there is normally a small difference between validation and training loss, so the question is: how large should this difference be before declaring overfitting?

The typical view of overfitting is presented in Figure 1, where training loss decreases with epochs while validation loss increases, clearly indicating overfitting.

Figure 1: Example view of overfitting in textbooks (top) vs. how students see overfitting in practice (bottom).

Overfitting is not a binary condition of whether a model overfits or not. Overfitting is more likely with larger \( L_{gap} \) values, and the question is how large \( L_{gap} \) must be before deciding that the model overfits. This is essentially a judgment call, and there are no clear guidelines in the literature.

Student Misconceptions of Overfitting

When faced with overfitting, students mostly try unrelated changes to the model or training process (such as changing learning rates), and in many cases they do not realize that the dataset is simply too small.

Use Cases for Learning about Overfitting

An exercise can be built in which the student progressively adds more data to the training set and observes how overfitting decreases and generalization improves (a minimal sketch is given at the end of this section). This particular use case showcases that overfitting is not a binary condition but has continuous properties.

In reinforcement learning, limited exploration can be used to show students the influence of the training set on generalization: part of the environment is used for training and out-of-distribution parts of the environment for testing, which will most likely reveal a failure to generalize.

We argue that the generalization gap \( L_{gap} \) is not intuitive to interpret, since there are no clear thresholds to declare overfitting. In the appendix we provide a checklist that students and lecturers can use to check whether their training scheme is appropriate. It can help to systematically debug overfitting and failure-to-generalize issues, and new use cases or exercises can be derived from it.
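The "add more data" exercise can be made concrete with a few lines of code. The following is a minimal sketch, assuming scikit-learn, a synthetic regression dataset, and a deliberately flexible decision-tree model; the dataset, model, and subset sizes are illustrative assumptions rather than choices made in the paper. It trains the same model on increasingly large subsets of the training data and prints the resulting generalization gap.

```python
# Minimal sketch of the "more data reduces overfitting" exercise.
# Assumptions (not from the paper): scikit-learn, a synthetic regression
# dataset, and a deliberately flexible model (a fully grown decision tree)
# so that the generalization gap is clearly visible at small training sizes.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=5000, n_features=20, noise=10.0, random_state=0)
X_train_full, X_val, y_train_full, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

for n in [50, 100, 500, 1000, len(X_train_full)]:
    X_train, y_train = X_train_full[:n], y_train_full[:n]
    model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
    l_train = mean_squared_error(y_train, model.predict(X_train))
    l_val = mean_squared_error(y_val, model.predict(X_val))
    # L_gap = L_validation - L_train: large values suggest overfitting,
    # but there is no universal threshold -- the decision is a judgment call.
    print(f"n={n:5d}  L_train={l_train:9.2f}  L_val={l_val:9.2f}  L_gap={l_val - l_train:9.2f}")
```

With the smallest subsets the training loss is near zero while the validation loss is large; as more data is added, \( L_{gap} \) typically shrinks gradually rather than switching off, which is the continuous behaviour this use case is meant to show.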
Appendix. Checklist for Debugging ML Models

- Is the loss used for training the model appropriate for the task?
- Was the model trained until convergence?
- Was validation data used to check for overfitting? There is only one way to check for overfitting: use a train and a validation split with no overlapping samples, train the model on the training set, and after each epoch (or a set number of iterations) evaluate the model on the validation set (a minimal sketch of this loop follows the checklist).
- Is there enough training data?
- Are the data distributions of the training and validation/test sets equal or similar?
- Is the model and/or training process correctly implemented?
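As a companion to the validation item above, the following is a minimal sketch of such a per-epoch validation loop. It assumes PyTorch, a synthetic dataset, and an arbitrarily chosen over-parameterized MLP; all names and sizes are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the per-epoch validation check described in the checklist.
# Assumptions (not from the paper): PyTorch, a synthetic dataset, and an
# intentionally over-parameterized MLP; names and sizes are illustrative only.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)
X = torch.randn(1200, 10)
y = X @ torch.randn(10, 1) + 0.5 * torch.randn(1200, 1)

dataset = TensorDataset(X, y)
train_set, val_set = random_split(dataset, [1000, 200])  # no overlapping samples
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=200)

model = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(50):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()

    # Evaluate on the held-out validation split after every epoch
    # (per-batch averages are good enough for this sketch).
    model.eval()
    with torch.no_grad():
        l_train = sum(loss_fn(model(xb), yb).item() for xb, yb in train_loader) / len(train_loader)
        l_val = sum(loss_fn(model(xb), yb).item() for xb, yb in val_loader) / len(val_loader)
    print(f"epoch {epoch:2d}  L_train={l_train:.4f}  L_val={l_val:.4f}  L_gap={l_val - l_train:.4f}")
```

Plotting the per-epoch training and validation losses from such a loop reproduces curves like those discussed in the Concept of Overfitting section and lets students judge the gap for themselves.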