[Machine Learning Diary] Day 6 — Don’t overfit
Overfitting is one of the most common issues in machine learning projects and data competitions, and one that almost every practitioner has to deal with.
What is overfitting
According to the book Hands-On Machine Learning, overfitting happens when the model is too complex relative to the amount and noisiness of the training data.
If you have ever entered a Kaggle competition, you may know that competitors’ rankings sometimes drop once their models are evaluated on the private dataset at the final stage. This usually happens when people rely on “magic features” that only work on the public dataset.
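To make this concrete, here is a minimal sketch (a toy example of my own, not from the book) using scikit-learn: a degree-15 polynomial fit to a handful of noisy points scores almost perfectly on the data it was trained on, but much worse on held-out data. The degrees and sample size here are arbitrary illustrative choices.

```python
# Toy demonstration of overfitting: a high-degree polynomial chases the
# noise in a small training set, so it scores far better on the data it
# was trained on than on held-out data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=40)  # noisy sine curve
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):  # a simple model vs. an over-complex one
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}: "
          f"train R^2 = {model.score(X_train, y_train):.3f}, "
          f"test R^2 = {model.score(X_test, y_test):.3f}")
```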
How to avoid overfitting
There are several techniques for combating overfitting:
- Regularization
- Data augmentation
- Dropout
- Bootstrap/bagging
- Ensembling
- Early stopping
- Exploiting invariances
- Bayesian methods
I will go through examples of each of these in later posts. To be continued!
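In the meantime, here is a minimal sketch of the first item, regularization (again a toy example of mine, not from the book). Ridge regression adds an L2 penalty on the weights, which keeps the same kind of over-complex polynomial from using huge coefficients to chase the noise; the alpha value is an arbitrary illustrative choice.

```python
# Regularization sketch: the same over-complex polynomial, with and without
# an L2 penalty (Ridge). The penalty shrinks the weights and usually narrows
# the gap between training and test scores.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, estimator in [("plain", LinearRegression()),
                        ("ridge", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(15), StandardScaler(), estimator)
    model.fit(X_train, y_train)
    print(f"{name}: train R^2 = {model.score(X_train, y_train):.3f}, "
          f"test R^2 = {model.score(X_test, y_test):.3f}")
```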
Apart from these techniques, we can also follow some common practices:
- Simplify the model (see the sketch after this list)
- Get more data
- Reduce noise in the training data
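As an illustration of the first practice, here is a minimal sketch of simplifying the model (a toy example of mine): capping a decision tree’s maximum depth reduces its capacity, and with some label noise injected, the unrestricted tree memorizes the training set while the shallow one generalizes better. The sample sizes, noise level, and depth are arbitrary illustrative choices.

```python
# "Simplify the model" sketch: an unrestricted decision tree memorizes the
# (noisy) training labels, while a depth-capped tree generalizes better.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 randomly flips 20% of the labels, i.e. injects label noise.
X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # unrestricted tree vs. a deliberately simple one
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train acc = {tree.score(X_train, y_train):.3f}, "
          f"test acc = {tree.score(X_test, y_test):.3f}")
```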
Conclusion
In essence, overfitting means reading too much into the data you have. The model may be accurate on your training dataset, but its accuracy will suffer on the test dataset and on real-world data.
Reference
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O’Reilly Media.