[Machine Learning Diary] Day 6 — Don’t overfit

Steven(Liang) Chen
Apr 20, 2019 · 2 min read


Overfitting is a very common issue in machine learning projects and data competitions. I think it is one of the most common issues we need to deal with.

What is overfitting

According to the book Hands-On Machine Learning, overfitting happens when the model is too complex relative to the amount and noisiness of the training data.

[Figure: overfitting]

If you have ever entered a Kaggle competition, you may know that people’s rankings sometimes drop after their models are scored on the private dataset at the final stage. This normally happens when people use “magic features” that only work on the public dataset.
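To make this concrete, here is a minimal sketch (my own illustration with scikit-learn, not from the book): a degree-15 polynomial has enough capacity to memorize the noise in a tiny training set, so its training error is near zero while its test error blows up, whereas the simple degree-1 model generalizes better.

```python
# Minimal overfitting demo: compare a simple and an overly complex model
# on the same small, noisy dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(30, 1))
y = 0.5 * X.ravel() + rng.normal(scale=1.0, size=30)  # linear trend + noise
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  "
          f"train MSE={mean_squared_error(y_train, model.predict(X_train)):.2f}  "
          f"test MSE={mean_squared_error(y_test, model.predict(X_test)):.2f}")
```

The gap between training error and test error is the telltale sign of overfitting, and it is exactly what the public/private leaderboard split exposes.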

How to avoid overfitting

There are some techniques to reduce overfitting:

  • Regularization
  • Data augmentation
  • Dropout
  • Bootstrap/bagging
  • Ensembling
  • Early stopping
  • Exploiting invariances
  • Bayesian methods

I will give an example of each of them later (a quick sketch of regularization follows below). To be continued.
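As a preview, here is a minimal sketch of the first item, regularization (again my own illustration, not from the book): Ridge regression adds an L2 penalty on the weights, which shrinks the wild coefficients the degree-15 polynomial from above would otherwise use to memorize the training noise.

```python
# Regularization sketch: the same degree-15 polynomial, with and without
# an L2 penalty (Ridge). The penalty raises training error a little and
# typically lowers test error.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(30, 1))
y = 0.5 * X.ravel() + rng.normal(scale=1.0, size=30)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, reg in [("unregularized", LinearRegression()),
                  ("ridge (alpha=1)", Ridge(alpha=1.0))]:
    # Scaling matters here: the L2 penalty treats all coefficients equally,
    # so the polynomial features should be on comparable scales.
    model = make_pipeline(PolynomialFeatures(15), StandardScaler(), reg)
    model.fit(X_train, y_train)
    print(f"{name:15s}  "
          f"train MSE={mean_squared_error(y_train, model.predict(X_train)):.2f}  "
          f"test MSE={mean_squared_error(y_test, model.predict(X_test)):.2f}")
```

Early stopping follows the same logic with less math: watch the validation loss during training and stop as soon as it stops improving.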

Apart from these algorithms, we can also follow some common practices (see the sketch after this list):

  1. Simplify the model
  2. Get more data
  3. Reduce noise in the training data
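Here is a hedged sketch of practice 1, simplifying the model (mine, not from the book): capping a decision tree’s depth removes the capacity it would otherwise spend memorizing the training set.

```python
# Simplifying the model: an unconstrained decision tree grows until it fits
# the training set perfectly; capping max_depth gives up some training
# accuracy in exchange for (usually) better accuracy on unseen data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None = grow until leaves are pure; 3 = simple tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train acc={tree.score(X_train, y_train):.2f}, "
          f"test acc={tree.score(X_test, y_test):.2f}")
```

Getting more data and reducing noise attack the problem from the other side: they make the training set harder to memorize and more representative of the real world.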

Conclusion

In essence, overfitting is reading too much into the data you have. The model may be accurate on your training dataset, but its accuracy is poor on the test dataset and on real-world data.

Reference

Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O’Reilly Media, 2017.
