A.I, Data and Software Engineering

Bias vs Variance Quick note


Whenever we discuss prediction models, it is important to understand the two main sources of prediction error: bias and variance. A proper understanding of these concepts helps us not only build accurate models but also avoid over-fitting and under-fitting.

We quickly explain the two concepts using the following illustration.

Bias vs Variance (source: Quora)

Suppose that a man is trying to hit the bull’s eye. His shooting skill can be considered the prediction model, and the shots he lands are the model’s predictions.

What is bias?

  • Bias is the difference between the average prediction and the correct value.
  • If the shots land, on average, far from the bull’s eye, the bias is high; if they centre on it, the bias is low.
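To make the definition concrete, here is a minimal sketch that treats each shot as a single number along one axis. The function name `bias`, the shot lists, and the bull's eye position of 0.0 are made-up values for illustration only.

```python
from statistics import mean

def bias(predictions, true_value):
    """Difference between the average prediction and the correct value."""
    return mean(predictions) - true_value

# Hypothetical shots measured along one axis; the bull's eye sits at 0.0.
bulls_eye = 0.0
shots_near = [-0.5, 0.25, 0.5, -0.25]  # centred on the target
shots_off = [2.5, 3.0, 3.5, 3.0]       # consistently to one side

print(bias(shots_near, bulls_eye))  # 0.0 -> low bias
print(bias(shots_off, bulls_eye))   # 3.0 -> high bias
```

Note that bias looks only at the average: individual shots in `shots_near` still miss, but their errors cancel out.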

Some causes of high bias:

  • An oversimplified model
  • Key features left out of the model
  • Not enough data
  • Wrong model selection

What is variance?

  • Variance shows the spread of the predictions around their own average.
  • Put differently, it is the variability of the model’s prediction for a given data point when the model is trained on different samples.
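As a rough sketch of the idea, variance can be measured as the spread of the shots around their own mean, regardless of where the bull's eye is. The two shot lists below are invented for illustration, and the bull's eye is assumed to sit at 0.0.

```python
from statistics import mean, pvariance

# Hypothetical shots along one axis; the bull's eye is assumed to be at 0.0.
tight_but_off = [2.75, 3.0, 3.25, 3.0]         # clustered, but far from the target
scattered_but_centred = [-2.0, 2.0, -1.0, 1.0]  # spread out, centred on the target

# Population variance: mean squared distance of the shots from their own average.
print(mean(tight_but_off), pvariance(tight_but_off))                  # 3.0 0.03125
print(mean(scattered_but_centred), pvariance(scattered_but_centred))  # 0.0 2.5
```

The first shooter has high bias and low variance; the second has low bias and high variance.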

Some causes of high variance:

  • Noisy training dataset
  • Sparse dataset
  • An algorithm that fits the training data too closely and fails to generalize to the underlying patterns

Overfitting and Underfitting

Fitting examples (Source: Medium)
  • Under-fitting: often high bias + low variance
  • Over-fitting: often low bias + high variance; the model performs well on the training dataset but poorly on the testing dataset
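The contrast can be illustrated with three toy models on synthetic data. Everything here is an assumption made for the sketch: the underlying pattern `true_fn`, the noise level, and the three models (a constant predictor as the under-fit, a straight line as a reasonable fit, and a nearest-point "memorizer" as the over-fit).

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

def true_fn(x):
    return 2.0 * x  # assumed underlying pattern

def sample(x):
    return (x, true_fn(x) + rng.gauss(0, 1))  # target = pattern + Gaussian noise

train = [sample(i * 0.1) for i in range(100)]
test = [sample(i * 0.1 + 0.05) for i in range(100)]

def fit_constant(data):
    """Under-fit: ignore x and always predict the mean training target."""
    avg = mean(y for _, y in data)
    return lambda x: avg

def fit_line(data):
    """Reasonable fit: ordinary least-squares straight line."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    return lambda x: my + (sxy / sxx) * (x - mx)

def fit_memorizer(data):
    """Over-fit: return the target of the nearest training point, noise included."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return mean((model(x) - y) ** 2 for x, y in data)

for name, fit in [("constant", fit_constant), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(f"{name:10s} train MSE={mse(model, train):.2f}  test MSE={mse(model, test):.2f}")
```

The memorizer reproduces every training target exactly, so its training error is zero, yet it carries the training noise into its test predictions. The constant model is poor on both sets, which is the signature of under-fitting.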

1 comment


  • 1. Under-fit (high bias): more training data doesn’t help, so don’t waste time collecting more.
    2. Over-fit (high variance): getting more training data is likely to help.
    Choosing a reasonable number of features, polynomial degree, and regularization parameter (lambda) is the key to balancing over-fitting against under-fitting.
    Splitting the data into a training set (60%), a cross-validation set (20%), and a test set (20%) helps in choosing the best polynomial degree and regularization parameter.
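A minimal sketch of the 60/20/20 procedure, using only the standard library, might look like the following. The synthetic data (a sine curve plus noise), the candidate degrees 0 to 8, and the helper names `solve`, `fit_poly`, `predict`, and `mse` are all assumptions for illustration. Only the polynomial degree is selected here; a regularization parameter could be chosen with the same loop.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_poly(data, degree):
    """Least-squares polynomial coefficients (lowest power first) via the normal equations."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x, _ in data) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in data) for i in range(k)]
    return solve(a, b)

def predict(w, x):
    return sum(c * x ** i for i, c in enumerate(w))

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

# Synthetic data: an assumed sine-shaped pattern plus Gaussian noise.
rng = random.Random(0)
points = []
for _ in range(100):
    x = rng.uniform(-1.0, 1.0)
    points.append((x, math.sin(3.0 * x) + rng.gauss(0.0, 0.2)))

# 60% training, 20% cross-validation, 20% test.
train, val, test = points[:60], points[60:80], points[80:]

# Fit every candidate on the training set, compare them on the validation set.
degrees = range(0, 9)
val_error = {d: mse(fit_poly(train, d), val) for d in degrees}
best_degree = min(val_error, key=val_error.get)

# The test set is touched once, only to report the chosen model's error.
test_error = mse(fit_poly(train, best_degree), test)
print("validation MSE per degree:", {d: round(e, 3) for d, e in val_error.items()})
print("chosen degree:", best_degree, "test MSE:", round(test_error, 3))
```

Keeping the test set out of the selection loop is the point of the three-way split: the validation error of the chosen degree is optimistically biased because that degree was picked to minimize it.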
