Linear regression assumes that the variance of the error terms is constant. This property is known as homoscedasticity (same variance). In simple terms, the spread of the errors should stay the same across all values of the predictors; it should not grow or shrink as we move from one range of predictor values to another. For example, suppose we model the amount of shopping a person does as a function of income, expecting people with lower income to shop less and people with higher income to shop more. The assumption says that the variance of the error terms (the difference between actual and predicted shopping) should be the same at every income level. But what if people with lower income all shop about the same small amount, while some people with higher income shop heavily and others shop little or only moderately? Then the errors are spread more widely at higher incomes than at lower ones. This non-constant variance in the error terms is known as heteroscedasticity.

In a dataset, heteroscedasticity is usually identified by plotting the residuals against the fitted (predicted) values of the response. A funnel shape in this plot indicates heteroscedasticity. It can often be reduced by transforming the variables. In the figure below, the left plot shows a funnel shape (i.e., heteroscedasticity). After a log transformation of the response variable (Y), we obtain the right plot, in which the funnel shape is largely gone. This shows that transforming the response variable, or in some cases a predictor, can reduce or remove heteroscedasticity. Note: Figure taken from the book An Introduction to Statistical Learning by James et al.
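To make the funnel idea concrete, here is a minimal sketch in Python using only NumPy. It does not reproduce the book's figure: the data are synthetic, and the coefficients, the noise level, and the helper name `residual_spread_ratio` are assumptions made purely for illustration. Rather than drawing the plot, it compares the spread of the residuals at high fitted values with the spread at low fitted values, which is a crude numeric stand-in for looking for a funnel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): the noise is multiplicative,
# so the spread of y grows with x and a straight-line fit on y is heteroscedastic.
n = 500
x = rng.uniform(1.0, 10.0, size=n)
y = np.exp(0.5 + 0.3 * x + rng.normal(0.0, 0.3, size=n))


def residual_spread_ratio(x, y):
    """Fit y = a + b*x by least squares and return the ratio
    std(residuals at high fitted values) / std(residuals at low fitted values).
    A ratio far from 1 hints at a funnel shape in the residual plot."""
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    residuals = y - fitted
    high = fitted > np.median(fitted)  # split observations at the median fitted value
    return residuals[high].std() / residuals[~high].std()


ratio_raw = residual_spread_ratio(x, y)          # response on its original scale
ratio_log = residual_spread_ratio(x, np.log(y))  # log-transformed response

print(f"spread ratio, raw y: {ratio_raw:.2f}")
print(f"spread ratio, log y: {ratio_log:.2f}")
```

On this synthetic data the first ratio should come out well above 1 (residuals fan out as the fitted values grow) and the second close to 1 (roughly constant spread after the log transformation). In practice the same residuals and fitted values would be drawn as a scatter plot, and a formal test could be used alongside the visual check.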
- Book – An Introduction to Statistical Learning by James et al.