What is Adjusted R2 used for?
Adjusted R2 is primarily used in statistical modeling to assess the goodness-of-fit of a model. It helps to evaluate how well the model explains the variability in the dependent variable, while taking into account the number of predictors used. Unlike R2, it penalizes the model for including predictors that don’t improve the model’s ability to predict. It is widely used in various fields including academic research, economics, machine learning, healthcare, and business analytics.
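The penalty for extra predictors can be seen directly in the standard formula, R2_adj = 1 − (1 − R2)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. A minimal sketch (the function name and example numbers are illustrative):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R2 for a model with n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# The same R2 of 0.85 is penalized more heavily as predictors are added:
print(adjusted_r2(0.85, n=50, p=2))   # ~0.8436
print(adjusted_r2(0.85, n=50, p=10))  # ~0.8115
```

Holding R2 fixed, Adjusted R2 falls as p grows, which is exactly the penalty described above.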
How does Adjusted R2 differ from R2?
Both R2 and Adjusted R2 provide a measure of how well the model fits the data. However, the key difference is that Adjusted R2 adjusts for the number of predictors in the model. While R2 never decreases as you add more predictors, regardless of their usefulness, Adjusted R2 will decrease if a new variable does not improve the model's fit by more than would be expected by chance. This makes Adjusted R2 a more robust metric when comparing models with different numbers of predictors.
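This difference can be demonstrated on synthetic data (hypothetical example; the data and helper function are illustrative, using ordinary least squares via NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)   # true relationship uses only x
noise = rng.normal(size=n)       # an irrelevant predictor

def fit_r2(X, y):
    """Return (R2, Adjusted R2) from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    p = X.shape[1] - 1           # predictors, excluding the intercept
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)
    return r2, adj

r2_1, adj_1 = fit_r2(x, y)                            # x only
r2_2, adj_2 = fit_r2(np.column_stack([x, noise]), y)  # x plus noise
print(r2_1, adj_1)
print(r2_2, adj_2)
```

R2 for the larger model can never be lower than for the nested one, while Adjusted R2 only rises if the added noise predictor explains more than its penalty costs.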
Is a higher Adjusted R2 always better?
Generally, a higher Adjusted R2 indicates a better model fit, as it suggests that the model explains a larger portion of the variability in the data. However, a high Adjusted R2 is not a guarantee of a good model. It’s essential to consider other diagnostic tests and goodness-of-fit measures, as well as the logical and theoretical reasoning behind including specific predictors in the model.
Can Adjusted R2 be negative?
Yes, unlike R2, which is bounded between 0 and 1, Adjusted R2 can be negative. A negative Adjusted R2 is generally a sign that the model is a poor fit for the data: it occurs when the model explains so little variance that the penalty for its predictors outweighs its fit, meaning it predicts scarcely better than a horizontal line (the mean model). It's rare but serves as a warning that the model is probably overfitted or fundamentally flawed.
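A quick numerical sketch of how this happens (the numbers are illustrative): a weak R2 combined with many predictors relative to the sample size pushes the adjusted value below zero.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R2 for a model with n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# R2 of only 0.05 with 10 predictors on just 20 observations:
print(adjusted_r2(0.05, n=20, p=10))  # about -1.006
```

The small (n − 1)/(n − p − 1) denominator inflates the penalty term past 1, so the result goes negative.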
What are the limitations of Adjusted R2?
Non-linear Models: Adjusted R2 is best suited for linear models and may produce misleading results for non-linear models.
Overfitting: While it’s better than R2 at avoiding overfitting, a high Adjusted R2 can still give a false sense of model quality.
Correlation ≠ Causation: A high Adjusted R2 doesn’t confirm that the predictors cause the dependent variable to change; it only signifies a relationship.
Requires Careful Interpretation: Like any statistical measure, Adjusted R2 should be used in conjunction with other tests and domain-specific knowledge for robust model evaluation.