A guide to five diagnostic plots for linear regression in R. Each plot checks a different model assumption or flags a different problem with the fit.
plot(model) | Linear regression | R / statistics
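As a starting point, here is a minimal sketch of how the `model` object used throughout might be created and all five plots drawn at once. The built-in `mtcars` data and the formula `mpg ~ wt + hp` are assumptions chosen for illustration; they do not come from the guide.

```r
# Assumption: mtcars and this formula are illustrative only.
model <- lm(mpg ~ wt + hp, data = mtcars)

# plot(model) draws plots 1, 2, 3 and 5 by default;
# ask for 1:5 to include the Cook's distance bar plot as well.
old_par <- par(mfrow = c(2, 3))   # a 2 x 3 grid holds the five panels
plot(model, which = 1:5)
par(old_par)                      # restore the previous layout
```

Each plot can also be drawn on its own with `which = 1`, `which = 2`, and so on.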
Residuals vs Fitted: plot(model, which=1)
Checks two assumptions: linearity (is the relationship actually linear?) and homoscedasticity (constant variance of errors). Points should scatter randomly around the horizontal zero line — no pattern.
What to look for:
• Random cloud around the dashed zero line = assumptions met ✅
• Curved/U-shaped smoothed line = non-linearity → consider polynomial terms or log transformation
• Funnel shape (variance increases with fitted values) = heteroscedasticity → use robust SEs or transform outcome
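The checks and remedies above can be sketched as follows. This is an illustration under assumptions: the `mtcars` data and the three formulas are examples only, and whether a polynomial term or a log transformation is appropriate depends on the data at hand.

```r
# Assumption: mtcars and these formulas are illustrative, not from the guide.
linear_fit <- lm(mpg ~ hp, data = mtcars)
plot(linear_fit, which = 1)      # look for curvature in the smoothed line

# Possible remedy for curvature: add a polynomial (here quadratic) term.
quad_fit <- lm(mpg ~ hp + I(hp^2), data = mtcars)
plot(quad_fit, which = 1)

# Alternative remedy: log-transform the outcome.
log_fit <- lm(log(mpg) ~ hp, data = mtcars)
plot(log_fit, which = 1)

# Nested-model F test: does the quadratic term improve on the straight line?
anova(linear_fit, quad_fit)
```

Comparing the three residual plots side by side is usually more informative than the F test alone.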
Normal Q-Q Plot: plot(model, which=2)
Checks whether residuals are approximately normally distributed. Points should follow the diagonal reference line closely. Minor deviations at the extremes are usually acceptable.
What to look for:
• Points follow the diagonal = normality assumption met ✅
• S-shape (tails curve away at both ends) = heavy tails or outliers
• Points bend upward at the right = right-skewed residuals → consider log transformation
• With large n, the central limit theorem makes inference on the coefficients fairly robust to non-normal residuals, so minor deviations are usually acceptable
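A minimal sketch of the normality check, again assuming an illustrative `mtcars` model. The Shapiro-Wilk test at the end is an optional addition, not part of the guide; with large samples it tends to reject for deviations too small to matter, so the plot remains the primary tool.

```r
# Assumption: mtcars and this formula are illustrative only.
model <- lm(mpg ~ wt + hp, data = mtcars)
plot(model, which = 2)

# The same idea by hand: standardised residuals against normal quantiles.
std_res <- rstandard(model)
qqnorm(std_res)
qqline(std_res)

# Optional formal check (an addition, not from the guide).
sw <- shapiro.test(residuals(model))
sw$p.value
```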
Scale-Location: plot(model, which=3)
Also called Spread-Location. Checks homoscedasticity more directly than plot 1 by plotting the square root of the absolute standardised residuals against the fitted values. The red smoothed line should be approximately horizontal.
What to look for:
• Flat red line + evenly spread points = homoscedasticity ✅
• Upward slope = variance increases with fitted values (common in count data, income, biological measurements)
• Fix: log or square-root transform the outcome variable, or use weighted least squares (WLS)
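The sketch below rebuilds the Scale-Location plot by hand and tries both fixes. The `mtcars` model is illustrative, and the WLS weights (`1 / wt`) assume that the error variance is proportional to `wt`; that is an assumption made only to show the syntax, not a recommendation.

```r
# Assumption: mtcars and this formula are illustrative only.
model <- lm(mpg ~ wt + hp, data = mtcars)
plot(model, which = 3)

# What the y-axis holds: sqrt(|standardised residual|).
sl_y <- sqrt(abs(rstandard(model)))
plot(fitted(model), sl_y,
     xlab = "Fitted values", ylab = "sqrt(|standardised residuals|)")
lines(lowess(fitted(model), sl_y), col = "red")   # smoothed trend

# Fix 1: log-transform the outcome.
log_model <- lm(log(mpg) ~ wt + hp, data = mtcars)
plot(log_model, which = 3)

# Fix 2: weighted least squares.
# Assumption: error variance proportional to wt, hence weights = 1 / wt.
wls_model <- lm(mpg ~ wt + hp, data = mtcars, weights = 1 / wt)
plot(wls_model, which = 3)
```

In practice the weights should come from knowledge of how the variance behaves, or from a model of the residual spread.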
Residuals vs Leverage: plot(model, which=5)
Identifies influential observations — points with high leverage (unusual predictor values) AND large residuals. Points outside Cook's D contour lines are potentially distorting your regression line.
Key concepts:
• Leverage = how far an observation's predictor values lie from the predictor means (unusual x-values give high leverage)
• Influence = high leverage + large residual = high Cook's D
• High leverage but small residual: unusual x, but model fits it well — usually OK
• Cook's D > 0.5: moderate concern | Cook's D > 1: strong concern, always investigate
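The quantities behind this plot can be pulled out directly, which makes the "leverage plus residual" idea concrete. The `mtcars` model is illustrative, and the `2 * p / n` leverage cut-off is a common rule of thumb that is not part of the guide.

```r
# Assumption: mtcars and this formula are illustrative only.
model <- lm(mpg ~ wt + hp, data = mtcars)
plot(model, which = 5)

lev  <- hatvalues(model)        # leverage of each observation
cook <- cooks.distance(model)   # influence of each observation
p    <- length(coef(model))     # number of estimated coefficients
n    <- nobs(model)

# Cook's D combines a squared standardised residual with a leverage term.
cook_by_hand <- rstandard(model)^2 / p * lev / (1 - lev)
all.equal(cook, cook_by_hand)

# Assumption: 2 * p / n is a rule-of-thumb cut-off, not from the guide.
diag_tbl <- data.frame(
  leverage      = lev,
  std_resid     = rstandard(model),
  cooks_d       = cook,
  high_leverage = lev > 2 * p / n
)
head(diag_tbl[order(-diag_tbl$cooks_d), ], 5)   # five most influential rows
```

A row with high leverage but a small standardised residual will show a modest Cook's D, matching the "usually OK" case above.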
Cook's Distance Bar Plot: plot(model, which=4)
Shows the influence of each individual observation on the entire fitted model. A bar above the threshold means that one data point is substantially shifting your regression line.
Threshold rules of thumb:
• Cook's D > 4/n: worth a look (dashed threshold line)
• Cook's D > 0.5: moderate concern — examine the observation
• Cook's D > 1: strong concern — report sensitivity analysis with and without
• High Cook's D ≠ automatically delete! Investigate why. It may be the most scientifically interesting point.
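A minimal sketch of applying the 4/n threshold and running a with-and-without sensitivity analysis. The `mtcars` model is an assumption for illustration, and the refit is a comparison to report, not a licence to drop the flagged points.

```r
# Assumption: mtcars and this formula are illustrative only.
model <- lm(mpg ~ wt + hp, data = mtcars)
plot(model, which = 4)

cook <- cooks.distance(model)
n    <- nobs(model)

# Observations above the "worth a look" threshold.
flagged <- which(cook > 4 / n)
cook[flagged]

# Sensitivity analysis: refit without the flagged rows and compare coefficients.
if (length(flagged) > 0) {
  refit <- lm(formula(model), data = mtcars[-flagged, ])
} else {
  refit <- model   # nothing flagged, so nothing to compare
}
cbind(all_data = coef(model), without_flagged = coef(refit))
```

If the coefficients barely move, the flagged points are not driving the conclusions; if they shift substantially, both fits are worth reporting.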