A Complete Guide to Stepwise Regression in R (2024)

This article was first published on Steve's Data Tips and Tricks and kindly contributed to R-bloggers.

Stepwise regression builds a predictive model by iteratively adding or removing variables according to a statistical criterion. In R, the step() function automates this, using AIC as its default criterion; forward and backward selection can also be carried out manually.

Forward Stepwise Regression:

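    # Initialize an empty (intercept-only) model
    forward_model <- lm(mpg ~ 1, data = mtcars)

    # Full model, used only to define the candidate predictors
    full_model <- lm(mpg ~ ., data = mtcars)

    # Forward stepwise regression
    forward_model <- step(forward_model, direction = "forward",
                          scope = formula(full_model))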

In simple terms, we start with a model containing no predictors (mpg ~ 1) and, at each step, add the candidate variable that lowers the AIC the most, stopping when no addition improves it.
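To make a single forward step concrete, here is a minimal sketch using add1() from the stats package, which evaluates every one-term extension of a model. The names null_model, candidates, first_step and best_term are illustrative and not from the original post.

```r
# Intercept-only starting model and the full set of candidate predictors
null_model <- lm(mpg ~ 1, data = mtcars)
candidates <- ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb

# add1() fits each one-term extension and reports its RSS and AIC
first_step <- add1(null_model, scope = candidates)
print(first_step)

# The row with the lowest AIC is the term a forward search would add first
best_term <- rownames(first_step)[which.min(first_step$AIC)]
best_term
```

Repeating this on the updated model until the "<none>" row has the lowest AIC is, in essence, what step() does with direction = "forward".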

Backward Stepwise Regression:

    # Initialize a model with all predictors
    backward_model <- lm(mpg ~ ., data = mtcars)

    # Backward stepwise regression
    backward_model <- step(backward_model, direction = "backward")

    Start:  AIC=70.9
    mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb

           Df Sum of Sq    RSS    AIC
    - cyl   1    0.0799 147.57 68.915
    - vs    1    0.1601 147.66 68.932
    - carb  1    0.4067 147.90 68.986
    - gear  1    1.3531 148.85 69.190
    - drat  1    1.6270 149.12 69.249
    - disp  1    3.9167 151.41 69.736
    - hp    1    6.8399 154.33 70.348
    - qsec  1    8.8641 156.36 70.765
    <none>              147.49 70.898
    - am    1   10.5467 158.04 71.108
    - wt    1   27.0144 174.51 74.280

    Step:  AIC=68.92
    mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb

           Df Sum of Sq    RSS    AIC
    - vs    1    0.2685 147.84 66.973
    - carb  1    0.5201 148.09 67.028
    - gear  1    1.8211 149.40 67.308
    - drat  1    1.9826 149.56 67.342
    - disp  1    3.9009 151.47 67.750
    - hp    1    7.3632 154.94 68.473
    <none>              147.57 68.915
    - qsec  1   10.0933 157.67 69.032
    - am    1   11.8359 159.41 69.384
    - wt    1   27.0280 174.60 72.297

    Step:  AIC=66.97
    mpg ~ disp + hp + drat + wt + qsec + am + gear + carb

           Df Sum of Sq    RSS    AIC
    - carb  1    0.6855 148.53 65.121
    - gear  1    2.1437 149.99 65.434
    - drat  1    2.2139 150.06 65.449
    - disp  1    3.6467 151.49 65.753
    - hp    1    7.1060 154.95 66.475
    <none>              147.84 66.973
    - am    1   11.5694 159.41 67.384
    - qsec  1   15.6830 163.53 68.200
    - wt    1   27.3799 175.22 70.410

    Step:  AIC=65.12
    mpg ~ disp + hp + drat + wt + qsec + am + gear

           Df Sum of Sq    RSS    AIC
    - gear  1     1.565 150.09 63.457
    - drat  1     1.932 150.46 63.535
    <none>              148.53 65.121
    - disp  1    10.110 158.64 65.229
    - am    1    12.323 160.85 65.672
    - hp    1    14.826 163.35 66.166
    - qsec  1    26.408 174.94 68.358
    - wt    1    69.127 217.66 75.350

    Step:  AIC=63.46
    mpg ~ disp + hp + drat + wt + qsec + am

           Df Sum of Sq    RSS    AIC
    - drat  1     3.345 153.44 62.162
    - disp  1     8.545 158.64 63.229
    <none>              150.09 63.457
    - hp    1    13.285 163.38 64.171
    - am    1    20.036 170.13 65.466
    - qsec  1    25.574 175.67 66.491
    - wt    1    67.572 217.66 73.351

    Step:  AIC=62.16
    mpg ~ disp + hp + wt + qsec + am

           Df Sum of Sq    RSS    AIC
    - disp  1     6.629 160.07 61.515
    <none>              153.44 62.162
    - hp    1    12.572 166.01 62.682
    - qsec  1    26.470 179.91 65.255
    - am    1    32.198 185.63 66.258
    - wt    1    69.043 222.48 72.051

    Step:  AIC=61.52
    mpg ~ hp + wt + qsec + am

           Df Sum of Sq    RSS    AIC
    - hp    1     9.219 169.29 61.307
    <none>              160.07 61.515
    - qsec  1    20.225 180.29 63.323
    - am    1    25.993 186.06 64.331
    - wt    1    78.494 238.56 72.284

    Step:  AIC=61.31
    mpg ~ wt + qsec + am

           Df Sum of Sq    RSS    AIC
    <none>              169.29 61.307
    - am    1    26.178 195.46 63.908
    - qsec  1   109.034 278.32 75.217
    - wt    1   183.347 352.63 82.790

Here, we begin with a model including all predictors and, at each step, remove the variable whose removal lowers the AIC the most, stopping when no removal improves it. In the trace above, the procedure ends at mpg ~ wt + qsec + am with AIC = 61.31.
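When the full trace is more than you need, the selection path can be read from the fitted object instead. The sketch below assumes only base R: trace = 0 silences the printout, the anova component of the returned model records the term dropped at each step, and extractAIC() returns the AIC value that step() reports.

```r
full_model <- lm(mpg ~ ., data = mtcars)

# trace = 0 suppresses the step-by-step printout
backward_model <- step(full_model, direction = "backward", trace = 0)

# One row per step: the term removed and the resulting AIC
print(backward_model$anova)

# Formula of the selected model
formula(backward_model)

# Equivalent degrees of freedom and AIC on the scale used by step()
extractAIC(backward_model)
```

Note that extractAIC() and AIC() differ by an additive constant for linear models fitted to the same data, so the numbers will not match, but the ranking of models is the same.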

Both-Direction Stepwise Regression:

    # Initialize a model with all predictors
    both_model <- lm(mpg ~ ., data = mtcars)

    # Both-direction stepwise regression
    both_model <- step(both_model, direction = "both")

    Start:  AIC=70.9
    mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb

           Df Sum of Sq    RSS    AIC
    - cyl   1    0.0799 147.57 68.915
    - vs    1    0.1601 147.66 68.932
    - carb  1    0.4067 147.90 68.986
    - gear  1    1.3531 148.85 69.190
    - drat  1    1.6270 149.12 69.249
    - disp  1    3.9167 151.41 69.736
    - hp    1    6.8399 154.33 70.348
    - qsec  1    8.8641 156.36 70.765
    <none>              147.49 70.898
    - am    1   10.5467 158.04 71.108
    - wt    1   27.0144 174.51 74.280

    Step:  AIC=68.92
    mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb

           Df Sum of Sq    RSS    AIC
    - vs    1    0.2685 147.84 66.973
    - carb  1    0.5201 148.09 67.028
    - gear  1    1.8211 149.40 67.308
    - drat  1    1.9826 149.56 67.342
    - disp  1    3.9009 151.47 67.750
    - hp    1    7.3632 154.94 68.473
    <none>              147.57 68.915
    - qsec  1   10.0933 157.67 69.032
    - am    1   11.8359 159.41 69.384
    + cyl   1    0.0799 147.49 70.898
    - wt    1   27.0280 174.60 72.297

    Step:  AIC=66.97
    mpg ~ disp + hp + drat + wt + qsec + am + gear + carb

           Df Sum of Sq    RSS    AIC
    - carb  1    0.6855 148.53 65.121
    - gear  1    2.1437 149.99 65.434
    - drat  1    2.2139 150.06 65.449
    - disp  1    3.6467 151.49 65.753
    - hp    1    7.1060 154.95 66.475
    <none>              147.84 66.973
    - am    1   11.5694 159.41 67.384
    - qsec  1   15.6830 163.53 68.200
    + vs    1    0.2685 147.57 68.915
    + cyl   1    0.1883 147.66 68.932
    - wt    1   27.3799 175.22 70.410

    Step:  AIC=65.12
    mpg ~ disp + hp + drat + wt + qsec + am + gear

           Df Sum of Sq    RSS    AIC
    - gear  1     1.565 150.09 63.457
    - drat  1     1.932 150.46 63.535
    <none>              148.53 65.121
    - disp  1    10.110 158.64 65.229
    - am    1    12.323 160.85 65.672
    - hp    1    14.826 163.35 66.166
    + carb  1     0.685 147.84 66.973
    + vs    1     0.434 148.09 67.028
    + cyl   1     0.414 148.11 67.032
    - qsec  1    26.408 174.94 68.358
    - wt    1    69.127 217.66 75.350

    Step:  AIC=63.46
    mpg ~ disp + hp + drat + wt + qsec + am

           Df Sum of Sq    RSS    AIC
    - drat  1     3.345 153.44 62.162
    - disp  1     8.545 158.64 63.229
    <none>              150.09 63.457
    - hp    1    13.285 163.38 64.171
    + gear  1     1.565 148.53 65.121
    + cyl   1     1.003 149.09 65.242
    + vs    1     0.645 149.45 65.319
    + carb  1     0.107 149.99 65.434
    - am    1    20.036 170.13 65.466
    - qsec  1    25.574 175.67 66.491
    - wt    1    67.572 217.66 73.351

    Step:  AIC=62.16
    mpg ~ disp + hp + wt + qsec + am

           Df Sum of Sq    RSS    AIC
    - disp  1     6.629 160.07 61.515
    <none>              153.44 62.162
    - hp    1    12.572 166.01 62.682
    + drat  1     3.345 150.09 63.457
    + gear  1     2.977 150.46 63.535
    + cyl   1     2.447 150.99 63.648
    + vs    1     1.121 152.32 63.927
    + carb  1     0.011 153.43 64.160
    - qsec  1    26.470 179.91 65.255
    - am    1    32.198 185.63 66.258
    - wt    1    69.043 222.48 72.051

    Step:  AIC=61.52
    mpg ~ hp + wt + qsec + am

           Df Sum of Sq    RSS    AIC
    - hp    1     9.219 169.29 61.307
    <none>              160.07 61.515
    + disp  1     6.629 153.44 62.162
    + carb  1     3.227 156.84 62.864
    + drat  1     1.428 158.64 63.229
    - qsec  1    20.225 180.29 63.323
    + cyl   1     0.249 159.82 63.465
    + vs    1     0.249 159.82 63.466
    + gear  1     0.171 159.90 63.481
    - am    1    25.993 186.06 64.331
    - wt    1    78.494 238.56 72.284

    Step:  AIC=61.31
    mpg ~ wt + qsec + am

           Df Sum of Sq    RSS    AIC
    <none>              169.29 61.307
    + hp    1     9.219 160.07 61.515
    + carb  1     8.036 161.25 61.751
    + disp  1     3.276 166.01 62.682
    + cyl   1     1.501 167.78 63.022
    + drat  1     1.400 167.89 63.042
    + gear  1     0.123 169.16 63.284
    + vs    1     0.000 169.29 63.307
    - am    1    26.178 195.46 63.908
    - qsec  1   109.034 278.32 75.217
    - wt    1   183.347 352.63 82.790

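In both-direction regression, each step considers both dropping a variable currently in the model (rows marked -) and re-adding one that was removed earlier (rows marked +), and takes whichever move lowers the AIC the most. Here it arrives at the same model as backward selection, mpg ~ wt + qsec + am.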
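A both-direction search can also start from the intercept-only model, provided the scope argument tells step() how far it may grow. The following is a sketch under that assumption; both_from_null and both_from_full are illustrative names that do not appear in the original post.

```r
null_model <- lm(mpg ~ 1, data = mtcars)
full_model <- lm(mpg ~ ., data = mtcars)

# Search in both directions, bounded below by the intercept-only model
# and above by the model with all predictors
both_from_null <- step(
  null_model,
  direction = "both",
  scope = list(lower = formula(null_model), upper = formula(full_model)),
  trace = 0
)

# The same search started from the full model, for comparison
both_from_full <- step(full_model, direction = "both", trace = 0)

formula(both_from_null)
formula(both_from_full)

# AIC of each selected model, on the scale reported by step()
c(from_null = extractAIC(both_from_null)[2],
  from_full = extractAIC(both_from_full)[2])
```

Because the search is greedy, the two starting points need not end at the same model, so it is worth comparing the formulas and AIC values rather than assuming they agree.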

Visualizing Data and Model Fit:

Now, let’s visualize the data and model fit using base R plots.

    # Scatter plot of mpg vs. hp
    plot(mtcars$hp, mtcars$mpg,
         main = "Scatter Plot of mpg vs. hp",
         xlab = "hp", ylab = "mpg", pch = 20)

    # Simple regression line of mpg on hp, for reference
    abline(lm(mpg ~ hp, data = mtcars), col = "black", lwd = 2)

    # Fitted values from each stepwise model, plotted at each car's own hp
    # (hp is left unsorted so that it lines up with the fitted values)
    points(mtcars$hp, forward_model$fitted.values, col = "red", pch = 20)
    points(mtcars$hp, backward_model$fitted.values, col = "blue", pch = 20)
    points(mtcars$hp, both_model$fitted.values, col = "green", pch = 20)

    legend("topright",
           legend = c("Forward", "Backward", "Both-Direction"),
           col = c("red", "blue", "green"), pch = 20)

[Figure 1: Scatter plot of mpg vs. hp with fitted values from the stepwise models]

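This plot shows mpg against hp, with the black line giving a simple regression of mpg on hp and the colored points giving the fitted values from each stepwise model. The colors correspond to the models created earlier. Because the backward and both-direction procedures selected the same model, their points coincide and only the green ones, drawn last, are visible.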

Visualizing Residuals:

    # Residual plots for each model
    par(mfrow = c(2, 2))

    # Forward stepwise regression residuals
    plot(forward_model$residuals, main = "Forward Residuals", ylab = "Residuals")

    # Backward stepwise regression residuals
    plot(backward_model$residuals, main = "Backward Residuals", ylab = "Residuals")

    # Both-direction stepwise regression residuals
    plot(both_model$residuals, main = "Both-Direction Residuals", ylab = "Residuals")

[Figure 2: Residual plots for the forward, backward, and both-direction models]

These plots show each model's residuals in observation order. Residuals scattered evenly around zero with no visible pattern suggest that the model fits the data adequately.
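A common complement to residuals in observation order is a plot of residuals against fitted values, where curvature or a funnel shape is easier to spot. The sketch below refits the backward model so that it runs on its own; fitted_vals and resid_vals are illustrative names.

```r
# Refit the backward-selected model without printing the trace
backward_model <- step(lm(mpg ~ ., data = mtcars),
                       direction = "backward", trace = 0)

fitted_vals <- fitted(backward_model)
resid_vals <- resid(backward_model)

# Residuals against fitted values, with a reference line at zero
plot(fitted_vals, resid_vals,
     main = "Residuals vs. Fitted (Backward Model)",
     xlab = "Fitted mpg", ylab = "Residuals", pch = 20)
abline(h = 0, lty = 2)
```

Calling plot(backward_model, which = 1) produces a similar diagnostic with a smoothed trend line added.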

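Stepwise regression is a valuable tool, but its results should be interpreted cautiously: the selected model can depend on the starting point and search direction, and p-values computed after selection tend to be overly optimistic because they ignore the search that produced the model.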
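One way to check how well a stepwise-selected model generalizes is to repeat the selection inside each fold of a cross-validation, so that the held-out observation plays no part in choosing the variables. This is a minimal leave-one-out sketch, not part of the original post; loo_pred and loo_rmse are illustrative names.

```r
n <- nrow(mtcars)
loo_pred <- numeric(n)

for (i in seq_len(n)) {
  # Hold out car i and rerun the whole backward selection on the rest
  train <- mtcars[-i, ]
  fit <- step(lm(mpg ~ ., data = train), direction = "backward", trace = 0)

  # Predict the held-out car with the model selected without it
  loo_pred[i] <- predict(fit, newdata = mtcars[i, , drop = FALSE])
}

# Out-of-sample root mean squared error across all held-out cars
loo_rmse <- sqrt(mean((mtcars$mpg - loo_pred)^2))
loo_rmse
```

Comparing this value with the in-sample residual error of the selected model gives a rough sense of how much the selection step flatters the fit.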
