Stepwise Regression in R - StatsCodes (2024)

  • 1 Definitions
  • 2 Forward Stepwise Regression in R
  • 3 Backward Stepwise Regression in R
  • 4 Forward-Backward (or Both) Stepwise Regression in R

Here, we discuss stepwise regression in R, including forward, backward, and bi-directional (or forward-backward) steps.

Stepwise regression in R can be performed with the step() and lm() functions from the "stats" package in the base version of R.

For multiple regression modeling, stepwise regression can be used to perform variable selection among a set of variables by adding or dropping one variable at a time.

Sample Steps to Run a Stepwise Regression:

# Specify data
x1 = c(6.8, 4.7, 3.3, 4.3, 4.6, 7.4, 2.8)
x2 = c(7.2, 3.9, 6.7, 7.0, 5.4, 5.8, 6.6)
x3 = c(2.8, 1.8, 6.2, 3.6, 0.4, 4.5, 3.0)
x4 = c(6.4, 7.7, 7.9, 7.0, 7.4, 7.1, 6.4)
e = c(-0.01, -0.05, 0.04, -0.07, 0.05, 0.03, -0.02)
y = 3 + 4*x1 + 3*x3 + e
df_data = data.frame(y, x1, x2, x3, x4)
df_data
      y  x1  x2  x3  x4
1 38.59 6.8 7.2 2.8 6.4
2 27.15 4.7 3.9 1.8 7.7
3 34.84 3.3 6.7 6.2 7.9
4 30.93 4.3 7.0 3.6 7.0
5 22.65 4.6 5.4 0.4 7.4
6 46.13 7.4 5.8 4.5 7.1
7 23.18 2.8 6.6 3.0 6.4
# Specify the intercept model
intercept_model = lm(y ~ 1, data = df_data)
# Specify the guess model
guess_model = lm(y ~ x1 + x2, data = df_data)
# Specify the full model
full_model = lm(y ~ ., data = df_data)
# or
full_model = lm(y ~ x1 + x2 + x3 + x4, data = df_data)

Perform forward step-wise regression:

step(intercept_model, direction = 'forward',  scope = formula(full_model), trace = 0)
Call:
lm(formula = y ~ x1 + x3, data = df_data)

Coefficients:
(Intercept)           x1           x3  
      2.964        4.004        3.003  

Perform backward step-wise regression:

step(full_model, direction = 'backward', trace = 0)
Call:
lm(formula = y ~ x1 + x3, data = df_data)

Coefficients:
(Intercept)           x1           x3  
      2.964        4.004        3.003  

Perform forward-backward step-wise regression:

step(guess_model, direction = 'both', scope = list(lower = formula(intercept_model), upper = formula(full_model)), trace = 0)
Call:
lm(formula = y ~ x1 + x3, data = df_data)

Coefficients:
(Intercept)           x1           x3  
      2.964        4.004        3.003  

Showing Model Iterations and Using BIC:

To show model iterations, set trace = 1. For BIC, set k = log(number of rows).

The results show that the algorithm only adds variables that reduce the criterion (BIC in this case, although labelled AIC). If there are multiple variables that reduce the criterion, it picks the one that reduces it the most.

When no other variable can reduce the criterion, the algorithm stops. This way, the algorithm is highly likely to end with the final model being the combination of variables that has the lowest value of the criterion (AIC or BIC).

# Perform forward step-wise regression with steps shown
# BIC can be used when we set k = log(nrow(df_data))
step(intercept_model, direction = 'forward',
     scope = formula(full_model), trace = 1, k = log(nrow(df_data)))
Start:  AIC=30.95
y ~ 1

       Df Sum of Sq    RSS    AIC
+ x1    1   253.444 187.55 26.909
+ x3    1   164.437 276.56 29.627
<none>              440.99 30.948
+ x2    1    34.025 406.97 32.332
+ x4    1     1.493 439.50 32.870

Step:  AIC=26.91
y ~ x1

       Df Sum of Sq     RSS     AIC
+ x3    1   187.536   0.012 -38.605
<none>              187.549  26.909
+ x2    1    43.612 143.936  27.002
+ x4    1     4.962 182.587  28.667

Step:  AIC=-38.6
y ~ x1 + x3

       Df  Sum of Sq      RSS     AIC
<none>               0.012240 -38.605
+ x4    1 0.00116413 0.011076 -37.359
+ x2    1 0.00003159 0.012209 -36.677
Call:
lm(formula = y ~ x1 + x3, data = df_data)

Coefficients:
(Intercept)           x1           x3  
      2.964        4.004        3.003  
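The criterion values printed in the trace can be reproduced directly with extractAIC(), which step() uses internally. The sketch below rebuilds the small data frame so it is self-contained; with k = log(n) the returned criterion is the BIC-style value (still labelled "AIC" in the trace):

```r
# Rebuild the small example data (same values as above)
x1 <- c(6.8, 4.7, 3.3, 4.3, 4.6, 7.4, 2.8)
x3 <- c(2.8, 1.8, 6.2, 3.6, 0.4, 4.5, 3.0)
e  <- c(-0.01, -0.05, 0.04, -0.07, 0.05, 0.03, -0.02)
y  <- 3 + 4*x1 + 3*x3 + e
df_data <- data.frame(y, x1, x3)

fit <- lm(y ~ x1 + x3, data = df_data)

# extractAIC() returns c(edf, criterion); with k = log(n) the criterion
# is the BIC-style value that step() minimizes
crit <- extractAIC(fit, k = log(nrow(df_data)))[2]
crit
```

This matches the -38.605 shown for the final model y ~ x1 + x3 in the trace above.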
Table of Some Stepwise Regression Arguments in R
Argument    Usage
object      The lm model object to start from.
scope       Sets the range of variables applicable. This can be one model formula or a list of the upper and lower models.
direction   Set to "both" for forward-backward steps, "backward", or "forward", with a default of "both". If scope is not set, the default behavior is "backward".
trace       Set to 1 to print the selection steps, 0 otherwise.
k           The multiple of the number of degrees of freedom used for the penalty. For AIC (default), k = 2; for BIC, k = log(number of rows).

Consider the set of predictors \(x_1, x_2, \cdots, x_p\), to be used to predict \(y\). We can define the following models below.

Full Model

The model with all the available variables:

\[\widehat y = \widehat \beta_0 + \widehat \beta_1 x_1 + \widehat \beta_2 x_2 + \cdots + \widehat \beta_p x_p.\]

full_model = lm(y ~ ., data = dataset)
# or
full_model = lm(y ~ x1 + x2 + ... + xp, data = dataset)

Intercept Model (or Null Model)

The model with no variables; hence, the prediction is the sample mean, \(\bar y\):

\[\widehat y = \bar y.\]

intercept_model = lm(y ~ 1, data = dataset)

Guess Model

A submodel of the full model, containing a subset of the full set of variables, which may be the guess before variable selection:

\[\widehat y = \widehat \beta_0 + \widehat \beta_1 x_1 + \widehat \beta_2 x_2.\]

guess_model = lm(y ~ x1 + x2, data = dataset)

Dataset

Using the "USJudgeRatings" data from the "datasets" package, with 10 sample rows (of 43) shown below:

USJudgeRatings
                CONT INTG DMNR DILG CFMG DECI PREP FAMI ORAL WRIT PHYS RTEN
AARONSON,L.H.    5.7  7.9  7.7  7.3  7.1  7.4  7.1  7.1  7.1  7.0  8.3  7.8
ARMENTANO,A.J.   7.2  8.1  7.8  7.8  7.5  7.6  7.5  7.5  7.3  7.4  7.9  7.8
BURNS,E.B.       6.2  8.8  8.7  8.5  7.9  8.0  8.1  8.0  8.0  8.0  8.6  8.6
HEALEY.A.H.      8.0  7.6  6.6  7.2  6.5  6.5  6.8  6.7  6.4  6.5  6.9  6.7
O'BRIEN,F.J.     7.1  8.5  8.3  8.0  7.9  7.9  7.8  7.8  7.8  7.7  8.3  8.2
O'SULLIVAN,T.J.  7.5  9.0  8.9  8.7  8.4  8.5  8.4  8.3  8.3  8.3  8.8  8.7
PASKEY,L.        7.5  8.1  7.7  8.2  8.0  8.1  8.2  8.4  8.0  8.1  8.4  8.1
SIDOR,W.J.       7.7  6.2  5.1  5.6  5.6  5.9  5.6  5.6  5.3  5.5  6.3  5.3
STAPLETON,J.F.   6.5  8.2  7.7  7.8  7.6  7.7  7.7  7.7  7.5  7.6  8.5  7.7
ZARRILLI,K.J.    8.6  7.4  7.0  7.5  7.5  7.7  7.4  7.2  6.9  7.0  7.8  7.1

NOTE: The dependent variable is "RTEN".

The data contain US judges' ratings by lawyers, with variable descriptions below.

Table of Variable Descriptions
Sign        Code  Variable Description
\(y\)       RTEN  Worthy of retention
\(x_1\)     CONT  Number of contacts of lawyer with judge
\(x_2\)     INTG  Judicial integrity
\(x_3\)     DMNR  Demeanor
\(x_4\)     DILG  Diligence
\(x_5\)     CFMG  Case flow managing
\(x_6\)     DECI  Prompt decisions
\(x_7\)     PREP  Preparation for trial
\(x_8\)     FAMI  Familiarity with law
\(x_9\)     ORAL  Sound oral rulings
\(x_{10}\)  WRIT  Sound written rulings
\(x_{11}\)  PHYS  Physical ability
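Since the dataset ships with base R, its shape and variable names can be checked directly; a quick sketch:

```r
# USJudgeRatings ships with base R's "datasets" package
data(USJudgeRatings)

dim(USJudgeRatings)    # 43 rows (judges), 12 columns (rating variables)
names(USJudgeRatings)  # includes the dependent variable "RTEN"
```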

Set the Defined Reference Models

# Specify the intercept model
intercept_model = lm(RTEN ~ 1, data = USJudgeRatings)
# Specify the guess model
guess_model = lm(RTEN ~ DMNR + DILG + CFMG, data = USJudgeRatings)
# Specify the full model
full_model = lm(RTEN ~ ., data = USJudgeRatings)
# or
full_model = lm(RTEN ~ CONT + INTG + DMNR + DILG + CFMG + DECI + PREP + FAMI + ORAL + WRIT + PHYS, data = USJudgeRatings)

In the forward selection, the following happens:

  • We start with the intercept model or the guess model (the guess model must be a subset of the full set of variables in the upper scope).

  • Then, from the intercept model or the guess model, based on the AIC or BIC criterion, we repeatedly add one variable at a time, potentially picking variables up to the full set available (or upper scope) if they all improve model quality.

Perform forward step-wise regression from intercept:

step(intercept_model, direction = 'forward',  scope = formula(full_model), trace = 0)
Call:
lm(formula = RTEN ~ ORAL + DMNR + PHYS + INTG + DECI, data = USJudgeRatings)

Coefficients:
(Intercept)         ORAL         DMNR         PHYS         INTG         DECI  
    -2.2043       0.2917       0.1520       0.2829       0.3779       0.1667  

The final selection includes only "ORAL", "DMNR", "PHYS", "INTG", and "DECI".

summary(lm(formula = RTEN ~ ORAL + DMNR + PHYS + INTG + DECI, data = USJudgeRatings))
Call:
lm(formula = RTEN ~ ORAL + DMNR + PHYS + INTG + DECI, data = USJudgeRatings)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.240656 -0.069026 -0.009474  0.068961  0.246402 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.20433    0.43611  -5.055 1.19e-05 ***
ORAL         0.29169    0.10191   2.862 0.006887 ** 
DMNR         0.15199    0.06354   2.392 0.021957 *  
PHYS         0.28292    0.04678   6.048 5.40e-07 ***
INTG         0.37785    0.10559   3.579 0.000986 ***
DECI         0.16672    0.07702   2.165 0.036928 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1119 on 37 degrees of freedom
Multiple R-squared:  0.9909,    Adjusted R-squared:  0.9897 
F-statistic: 806.1 on 5 and 37 DF,  p-value: < 2.2e-16

If we want our model to include a specific set of variables, we could start with those variables in a guess model. For example, for "DMNR", "DILG", and "CFMG", we could use the guess model above.

Perform forward step-wise regression from guess:

step(guess_model, direction = 'forward',  scope = formula(full_model), trace = 0)
Call:
lm(formula = RTEN ~ DMNR + DILG + CFMG + PHYS + ORAL + INTG + 
    DECI, data = USJudgeRatings)

Coefficients:
(Intercept)         DMNR         DILG         CFMG         PHYS         ORAL  
   -2.22619      0.15132      0.01817     -0.10734      0.29224      0.30146  
       INTG         DECI  
    0.37467      0.24209  

The final selection includes the guess variables "DMNR", "DILG", and "CFMG", along with the additional variables "PHYS", "ORAL", "INTG", and "DECI".

Notice that not all the variables in the table below are significant, compared to the model selection that started with the intercept model. "DILG" and "CFMG" are not significant, which is a disadvantage of not starting from the intercept model.

summary(lm(formula = RTEN ~ DMNR + DILG + CFMG + PHYS + ORAL + INTG + DECI, data = USJudgeRatings))
Call:
lm(formula = RTEN ~ DMNR + DILG + CFMG + PHYS + ORAL + INTG + 
    DECI, data = USJudgeRatings)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.24638 -0.06583 -0.01054  0.06288  0.25587 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.22619    0.45932  -4.847 2.55e-05 ***
DMNR         0.15132    0.06908   2.190  0.03524 *  
DILG         0.01817    0.10274   0.177  0.86062    
CFMG        -0.10734    0.12109  -0.886  0.38142    
PHYS         0.29224    0.05101   5.730 1.75e-06 ***
ORAL         0.30146    0.10845   2.780  0.00869 ** 
INTG         0.37467    0.11885   3.152  0.00331 ** 
DECI         0.24209    0.12758   1.898  0.06603 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1137 on 35 degrees of freedom
Multiple R-squared:  0.9911,    Adjusted R-squared:  0.9893 
F-statistic: 557.3 on 7 and 35 DF,  p-value: < 2.2e-16
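One way to deal with such insignificant terms after selection is to drop them with update(), which refits the model with a modified formula. This is a sketch; here it recovers the same five-variable set found when starting from the intercept model:

```r
data(USJudgeRatings)

# The fit found by forward selection from the guess model (as above)
fit_guess <- lm(RTEN ~ DMNR + DILG + CFMG + PHYS + ORAL + INTG + DECI,
                data = USJudgeRatings)

# Drop the insignificant terms "DILG" and "CFMG" from the formula
fit_trim <- update(fit_guess, . ~ . - DILG - CFMG)

# The predictors remaining in the trimmed model
attr(terms(fit_trim), "term.labels")
```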

In the backward selection, the following happens:

  • We start with the full model or set of models to be possibly included in the final selection.

  • Then, from the full model, based on the AIC or BIC criterion, we repeatedly drop one variable at a time. We may potentially drop every variable and arrive at the intercept model if no variable improves model quality, or arrive at a final selection that keeps specific variables if those variables are set in a lower scope model formula. The lower scope must be a subset of the full set of variables in the full model.

Perform backward step-wise regression from full model:

step(full_model, direction = 'backward', trace = 0)
Call:
lm(formula = RTEN ~ INTG + DMNR + DECI + ORAL + PHYS, data = USJudgeRatings)

Coefficients:
(Intercept)         INTG         DMNR         DECI         ORAL         PHYS  
    -2.2043       0.3779       0.1520       0.1667       0.2917       0.2829  

The final selection includes only "INTG", "DMNR", "DECI", "ORAL", and "PHYS". This is the same set as the selection above for the forward stepwise selection, but in a different order.
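That the forward and backward runs select the same variable set (up to ordering) can be checked programmatically; a sketch comparing the term labels of the two fitted models:

```r
data(USJudgeRatings)

intercept_model <- lm(RTEN ~ 1, data = USJudgeRatings)
full_model <- lm(RTEN ~ ., data = USJudgeRatings)

# Run both selections silently
fwd <- step(intercept_model, direction = 'forward',
            scope = formula(full_model), trace = 0)
bwd <- step(full_model, direction = 'backward', trace = 0)

# Compare the selected variable sets, ignoring order
setequal(attr(terms(fwd), "term.labels"),
         attr(terms(bwd), "term.labels"))
```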

If we want our model to include a specific set of variables, we could end with those variables included by setting a lower scope model. For example, for "DMNR", "DILG", and "CFMG", we could use the guess model above.

Perform backward step-wise regression with lower model as guess model:

step(full_model, direction = 'backward',  scope = list(lower = formula(guess_model)), trace = 0)
Call:
lm(formula = RTEN ~ INTG + DMNR + DILG + CFMG + DECI + ORAL + 
    PHYS, data = USJudgeRatings)

Coefficients:
(Intercept)         INTG         DMNR         DILG         CFMG         DECI  
   -2.22619      0.37467      0.15132      0.01817     -0.10734      0.24209  
       ORAL         PHYS  
    0.30146      0.29224  

The final selection includes the guess variables "DMNR", "DILG", and "CFMG", along with the additional variables "INTG", "DECI", "ORAL", and "PHYS". This is the same set as the selection above for the forward stepwise selection that started with the guess model, but in a different order.

Again, not all the variables in the selection are significant, compared to the model selection that did not have a lower scope model set. "DILG" and "CFMG" are not significant, which is a disadvantage of arbitrarily setting a lower scope.

In the forward-backward (or both) selection, the following happens:

  • We start with a guess model; this could be the intercept model, the full model, or a subset of the full model.

  • Then, from the guess model, based on the AIC or BIC criterion, we repeatedly add or drop one variable at a time, potentially adding variables to the guess model or dropping variables from it whenever doing so improves model quality. The lower scope must be a subset of the guess model, which in turn must be a subset of the full model.

Perform forward-backward step-wise regression starting from intercept model:

step(intercept_model, direction = 'both', scope = list(lower = formula(intercept_model), upper = formula(full_model)), trace = 0)
Call:
lm(formula = RTEN ~ ORAL + DMNR + PHYS + INTG + DECI, data = USJudgeRatings)

Coefficients:
(Intercept)         ORAL         DMNR         PHYS         INTG         DECI  
    -2.2043       0.2917       0.1520       0.2829       0.3779       0.1667  

Perform forward-backward step-wise regression starting from guess model:

step(guess_model, direction = 'both', scope = list(lower = formula(intercept_model), upper = formula(full_model)), trace = 0)
Call:
lm(formula = RTEN ~ DMNR + PHYS + ORAL + INTG + DECI, data = USJudgeRatings)

Coefficients:
(Intercept)         DMNR         PHYS         ORAL         INTG         DECI  
    -2.2043       0.1520       0.2829       0.2917       0.3779       0.1667  

Perform forward-backward step-wise regression starting from full model:

step(full_model, direction = 'both', scope = list(lower = formula(intercept_model), upper = formula(full_model)), trace = 0)
Call:
lm(formula = RTEN ~ INTG + DMNR + DECI + ORAL + PHYS, data = USJudgeRatings)

Coefficients:
(Intercept)         INTG         DMNR         DECI         ORAL         PHYS  
    -2.2043       0.3779       0.1520       0.1667       0.2917       0.2829  

The final selections are all the same, and each includes only "DMNR", "PHYS", "ORAL", "INTG", and "DECI". This is the same set as the selection above for the forward stepwise selection.

In the forward-backward stepwise selection, starting from a guess model does not by itself guarantee that its variables remain in the final selection, since they can be dropped at a later step. To keep specific variables in the final model, set them in the lower scope model formula, as in the backward selection.
