stepwise function - RDocumentation (2024)

Description

Select the optimal model using stepwise regression strategies such as forward selection, backward elimination, and bidirectional elimination; the best subset method is also supported. Six types of models are currently implemented: linear, logistic, Cox, Poisson, Gamma, and negative binomial regression. For the selection criterion, also known as the stop rule, users can choose from AIC, AICc, BIC, HQ, significance level (SL), and more.

Usage

stepwise(
  formula,
  data,
  type = c("linear", "logit", "cox", "poisson", "gamma", "negbin"),
  include = NULL,
  strategy = c("forward", "backward", "bidirection", "subset"),
  metric = c("AIC", "AICc", "BIC", "CP", "HQ", "Rsq", "adjRsq", "SL", "SBC", "IC(3/2)", "IC(1)"),
  sle = 0.15,
  sls = 0.15,
  test_method_linear = c("Pillai", "Wilks", "Hotelling-Lawley", "Roy"),
  test_method_glm = c("Rao", "LRT"),
  test_method_cox = c("efron", "breslow", "exact"),
  tolerance = 1e-07,
  weight = NULL,
  best_n = 3,
  num_digits = 6
)

Value

A list containing multiple tables will be returned.

  • Summary of arguments for model selection: Arguments used in the stepwise function, either default or user-supplied values.

  • Summary of variables in dataset: Variable names, types, and classes in dataset.

  • Summary of selection process under xxx(strategy) with xxx(metric): Overview of the variable selection process under specified strategy and metric.

  • Summary of coefficients for the selected model with xxx(dependent variable) under xxx(strategy) and xxx(metric): Coefficients for the selected models under specified strategy with metric. Please note that this table will not be generated for the strategy 'subset' when using the metric 'SL'.

Arguments

formula

(formula) The formula used for model fitting, which defines the scope of dependent and independent variables. A formula is written with the '~' (tilde) symbol, with the response variable(s) on the left-hand side and the predictor variable(s) on the right-hand side. The 'lm()' function uses this formula to fit a regression model. A formula can be as simple as 'y ~ x'. Multiple predictors must be separated by the '+' (plus) symbol, e.g. 'y ~ x1 + x2'. To include an interaction term between variables, use the ':' (colon) symbol: 'y ~ x1 + x1:x2'. Use the '.' (dot) symbol to indicate that all other variables in the dataset should be included as predictors, e.g. 'y ~ .'. For multiple response variables (multivariate), specify the formula as 'cbind(y1, y2) ~ x1 + x2'. By default, an intercept term is included in the models. To exclude it, include '0' or '- 1' in your formula: 'y ~ 0 + x1', 'y ~ x1 + 0', or 'y ~ x1 - 1'.
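The formula rules described for this argument can be checked with base R alone, without calling stepwise(). The following is a minimal sketch that uses terms() from the stats package to show how an interaction, the '.' shorthand, and intercept removal are interpreted. The variable names come from the built-in mtcars dataset and are chosen only for illustration.

```r
# Base R only: inspect how a formula expands into model terms.
data(mtcars)

# Main effect plus an interaction term
labels_inter <- attr(terms(mpg ~ wt + wt:hp), "term.labels")

# '.' expands to every other column in the data
labels_dot <- attr(terms(mpg ~ ., data = mtcars), "term.labels")

# '0 +' and '- 1' both drop the intercept (the intercept flag becomes 0)
has_intercept <- c(
  default = attr(terms(mpg ~ wt), "intercept"),
  zero    = attr(terms(mpg ~ 0 + wt), "intercept"),
  minus1  = attr(terms(mpg ~ wt - 1), "intercept")
)

print(labels_inter)
print(length(labels_dot))
print(has_intercept)
```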

data

(data.frame) A dataset consisting of predictor variable(s) and response variable(s).

type

(character) The stepwise regression type. Choose from 'linear', 'logit', 'poisson', 'cox', 'gamma' and 'negbin'. Default is 'linear'. For more information, see StepReg_vignettes.

include

(NULL|character) A character vector specifying predictor variables that will always stay in the model. A subset of the predictors in the dataset.

strategy

(character) The model selection strategy. Choose from 'forward', 'backward', 'bidirection' and 'subset'. Default is 'forward'. For more information, see StepReg_vignettes.

metric

(character) The model selection criterion (model fit score), used to evaluate the predictive performance of an intermediate model. Choose from 'AIC', 'AICc', 'BIC', 'CP', 'HQ', 'Rsq', 'adjRsq', 'SL', 'SBC', 'IC(3/2)', 'IC(1)'. Default is 'AIC'. For more information, see StepReg_vignettes.
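To make a few of these criteria concrete, here is a hedged sketch that computes the common likelihood-based textbook forms of AIC, AICc, the Schwarz criterion, and HQ for a single linear model in base R. These are assumptions for illustration: the package lists 'BIC' and 'SBC' as separate metrics, so its own definitions (and any constants it uses) may differ from the values computed here.

```r
# Textbook, likelihood-based information criteria for one fitted model.
# Assumption: these are generic definitions, not necessarily the exact
# formulas used inside stepwise().
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

n  <- nobs(fit)
ll <- as.numeric(logLik(fit))
k  <- attr(logLik(fit), "df")   # estimated parameters, including the error variance

aic  <- -2 * ll + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
sbc  <- -2 * ll + log(n) * k                  # Schwarz criterion; base R's BIC() returns this
hq   <- -2 * ll + 2 * k * log(log(n))         # Hannan-Quinn

print(c(AIC = aic, AICc = aicc, SBC = sbc, HQ = hq))
```

Lower values indicate a better trade-off between fit and complexity for all four quantities, which is why a stepwise search can use any of them as a stop rule.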

sle

(numeric) Significance Level to Enter. It is the statistical significance level that a predictor variable must meet to be included in the model. E.g. if 'sle = 0.05', a predictor with a P-value less than 0.05 will 'enter' the model. Default is 0.15.

sls

(numeric) Significance Level to Stay. Similar to 'sle', 'sls' is the statistical significance level that a predictor variable must meet to 'stay' in the model. E.g. if 'sls = 0.1', a predictor that was previously included in the model but whose P-value is now greater than 0.1 will be removed. Default is 0.15.
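The enter and stay rules can be sketched with base R's add1() and drop1(). This is an illustration of the idea under stated assumptions, not the package's internal code: the candidate set (wt, hp, qsec) and the use of a partial F-test on a univariate linear model are choices made only for this example.

```r
# One forward step with an entry check, followed by a stay check.
data(mtcars)
sle <- 0.15
sls <- 0.15

# Entry: test each candidate against the intercept-only model
null_fit   <- lm(mpg ~ 1, data = mtcars)
candidates <- add1(null_fit, scope = ~ wt + hp + qsec, test = "F")
p_enter    <- setNames(candidates[["Pr(>F)"]], rownames(candidates))[-1]  # drop the "<none>" row
best       <- names(which.min(p_enter))
enters     <- p_enter[[best]] < sle

# Stay: once the best candidate is in, re-test every term in the model
fit     <- update(null_fit, as.formula(paste(". ~ . +", best)))
stay    <- drop1(fit, test = "F")
p_stay  <- setNames(stay[["Pr(>F)"]], rownames(stay))[-1]
removed <- names(p_stay)[p_stay > sls]

print(best)
print(enters)
print(removed)
```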

test_method_linear

(character) Test method for multivariate linear regression analysis. Choose from 'Pillai', 'Wilks', 'Hotelling-Lawley', 'Roy'. Default is 'Pillai'. For univariate regression, the F-test is used.
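For readers unfamiliar with these multivariate statistics, base R exposes the same four names through anova() on a multivariate linear model. The sketch below only illustrates what the tests look like; whether stepwise() calls anova() internally is not stated in this page and is not assumed here.

```r
# Multivariate tests on a model with two responses, using base R only.
data(mtcars)
mfit <- lm(cbind(mpg, drat) ~ wt + hp, data = mtcars)

pillai <- anova(mfit, test = "Pillai")
wilks  <- anova(mfit, test = "Wilks")

print(pillai)
print(wilks)
```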

test_method_glm

(character) Test method for logit, Poisson, Gamma, and negative binomial regression analysis. Choose from 'Rao' and 'LRT'. Default is 'Rao'. Only 'Rao' is available for strategy = 'subset'.
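The difference between the two options can be seen with base R's add1() on a generalized linear model, which accepts both "Rao" (score test) and "LRT" (likelihood ratio test). This is a sketch for intuition only; the response (am) and candidate predictors are arbitrary choices from mtcars, and the package's internal implementation is not assumed.

```r
# Score (Rao) and likelihood ratio tests for adding a term to a logit model.
data(mtcars)
base_fit <- glm(am ~ 1, family = binomial, data = mtcars)

rao <- add1(base_fit, scope = ~ wt + hp, test = "Rao")
lrt <- add1(base_fit, scope = ~ wt + hp, test = "LRT")

print(rao)
print(lrt)
```

The two tests evaluate the same candidate terms against the same intercept-only fit and differ only in the statistic reported.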

test_method_cox

(character) Test method for Cox regression analysis. Choose from 'efron', 'breslow', 'exact'. Default is 'efron'.

tolerance

(numeric) A statistical measure used to assess multicollinearity in a multiple regression model. It is calculated as the proportion of the variance in a predictor variable that is not accounted for by the other predictor variables in the model. Default is 1e-07.
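As a worked illustration of this definition, tolerance for a predictor can be computed as 1 minus the R-squared of regressing that predictor on the others. The helper tolerance_of() below is hypothetical and written only for this sketch; how stepwise() applies the threshold internally is an assumption. The duplicated column mirrors the 'mtcars$yes <- mtcars$wt' line in the Examples section.

```r
# Tolerance = share of a predictor's variance not explained by the other predictors.
data(mtcars)
mtcars$yes <- mtcars$wt   # exact copy of wt, so wt and yes are perfectly collinear

# Hypothetical helper: regress one predictor on the rest and return RSS / TSS,
# which equals 1 - R^2 of that auxiliary regression.
tolerance_of <- function(var, predictors, data) {
  others <- setdiff(predictors, var)
  aux    <- lm(reformulate(others, response = var), data = data)
  rss    <- sum(residuals(aux)^2)
  tss    <- sum((data[[var]] - mean(data[[var]]))^2)
  rss / tss
}

predictors <- c("wt", "hp", "yes")
tol <- vapply(predictors, tolerance_of, numeric(1),
              predictors = predictors, data = mtcars)

# Predictors whose tolerance falls below the default threshold of 1e-07
flagged <- names(tol)[tol < 1e-07]

print(tol)
print(flagged)
```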

weight

(numeric) A numeric vector specifying the coefficients assigned to the predictor variables. The magnitude of each weight reflects the degree to which the corresponding predictor variable contributes to the prediction of the response variable. Weights should range from 0 to 1. Values greater than 1 will be coerced to 1, and values less than 0 will be coerced to 0. Default is NULL, which means that all weights are set equal.

best_n

(numeric(integer)) The number of models to be retained in the process output. Default is 3, meaning that only the 3 best models with the same number of variables are displayed. To display all models, set it to Inf.

num_digits

(numeric(integer)) The number of digits to keep when rounding the results. Default is 6.

Author

Junhui Li, Kai Hu, Xiaohuan Lu

Examples

## Load the package that provides stepwise() and the remission data.
library(StepReg)

## Perform multivariate linear stepwise regression with the 'bidirection'
## strategy and the 'AIC' stop rule, excluding the intercept.
data(mtcars)
mtcars$yes <- mtcars$wt
formula <- cbind(mpg, drat) ~ . + 0
stepwise(formula = formula,
         data = mtcars,
         type = "linear",
         strategy = "bidirection",
         metric = "AIC")

## Perform linear stepwise regression with the 'forward' and 'bidirection'
## strategies and the "AIC", "SBC", "SL", "AICc", "BIC", and "HQ" stop rules.
formula <- mpg ~ . + 1
stepwise(formula = formula,
         data = mtcars,
         type = "linear",
         strategy = c("forward", "bidirection"),
         metric = c("AIC", "SBC", "SL", "AICc", "BIC", "HQ"))

## Perform logit stepwise regression with the 'forward' strategy and the
## significance level as the stop rule.
data(remission)
formula <- remiss ~ .
stepwise(formula = formula,
         data = remission,
         type = "logit",
         strategy = "forward",
         metric = "SL",
         sle = 0.05,
         sls = 0.05)

FAQs

What is the stepwise selection function?

In its backward form, the procedure builds an initial model with all candidate variables and evaluates it with a chosen metric. The algorithm then removes variables one at a time until the criterion is satisfied. Forward selection works in the opposite direction, starting from a small model and adding variables, and bidirectional selection combines the two.

What is the stepwise function in R?

In base R, the step() function performs forward stepwise addition and backward stepwise deletion. For forward selection, the object argument supplies the initial model and the scope argument defines the range of models examined in the search.

What is the step function in R model?

The step() function in R is used for stepwise variable selection in linear and generalized linear models. It automates the selection of a subset of variables from a larger set based on a criterion such as AIC (Akaike Information Criterion), which is the default, or BIC (Bayesian Information Criterion), which is obtained by setting k = log(n).
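A short sketch of base R's step() follows. It is separate from the stepwise() function documented on this page and is shown only to make the answer concrete. The predictors are arbitrary columns of mtcars.

```r
# Stepwise selection with base R's step(), not with StepReg's stepwise().
data(mtcars)
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Forward selection by AIC: start from the intercept-only model and
# allow terms up to those in the full model.
fwd <- step(null, scope = list(lower = ~ 1, upper = formula(full)),
            direction = "forward", trace = 0)

# Backward elimination with a BIC-style penalty, k = log(n)
n       <- nrow(mtcars)
bwd_bic <- step(full, direction = "backward", k = log(n), trace = 0)

print(formula(fwd))
print(formula(bwd_bic))
```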

What is the advantage of stepwise?

Stepwise methods can help you simplify your model, reduce overfitting, and improve prediction accuracy. The stepwise method is an important fundamental statistical technique in building multiple regression models. It involves both forward selection and backward elimination approaches.

Is stepwise regression good or bad?

Critics regard the procedure as a paradigmatic example of data dredging, intense computation often being an inadequate substitute for subject area expertise. Additionally, the results of stepwise regression are often used incorrectly without adjusting them for the occurrence of model selection.

How does stepwise regression work in R?

Stepwise regression is a powerful technique used to build predictive models by iteratively adding or removing variables based on statistical criteria. In R, this can be achieved using functions like step() or manually with forward and backward selection.

What is an example of a stepwise regression?

An example of a stepwise regression using the backward elimination method would be an attempt to understand energy usage at a factory using variables such as equipment run time, equipment age, staff size, temperatures outside, and time of year.

What is stepwise used for?

Stepwise regression is used to build a regression model that includes only relevant and statistically significant variables. Other variables are discarded. Without such selection, a regression can contain unwanted variables that add little predictive value and complicate the model unnecessarily.
