# Clever Algorithms

## Clever Algorithms: Statistical Machine Learning Recipes

By Jason Brownlee PhD.

# Stepwise Regression

Stepwise Regression, Stepwise Selection, Stepwise Multiple Linear Regression

## Taxonomy

Stepwise Regression is a Model Selection method for selecting a regression model. It was originally proposed as an extension to Multiple Linear Regression called Stepwise Multiple Linear Regression. It has since been generalized to other regression models, such as Logistic Regression, yielding Stepwise Logistic Regression.

It is considered an improvement over Model Selection methods for Regression such as All Subsets, and is related to other Model Selection methods for Regression such as Ridge Regression. It is also related to Regularized Regression methods such as LASSO and the Least Angle Regression used to solve it.

## Strategy

The information processing objective of the technique is to produce a regression model that minimizes a specified measure, hypothesis test, or criterion: F-tests in the original description, or more simply here, the Sum of Squared Residuals (SSR). A greedy hill-climbing algorithm successively adds to the regression model whichever remaining variable results in the largest reduction in the SSR, then refits the model. The procedure stops once no further improvement in the specified criterion can be achieved, or there are no more variables to add to the model. This process alone is called Forward Model Selection. Conversely, the greedy algorithm may start with all variables and successively remove the variable whose removal results in the smallest increase in the SSR. This approach alone is called Backward Selection.

Stepwise Regression applies Forward Model Selection as well as Backward Model Selection, allowing variables that have become redundant through the addition of other variables to be removed from the model later in the process. This procedure requires the specification of threshold parameters for the minimum change in the selected measure for adding (F-to-enter) and removing (F-to-remove) variables to and from the model. The Forward method is applied until it can no longer add any more variables, then the Backward method is applied. This sequence repeats until the model stabilizes. One may start with no variables and successively grow the model, called Forward Stepwise Regression, or start with all variables and successively prune the model, called Backward Stepwise Regression.
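The greedy forward step described above can be sketched in R. This is an illustrative sketch, not the book's code: it assumes the candidate variables live in a data frame, uses SSR as the selection measure, and the function and argument names (`forward_step`, `current`, `remaining`) are hypothetical.

```r
# Sketch of one greedy forward-selection step by SSR (illustrative, not from the book).
# Given the variables already in the model and the remaining candidates, return the
# candidate whose addition yields the largest reduction in the sum of squared residuals.
forward_step <- function(data, response, current, remaining) {
  ssr <- function(vars) {
    f <- reformulate(c("1", vars), response=response)  # e.g. y ~ 1 + x1
    sum(residuals(lm(f, data=data))^2)
  }
  reductions <- sapply(remaining, function(v) ssr(current) - ssr(c(current, v)))
  best <- names(which.max(reductions))
  list(variable=best, reduction=reductions[[best]])
}
```

A full forward pass would call this repeatedly, moving the returned variable from `remaining` to `current`, until the best reduction falls below the F-to-enter (or criterion) threshold.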

## Heuristics

• Given the greedy nature of the selection procedure, neither Forward nor Backward Stepwise Regression generally results in the selection of the optimum model, but rather a sub-optimal model that is commonly close to the optimum.
• It is typically applied to Linear Regression and Generalized Linear Regression Models.
• Modern implementations use hypothesis-based tests to compare models (such as a t-test or an F-test) or criterion-based tests such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
• Criterion tests for adding and removing variables, such as AIC and BIC, are preferred over hypothesis-based tests (such as the F-test), as the use of the latter has been discredited [Pope1972] [Wilkinson1979].
• If the errors are normally distributed, the Akaike Information Criterion (AIC) selection criteria can be used [Akaike1973].
• Using a cross-validated error score for making decisions to add and remove variables may result in an improved final model.
• Resulting models frequently fail to include all variables that have an influence on the dependent variable while also frequently including variables that do not influence the dependent variable [Derksen1992], meaning the final model is generally not the best model [Miller1984] (taken from [Mundry2009]).
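For reference, under the assumption of normally distributed errors noted above, the AIC and BIC used in these comparisons can be written (up to an additive constant) in terms of the SSR as:

$$\mathrm{AIC} = n \ln\left(\frac{\mathrm{SSR}}{n}\right) + 2k, \qquad \mathrm{BIC} = n \ln\left(\frac{\mathrm{SSR}}{n}\right) + k \ln n$$

where $n$ is the number of samples and $k$ is the number of fitted parameters, which is why a penalty multiplier of 2 corresponds to AIC and $\ln n$ to BIC.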

## Code Listing

Listing (below) provides a code listing of the Stepwise Multiple Linear Regression method in R. Figure (below) provides a plot of the training dataset with the line of best fit highlighted.

The example uses the lm and step functions in the stats core package, which are responsible for fitting linear models and performing stepwise selection respectively [RDCT2011a]. The step function uses the Akaike Information Criterion as the model evaluation criterion. For more information on this library type: library(help="stats"), and for more information on the function type: ?step.

The test problem is a four-dimensional dataset of 100 samples, where the x1 values are drawn from a uniformly random distribution $x \in [0,10]$ (x2 and x3 from narrower uniform ranges), and the y values depend on the x1 value plus a value drawn from a normally random distribution with a mean of 0 and a standard deviation of 1. The features x2 and x3 are random independent variables that have no relevant interaction with x1 or y. In this example, y is considered the dependent variable and x1 the single relevant independent variable. The stepwise method is expected to discount x2 and x3 and select a linear model for y given x1.

```r
# define a 4 variable regression problem of 100 samples
regression_dataset <- function() {
  x1 <- runif(100, 0, 10)
  x2 <- runif(100, 1, 2)  # random
  x3 <- runif(100, 2, 3)  # random
  y <- x1 + rnorm(100)    # dependent on x1
  data.frame(x1, x2, x3, y)
}

# get the data
data <- regression_dataset()
# split data into train and test (67%/33%)
training_set <- sample(100, 67)
train <- data[training_set, ]
test <- data[(1:100)[-training_set], ]

# create a linear regression model using ordinary least squares
base_model <- lm(
  y~x1+x2+x3,      # predict y given x1, x2 and x3
  data=train,      # training dataset
  weights=NULL,    # no weighting on the variables
  na.action=NULL,  # no action on missing values
  method="qr")     # QR decomposition (efficient matrix calculation method)

# apply the Stepwise Regression procedure
selected_model <- step(
  base_model,        # the model on which to operate
  scope=y~x1+x2+x3,  # parameter relationships
  scale=0,           # estimate the scale for the AIC statistic
  direction="both",  # use forward and backward selection
  trace=1,           # provide debug information during the execution
  keep=NULL,         # no filter function for models
  steps=1000,        # maximum steps to execute
  k=2)               # use AIC as the test criterion (use log(n) for BIC)

# summarize the selected linear regression model
summary(selected_model)
# display the selected variables
names(selected_model$model)
# regression model diagnostic plots with the training data
par(mfrow=c(2,2))
plot(selected_model)
# plot the training data and the line of best fit (assumes only x1 was selected)
plot(train$x1, train$y, xlab="x1", ylab="y")
abline(selected_model$coef, lty=5, col="red")
# make predictions for the test data
predictions <- predict.lm(selected_model, test[,1:3])
# compute mean squared error
print(mean((test$y-predictions)^2))
```
Example of Stepwise Multiple Linear Regression in R using the lm and step functions in the stats core package.
Download: stats_stepwise_linear_regression.R. Unit test available in the GitHub project.

Figure: Plot of the 2D training dataset with the line of best fit.

Other packages that provide stepwise selection include the stepAIC function of the MASS package. The leaps package does not provide stepwise selection, although it does provide a related selection method called Regression Subset Selection that uses a branch and bound search.
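As a minimal sketch of the MASS alternative (assuming the MASS package is installed; the synthetic data here mirrors the test problem above), stepAIC can be applied as follows:

```r
# Sketch of stepwise selection with MASS::stepAIC (assumes MASS is installed)
library(MASS)
d <- data.frame(x1=runif(100, 0, 10), x2=runif(100, 1, 2), x3=runif(100, 2, 3))
d$y <- d$x1 + rnorm(100)
full <- lm(y~x1+x2+x3, data=d)
selected <- stepAIC(full, direction="both", trace=FALSE)
formula(selected)  # typically y ~ x1 for this data
```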

## References

### Primary Sources

Efroymson is credited with Stepwise Multiple Linear Regression in which he used an F-test significance for the add and remove decisions [Efroymson1960]. Breaux provides an early and salient overview of Efroymson's Stepwise Multiple Regression in his technical report that promotes the method as computationally efficient over enumerating all subsets of regression models [Breaux1967].

Hocking provides an early and detailed review of forward and backward variable selection methods and of Stepwise Multiple Linear Regression [Hocking1976]. Miller provides an analysis of the convergence of Stepwise Regression [Miller1996]. Whittingham et al. provide a modern enumeration of all of the known problems and reasons against using Stepwise Multiple Regression for modelling, and fret about its commonplace use in the field of ecology [Whittingham2006]. Mundry and Nunn provide a similar review and focus on the limitations of the method for statistical inference [Mundry2009]. Faraway provides a good example of stepwise procedures for regression with examples in R [Faraway2002] (page 125).

## Bibliography

[Akaike1973] H. Akaike, "Information theory and an extension of the maximum likelihood principle", in Second International Symposium on Information Theory, 1973.
[Breaux1967] H. J. Breaux, "On stepwise multiple linear regression", Technical Report 1369, Ballistic Research Labs, 1967.
[Derksen1992] S. Derksen and H. J. Keselman, "Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables", British Journal of Mathematical and Statistical Psychology, 1992.
[Efroymson1960] M. A. Efroymson, "17. Multiple regression analysis", in Mathematical Methods for Digital Computers, pages 191, John Wiley & Sons Inc, 1960.
[Faraway2002] J. J. Faraway, "Practical Regression and Anova in R", University of Michigan, 2002.
[Hocking1976] R. R. Hocking, "The Analysis and Selection of Variables in Linear Regression", Biometrics, 1976.
[Miller1984] A. J. Miller, "Selection of Subsets of Regression Variables", Journal of the Royal Statistical Society, Series A (General), 1984.
[Miller1996] A. J. Miller, "The convergence of Efroymson's stepwise regression algorithm", The American Statistician, 1996.
[Mundry2009] R. Mundry and C. L. Nunn, "Stepwise model fitting and statistical inference: turning noise into signal pollution", The American Naturalist, 2009.
[Pope1972] P. T. Pope and J. T. Webster, "The use of an F-statistic in stepwise regression procedures", Technometrics, 1972.
[RDCT2011a] R Development Core Team, "Package 'stats': The R Stats Package", 2011.
[Whittingham2006] M. J. Whittingham, P. A. Stephens, R. B. Bradbury, and R. P. Freckleton, "Why do we still use stepwise modelling in ecology and behaviour?", Journal of Animal Ecology, 2006.
[Wilkinson1979] L. Wilkinson, "Tests of significance in stepwise regression", Psychological Bulletin, 1979.

