Build a GAM Model in a Step-Wise Fashion

DESCRIPTION:
Uses a stepwise search to select the best GAM model, in terms of AIC, from a specified range of candidate models.

USAGE:
step.gam(object, scope, scale, direction = c("both", "backward", "forward"), trace = T, keep = NULL, steps = 1000)

REQUIRED ARGUMENTS:
object:
An object of class "gam", or of any class that inherits from it.
scope:
Defines the range of models examined in the stepwise search. It is a list of formulas, each corresponding to one term in the model. Including 1 in a formula adds the option of leaving that term out of the model entirely.

OPTIONAL ARGUMENTS:
scale:
An optional value used in the definition of the AIC statistic that evaluates candidate models. By default, the scaled chi-squared statistic of the initial model is used. If forward selection is to be performed, this is not necessarily a sound choice, because the initial model is then a small one whose lack of fit may overstate the scale.
direction:
The mode of stepwise search: one of "both", "backward", or "forward". The default is "both".
trace:
If TRUE, information is printed while step.gam runs. This is recommended in general, since step.gam can take some time to compute, either for large models or when called with an extensive scope argument. A one-line summary is printed for each model visited in the search, and the selected model is noted at each step.
keep:
A filter function whose input is a fitted gam object and the associated "AIC" statistic, and whose output is arbitrary. Typically keep will select a subset of the components of the object and return them. The default is not to keep anything.
steps:
The maximum number of steps to take. The default is 1000 (essentially as many as required); the argument is typically used to stop the search early.

VALUE:
The model selected by the stepwise search is returned, with up to two additional components: an "anova" component summarizing the steps taken in the search, and a "keep" component if the keep= argument was supplied in the call.

DETAILS:
Each of the formulas in scope specifies a "regimen" of candidate forms in which the particular term may enter the model. For example, a term formula might be

        ~ Income + log(Income) + s(Income)

This means that Income could appear linearly, linearly in its logarithm, or as a smooth function estimated nonparametrically. The order of the forms in the formula is the order of the regimen. Every term in the model is described by such a term formula, and the final model is built up by selecting one component from each formula.
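To make the idea of a regimen concrete, here is a minimal Python sketch (an illustration, not the S code behind step.gam). It assumes each term formula is represented as an ordered list of candidate forms, with the string "1" standing for leaving the term out, and a model as one chosen position per regimen. The names Regimens and build_formula are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation of a scope: each regimen is an ordered list of
# candidate forms for one term; the string "1" means "leave the term out".
Regimens = Dict[str, List[str]]


def build_formula(response: str, regimens: Regimens,
                  positions: Dict[str, int], forced: List[str]) -> str:
    """Assemble a model formula from one chosen form per regimen."""
    terms = list(forced)  # terms outside every regimen are always included
    for name, forms in regimens.items():
        form = forms[positions[name]]
        if form != "1":  # "1" means the term is omitted
            terms.append(form)
    return f"{response} ~ " + (" + ".join(terms) if terms else "1")


scope: Regimens = {"Income": ["Income", "log(Income)", "s(Income)"]}
print(build_formula("y", scope, {"Income": 2}, forced=[]))  # y ~ s(Income)
```

Moving "one step up or down" a regimen then simply means changing one position by plus or minus one.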

The supplied model object is used as the starting model, and hence there is the requirement that one term from each of the term formulas be present in formula(object). This also implies that any terms in formula(object) not contained in any of the term formulas will be forced to be present in every model considered.
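The requirement on the starting model can be sketched as a lookup that finds each regimen's current form among the terms of formula(object) and collects every remaining term as forced. This Python sketch is illustrative only: starting_positions is a hypothetical name, and two behaviors are assumptions of the sketch rather than documented step.gam behavior: an absent term is treated as the "1" form when the regimen contains one, and several matches from one regimen are rejected.

```python
from typing import Dict, List, Tuple

Regimens = Dict[str, List[str]]


def starting_positions(model_terms: List[str],
                       regimens: Regimens) -> Tuple[Dict[str, int], List[str]]:
    """Locate each regimen's current form among the starting model's terms.

    Returns one position per regimen plus the "forced" terms, i.e. terms of
    the starting model that belong to no regimen.
    """
    positions: Dict[str, int] = {}
    claimed = set()
    for name, forms in regimens.items():
        matches = [i for i, form in enumerate(forms) if form in model_terms]
        if len(matches) > 1:
            # Sketch's choice: a model should use only one form per regimen.
            raise ValueError(f"several terms from regimen {name!r}")
        if matches:
            positions[name] = matches[0]
            claimed.add(forms[matches[0]])
        elif "1" in forms:
            # Assumption: an absent term counts as the "1" (omitted) form.
            positions[name] = forms.index("1")
        else:
            raise ValueError(f"starting model has no term from regimen {name!r}")
    forced = [t for t in model_terms if t not in claimed]
    return positions, forced


positions, forced = starting_positions(
    ["Age", "BP", "s(Chol, df = 4)", "Sex"],
    {"Age": ["1", "Age", "log(Age)"],
     "BP": ["1", "BP", "poly(BP, 2)", "s(BP)"],
     "Chol": ["s(Chol, df = 4)", "s(Chol, df = 7)"]})
print(positions, forced)  # {'Age': 1, 'BP': 1, 'Chol': 0} ['Sex']
```

Here Sex belongs to no regimen, so it stays in every model the search visits.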

While step.glm uses score-test approximations to speed up the search, step.gam forgoes this speedup in favor of greater generality. We describe the most general setup, direction = "both".

At any stage there is a current model comprising a single term from each of the term formulas supplied in the scope argument. A series of models is fitted, each obtained from the current model's formula by moving one of its terms one step up or down in its regimen. A term at either end of its regimen can move in only one direction. So if there are p term formulas, at most 2*p candidate models are considered at each step. A record is kept of every model ever visited, to avoid refitting; after the first step, one candidate is always the model just left, so at most 2*p - 1 new models need to be fitted. Once these models have been fitted, the best in terms of the AIC statistic is selected and defines the step. The process is repeated until either the maximum number of steps has been taken or no eligible step decreases the AIC.
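The search for direction = "both" can be sketched in Python as a greedy walk over regimen positions with a record of visited models. This is a minimal illustration under stated assumptions, not the S implementation: a model is a tuple holding one position per regimen, and aic is a hypothetical callable standing in for fitting a gam and computing its AIC statistic. The names neighbours, step_search, and toy_aic are invented for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, ...]  # one regimen position per term formula


def neighbours(pos: Position, sizes: List[int]) -> List[Position]:
    """All models reached by moving one term one step up or down its regimen."""
    out = []
    for k, (i, n) in enumerate(zip(pos, sizes)):
        for j in (i - 1, i + 1):
            if 0 <= j < n:  # a term at an end of its regimen has only one move
                out.append(pos[:k] + (j,) + pos[k + 1:])
    return out


def step_search(start: Position, sizes: List[int],
                aic: Callable[[Position], float],
                steps: int = 1000) -> Tuple[Position, List[Position], int]:
    """Greedy AIC search that never refits a previously visited model."""
    visited: Dict[Position, float] = {start: aic(start)}
    current = start
    path = [current]
    for _ in range(steps):
        candidates = [p for p in neighbours(current, sizes) if p not in visited]
        for p in candidates:
            visited[p] = aic(p)  # stands in for fitting the candidate model
        if not candidates:
            break
        best = min(candidates, key=lambda p: visited[p])
        if visited[best] >= visited[current]:
            break  # no eligible step decreases the AIC
        current = best
        path.append(current)
    return current, path, len(visited)


# Toy criterion over p = 2 regimens of sizes 3 and 4.
def toy_aic(pos: Position) -> float:
    return float((pos[0] - 2) ** 2 + (pos[1] - 3) ** 2)


final, path, n_fitted = step_search((1, 1), [3, 4], toy_aic)
print(final, n_fitted)  # (2, 3) 9
```

In this run the first step fits 2*p = 4 candidates and the second fits 2*p - 1 = 3, because the model just left is already recorded. Skipping visited models loses nothing: any model visited but not chosen had an AIC at least as large as the model chosen at that step, so it cannot improve on the current model.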


SEE ALSO:
step.glm, step, gam, drop1, add1.

EXAMPLES:
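# gam.object is a fitted gam whose formula already contains one term from
# each regimen below, for example Age, BP, and s(Chol, df = 4).
step.gam(gam.object, scope = list(
        "Age"  = ~ 1 + Age + log(Age),
        "BP"   = ~ 1 + BP + poly(BP, 2) + s(BP),
        "Chol" = ~ s(Chol, df = 4) + s(Chol, df = 7)
        ))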