Fit a Regression or Classification Tree

DESCRIPTION:
Grows a tree object from a specified formula and data.

USAGE:
tree(formula, data=<<see below>>, weights=<<see below>>,
      subset=<<see below>>, na.action=na.fail,
      method="recursive.partition", control=<<see below>>,
      model=NULL, x=F, y=T, ...)

REQUIRED ARGUMENTS:
formula:
a formula expression as for other regression models, of the form response ~ predictors.

OPTIONAL ARGUMENTS:
data:
a data frame in which to interpret the variables named in the formula, subset, and weights arguments. If this is missing, the variables should be on the search list. This can also be a number which indicates the frame in which to look for the data.
weights:
vector of observation weights; its length must equal the number of observations. Zero weights are allowed, although the subset argument is preferred for deleting observations. By default, an unweighted analysis is performed.
subset:
expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default.
na.action:
a function to filter missing data. This is applied to the model.frame after any subset argument has been used. The default (with na.fail) is to create an error if any missing values are found. A possible alternative is na.omit, which deletes observations that contain one or more missing values.
method:
character string stating the method to use. The default, "recursive.partition", fits the tree; if method is "model.frame", the function instead returns the model frame used to build the tree.
control:
a list of iteration and algorithmic constants. See tree.control for their names and default values. These can also be set as arguments to tree itself. (You should not set the nobs argument to tree.control, as that is set by tree and is the number of observations used to build the tree.)
model:
if this argument is itself a model frame, then the formula and data arguments are ignored, and model is used to define the model.
x:
logical flag: if TRUE, the model.matrix is returned.
y:
logical flag: if TRUE, the response variable is returned.
...:
additional arguments passed to the fitting routines, such as the arguments accepted by tree.control.

VALUE:
an object of class tree is returned. See tree.object for details.

DETAILS:
The model is fitted using binary recursive partitioning: the data are successively split along coordinate axes of the predictor variables so that, at any node, the split that maximally distinguishes the response variable in the left and right branches is selected. Splitting continues until nodes are pure or the data are too sparse; terminal nodes are called leaves, and the initial node is called the root.
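A single splitting step of this procedure can be sketched in base R. The function below is illustrative only (the name best.split is not part of tree()): for one numeric predictor it chooses the cutpoint that minimizes the summed within-branch deviance of a numeric response, which is the regression-tree version of "maximally distinguishing" the two branches. The real algorithm also handles factor predictors and searches over all predictors at each node.

```r
# Illustrative sketch of one binary split, not part of tree() itself:
# choose the cutpoint on a single numeric predictor x that minimizes the
# total within-branch sum of squares of the numeric response y.
best.split <- function(x, y) {
  cuts <- sort(unique(x))
  cuts <- (cuts[-1] + cuts[-length(cuts)]) / 2   # midpoints between values
  dev <- sapply(cuts, function(cc) {
    left <- y[x < cc]
    right <- y[x >= cc]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  cuts[which.min(dev)]
}

best.split(1:6, c(1, 1, 1, 9, 9, 9))   # 3.5: separates the two groups exactly
```

Recursive partitioning simply applies this step again to each branch until a stopping rule (purity or sparsity, governed by tree.control) is met.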

If the response variable is a factor, the tree is called a classification tree. The model used for classification assumes that the response variable follows a multinomial distribution. weights are not used in the computation of the deviance in classification trees.

If the response variable is numeric, the tree is called a regression tree. The model used for regression assumes that the numeric response variable has a normal (Gaussian) distribution. weights are used if they are specified. See Statistical Models in S for a more detailed discussion of the difference between regression and classification trees.
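The two distributional assumptions translate into two node deviance formulas. The helper functions below are a minimal sketch of the standard definitions (the names reg.deviance and class.deviance are illustrative, not part of tree()): the Gaussian model gives the sum of squared deviations from the node mean, and the multinomial model gives minus twice the log-likelihood of the class counts in the node.

```r
# Standard node deviances (illustrative helpers, not part of tree()):
# regression (Gaussian model): sum of squared deviations from the node mean
reg.deviance <- function(y) sum((y - mean(y))^2)

# classification (multinomial model): -2 * log-likelihood of the node counts
class.deviance <- function(y) {
  n <- table(y)          # class counts in the node
  p <- n / sum(n)        # class proportions
  -2 * sum(n * log(p))
}

reg.deviance(c(1, 2, 3))                        # 2
class.deviance(factor(c("a", "a", "b", "b")))   # 8 * log(2), about 5.545
```

A pure node has zero deviance under either definition, which is one of the stopping conditions for splitting.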

This function allows up to 128 levels for a factor response variable. Factor predictor variables are limited to 32 levels because a factor predictor with k levels gives rise to 2^(k-1) - 1 candidate splits, all of which must be examined, and this quickly imposes severe demands on the system.
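The growth of 2^(k-1) - 1 is easy to check directly (the helper name n.splits is illustrative, not part of tree()):

```r
# Number of candidate splits for a k-level factor predictor: each split is a
# partition of the levels into two non-empty groups, giving 2^(k-1) - 1.
n.splits <- function(k) 2^(k - 1) - 1

n.splits(2)    # 1
n.splits(10)   # 511
n.splits(32)   # 2147483647: over two billion, hence the 32-level limit
```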

The fitted model can be examined by print, summary, and plot. Its contents can be extracted using predict, residuals, deviance, and formula. It can be modified using update. Other generic functions that have methods for tree objects are text, identify, browser, and [.


REFERENCES:
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth International Group, Belmont, CA.
Chambers, J. M., and Hastie, T. J. (1991). Statistical Models in S, p. 414.

SEE ALSO:
tree.object, tile.tree, snip.tree, select.tree, meanvar.tree, misclass.tree, path.tree, hist.tree.

EXAMPLES:
# fit regression tree to all variables
z.solder <- tree(skips ~ ., data = solder.balance)

# fit classification tree to data in kyphosis data frame
z.kyphosis <- tree(kyphosis)