Cost-complexity Pruning of Tree Object

DESCRIPTION:
Determines a nested sequence of subtrees of the supplied tree by recursively "snipping" off the least important splits, as judged by the cost-complexity measure. prune.misclass is an abbreviation for prune.tree(method = "misclass"), for use with the S-PLUS function cv.tree.

If a scalar k is supplied, the subtree that minimizes the cost-complexity measure for that k is returned.


USAGE:
prune.tree(tree, k = NULL, best = NULL, newdata, method =
                       c("deviance", "misclass")[1], eps = 1e-3)
prune.misclass(tree, k = NULL, best = NULL, newdata, eps = 1e-3)

REQUIRED ARGUMENTS:
tree:
fitted model object of class "tree". This is assumed to be the result of some function that produces an object with the same named components as that returned by the tree() function.

OPTIONAL ARGUMENTS:
k:
cost-complexity parameter defining either a specific subtree of tree (k a scalar) or the (optional) sequence of subtrees minimizing the cost-complexity measure (k a vector). If missing, k is determined algorithmically.
best:
integer specifying the size (i.e., number of terminal nodes) of a specific subtree in the cost-complexity sequence to be returned. This is an alternative to selecting a subtree by supplying a scalar cost-complexity parameter k.
newdata:
data frame upon which the sequence of cost-complexity subtrees is evaluated. If missing, the data used to grow the tree are used.
method:
character string denoting the measure of node heterogeneity used to guide cost-complexity pruning. For regression trees, only the default, "deviance", is accepted. For classification trees, the default is "deviance" and the alternative is "misclass" (misclassification error rate).
eps:
a lower bound for the probabilities, used if events of predicted probability zero occur in newdata.

VALUE:
if k is supplied and is a scalar, a tree object is returned that minimizes the cost-complexity measure for that k. If best is supplied, a tree object of size best is returned. Otherwise, an object of class tree.sequence is returned. The object contains the following components:
size:
number of terminal nodes in each tree in the cost-complexity pruning sequence.
deviance:
total deviance of each tree in the cost-complexity pruning sequence.
k:
the value of the cost-complexity pruning parameter of each tree in the sequence. If determined algorithmically, i.e. not as an input, its first value defaults to -Inf, its lowest possible bound.
method:
a character string, either "deviance" or "misclass" depending on the input value of method.

DETAILS:
The response, as well as the predictors referred to in the right side of the formula in tree, must be present by name in newdata. These data are dropped down each tree in the cost-complexity sequence, and deviances are calculated by comparing the supplied response to the prediction. The function cv.tree() routinely uses the newdata argument in cross-validating the pruning procedure.

A plot() method exists for objects of class tree.sequence. It displays the value of the deviance or the number of misclassifications for each subtree in the cost-complexity sequence. An additional axis displays the values of the cost-complexity parameter at each subtree.
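
As an illustration of the newdata argument, the following sketch grows a tree on part of a data frame and evaluates the pruning sequence on the remaining rows. It assumes the car.test.frame data set used in EXAMPLES is available; the seed, the two-thirds split, and the names n, train, z.train and zp.test are arbitrary choices made for this illustration only.

```r
# Illustrative hold-out split; the seed and proportion are arbitrary.
set.seed(1)
n <- nrow(car.test.frame)
train <- sample(n, round(2 * n / 3))

# Grow the tree on the training rows only.
z.train <- tree(Mileage ~ Weight, car.test.frame[train, ])

# Evaluate each subtree in the pruning sequence on the held-out rows.
# newdata must contain both the response (Mileage) and the predictor (Weight).
zp.test <- prune.tree(z.train, newdata = car.test.frame[-train, ])

zp.test$size   # number of terminal nodes of each subtree in the sequence
plot(zp.test)  # held-out deviance against subtree size
```

Because the deviances are computed on rows that were not used to grow the tree, the plot need not decrease monotonically with size, which is what makes it useful for choosing a subtree.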

SEE ALSO:
deviance.tree, misclass.tree, plot.tree, tree

EXAMPLES:
z.auto <- tree(Mileage ~ Weight, car.test.frame)

zp <- prune.tree(z.auto)                  # determine the cost-complexity pruning sequence
plot(zp)                                  # plot it on current graphics device
z5.auto <- prune.tree(z.auto, best = 5)   # select the best 5-node subtree
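
A subtree can also be selected by a scalar k rather than by size. The sketch below is illustrative only: it assumes the pruning sequence contains at least two subtrees, and it takes its k value from the sequence itself merely as a convenient finite value (the first element of the k component may be -Inf). The name z.k is hypothetical.

```r
z.auto <- tree(Mileage ~ Weight, car.test.frame)
zp <- prune.tree(z.auto)                 # full sequence, class tree.sequence

# Scalar k: a single tree object is returned instead of a sequence.
# zp$k[2] is used only because it is a finite value known to lie in the sequence.
z.k <- prune.tree(z.auto, k = zp$k[2])

summary(z.k)                             # reports the size of the selected subtree
```

Supplying best and supplying a scalar k are two routes to the same kind of result; best is usually easier to reason about when a target number of terminal nodes is already known.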