Fit a Smoothing Spline

DESCRIPTION:
Fits a cubic B-spline smooth to the input data.

USAGE:
smooth.spline(x, y, w = <<see below>>, df = <<see below>>,
              spar = 0, cv = F, all.knots = F, df.offset = 0,
              penalty = 1)

REQUIRED ARGUMENTS:
x:
values of the predictor variable. There should be at least ten distinct x values.

x and y can be supplied in a variety of forms, along the lines of the function plot; e.g., a list with components x and y, a two-column matrix, or simply a single vector, taken to be a time series if it is not complex.


OPTIONAL ARGUMENTS:
y:
response variable, of the same length as x.
w:
vector of weights for weighted smoothing, of the same length as x and y. If measurements at different values of x have different variances, w should be inversely proportional to the variances. The default is that all weights are equal.
df:
a number giving the desired degrees of freedom, trace(S), supplied in place of a smoothing parameter. Here S is the implicit smoother matrix. If both df and spar are supplied, spar is used unless it is 0, in which case df is used.
spar:
the coefficient of the integrated squared second derivative penalty function (commonly denoted by lambda) for normalized data. Note that lambda=((max(x)-min(x))^3)*mean(w)*spar for non-normalized data. If spar is 0 or missing and df is missing, cross-validation is used to automatically select spar. If a value of spar greater than zero is supplied, it is used as the smoothing parameter.
cv:
indicates whether the ordinary (TRUE) or generalized (FALSE) cross validation score should be computed.
all.knots:
if FALSE, a suitably fine grid of knots is chosen, usually with fewer knots than there are unique values of x. If TRUE, the unique values of x are used as knots.
df.offset:
allows an offset to be added to the df term used in the calculation of the GCV criterion: df=tr(S) + df.offset.
penalty:
allows the df quantity used in GCV to be charged a cost = penalty per degree of freedom.
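
The relation between spar and lambda given under spar can be checked with a few lines of S. This is a minimal sketch: the helper name lambda.from.spar is hypothetical and is not part of smooth.spline, and equal weights are assumed when w is omitted.

```r
# Hypothetical helper (not part of smooth.spline): convert spar, the
# penalty coefficient for normalized data, to lambda on the original
# scale, using lambda = ((max(x) - min(x))^3) * mean(w) * spar.
lambda.from.spar <- function(x, spar, w = rep(1, length(x))) {
  ((max(x) - min(x))^3) * mean(w) * spar
}

x <- c(0, 0.5, 2)                 # range of x is 2, so the cubed range is 8
lambda.from.spar(x, spar = 0.5)   # 8 * 1 * 0.5 = 4
```

Because the range of x enters cubed, rescaling x changes lambda considerably even when spar is held fixed.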

VALUE:
an object of class smooth.spline is returned, consisting of the fitted smoothing spline evaluated at the supplied data, some fitting criteria and constants, and a structure that contains the essential information for computing the spline and its derivatives for any values of x. The components of the returned list are:
x:
ordered distinct x values
y:
smoothing spline fits corresponding to x.
w:
weights used in the fit. This has the same length as x, and in the case of ties, will consist of the accumulated weights at each unique value of x.
yin:
y-values used at the unique x values (weighted averages of input y)
lev:
leverage values, which are the diagonal elements of the smoother matrix S.
cv.crit:
cross validation score (either GCV or CV).
pen.crit:
penalized criterion.
df:
degrees of freedom of the fit estimated by the sum of lev. If df was supplied as the smoothing parameter, then the prescribed and resultant values of df should match within 0.1 percent of the supplied df.
spar:
smoothing parameter used in the fit (useful if df was used to specify the amount of smoothing).
fit:
list containing details of the fits (knot locations, coefficients, etc.) to be used by predict.smooth.spline.
call:
the call that produced the fit
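
As a sketch of how the returned components relate, the following fits simulated data and compares df with the sum of lev. The data and the choice df = 6 are illustrative assumptions. The call to predict assumes that the method takes the new x values as its second argument and returns a list with components x and y.

```r
# Simulated data, for illustration only
set.seed(1)
x <- seq(0, 1, length = 60)
y <- sin(2 * pi * x) + rnorm(60, sd = 0.2)

fit <- smooth.spline(x, y, df = 6)   # smoothing specified through df

fit$df           # resulting degrees of freedom, close to the requested 6
sum(fit$lev)     # sum of the leverages, which is how df is estimated
fit$spar         # smoothing parameter that corresponds to df = 6

# Evaluate the fitted spline at new x values
new <- predict(fit, seq(0.1, 0.9, by = 0.2))
new$y
```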

DETAILS:
The two arguments df.offset and penalty are experimental and typically will not be used. If used, the GCV criterion is RSS/(n - (penalty*(trace(S)-1) + df.offset +1)).
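
To make the role of the two arguments concrete, the expression above can be transcribed directly into S. This is only a sketch of the formula as written in this section; the function name gcv.as.documented and its argument names are hypothetical, and the code is not taken from the implementation.

```r
# Hypothetical transcription of the criterion stated above:
#   RSS / (n - (penalty * (trace(S) - 1) + df.offset + 1))
gcv.as.documented <- function(rss, n, trS, penalty = 1, df.offset = 0) {
  rss / (n - (penalty * (trS - 1) + df.offset + 1))
}

# With the defaults the denominator reduces to n - trace(S)
gcv.as.documented(rss = 10, n = 20, trS = 5)                 # 10 / 15

# A larger penalty charges more per degree of freedom
gcv.as.documented(rss = 10, n = 20, trS = 5, penalty = 2)    # 10 / 11
```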

A cubic B-spline is fit, with care taken to ensure that the running time of the algorithm is linear in the number of data points. For small data vectors (n<50), a knot is placed at every distinct data point, and the regression is fit by penalized least squares. For larger data sets the number of knots is chosen judiciously in order to keep the computation time manageable (if all.knots=F). The penalty spar can be chosen automatically by cross-validation (if spar=0), can be supplied explicitly, or can be supplied implicitly via the more intuitive df number.


REFERENCES:
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London.

SEE ALSO:
predict.smooth.spline, print.smooth.spline.

EXAMPLES:
attach(air)
plot(ozone,temperature)
lines(smooth.spline(ozone,temperature))
lines(smooth.spline(ozone,temperature, df = 5), lty = 2)

# smoothing spline fit and approximate 95% "confidence" intervals
# need to create objects x and y, for example from the air data attached above
x <- ozone
y <- temperature

fit <- smooth.spline(x, y)               # smooth.spline fit
res <- (fit$yin - fit$y)/(1-fit$lev)     # jackknife residuals
sigma <- sqrt(var(res))                  # estimate sd

upper <- fit$y + 2.0*sigma*sqrt(fit$lev) # upper 95% conf. band
lower <- fit$y - 2.0*sigma*sqrt(fit$lev) # lower 95% conf. band
matplot(fit$x, cbind(upper, fit$y, lower), type="plp", pch=".")