Projection Pursuit Regression

DESCRIPTION:
Fits an exploratory nonlinear regression model in which y is represented as a sum of nonparametric functions of projections of the x variables.
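
As a reading aid, the model described above can be written out as follows. This is a sketch of the usual projection pursuit regression formulation, not a statement taken from this help file: the symbols are chosen to match the alpha, beta, and zhat (phi) components documented under VALUE, and writing the constant term as the response mean is an assumption based on the standard formulation.

```latex
% i indexes observations, k responses, m model terms (M terms in total).
% \alpha_m is the m-th direction vector, \beta_{mk} the term weight,
% and \phi_m the smooth function estimated for the m-th projection.
y_{ik} \approx \bar{y}_k + \sum_{m=1}^{M} \beta_{mk}\, \phi_m\!\left(\alpha_m^{T} x_i\right)
```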

USAGE:
ppreg(x, y, min.term, max.term=min.term, wt=rep(1, nrow(x)),
      rwt=rep(1, ncol(y)), xpred=NULL, optlevel=2, bass=0, span="cv")

REQUIRED ARGUMENTS:
x:
matrix of explanatory variables. Rows represent observations, and columns represent variables. Missing values are not accepted. The ppreg function is not very useful if x contains only one column.
y:
vector or matrix of response variables. Rows represent observations, and columns represent variables. Missing values are not accepted.
min.term:
minimum number of terms to include in the model; ppreg will return complete results only for this minimum number of terms.

OPTIONAL ARGUMENTS:
max.term:
maximum number of terms to choose from in the model.
wt:
vector of weights for the observations. The length must be the same as the number of rows in x. Missing values are not accepted.
rwt:
vector of weights for the responses. The length must be the same as the number of columns in y. Missing values are not accepted.
xpred=:
vector or matrix of explanatory variables for which responses are to be estimated. If xpred is omitted, the model is evaluated at the original x data and the residuals are returned in ypred. Missing values are not accepted.
optlevel=:
integer from 0 to 3 that determines the thoroughness of an optimization routine in ppreg. A higher number means more optimization.
bass=:
super smoother bass tone control used with automatic span selection (see supsmu); the range of values is 0 to 10, with increasing values resulting in increased smoothing.
span=:
super smoother span control (see supsmu). The default is "cv", which results in automatic span selection by local cross validation. span can also be a number in the range 0 < span <= 1.

VALUE:
a list containing the following components:
ypred:
matrix of predicted values for y given the matrix xpred. If xpred was not supplied, then ypred contains the residuals for the model fit.
fl2:
the sum of squared residuals divided by the total corrected sums of squares.
alpha:
a min.term by ncol(x) matrix of the direction vectors. alpha[m,j] contains the j-th component of the direction in the m-th term.
beta:
a min.term by ncol(y) matrix of term weights. beta[m,k] contains the value of the term weight for the m-th term and the k-th response variable.
z:
a matrix of values to be plotted against zhat. z[i,m] contains the z value of the i-th observation in the m-th model term, i.e., z equals x %*% t(alpha). The columns of z have been sorted.
zhat:
a matrix of function values to be plotted. zhat[i,m] is the smoothed ordinate value (phi) of the i-th observation in the m-th model term evaluated at z[i,m].
allalpha:
a three-dimensional array. The [m,j,M] element contains the j-th component of the direction in the m-th model term for the solution consisting of M terms. Values are zero for M less than min.term.
allbeta:
a three-dimensional array. The [m,k,M] element contains the term weight for the m-th term and the k-th response variable for the solution consisting of M terms. Values are zero for M less than min.term.
esq:
esq[M] contains the fraction of unexplained variance for the solution consisting of M terms. Values are zero for M less than min.term.
esqrsp:
matrix that is ncol(y) by max.term containing the fraction of unexplained variance for each response. esqrsp[k,M] is for the k-th response variable for the solution consisting of M terms, for M ranging from min.term to max.term. Other columns are zero.

DETAILS:
The z component of the result is sorted, so it cannot be compared directly with the original data.
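
The effect of the sorting can be illustrated with a small sketch that does not call ppreg. The x and alpha values below are made up for illustration and are not ppreg output, and sorting each column independently is an assumption about what "sorted" means here. The sketch uses only basic matrix operations.

```r
# Hypothetical data: 3 observations (rows) on 2 explanatory variables.
x <- matrix(c(1, 0, 2,
              0, 1, 1), 3, 2)

# Hypothetical direction vectors, one per row, laid out like the alpha component.
alpha <- matrix(c(0.6,  0.8,
                  0.8, -0.6), 2, 2, byrow=TRUE)

# Projections in the original observation order:
# z.raw[i, m] is the projection of observation i onto direction m.
z.raw <- x %*% t(alpha)

# Sort each column separately (assumed to mimic the sorted z component).
z.sorted <- apply(z.raw, 2, sort)

# Row i of z.sorted no longer refers to observation i of x.
print(z.raw)
print(z.sorted)
```

To relate projections to individual observations, a matrix such as z.raw can be recomputed from x and the returned alpha, as in the sketch.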

REFERENCES:
Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association 76, 817-823.

The chapter "Regression and Smoothing for Continuous Response Data" in the S-PLUS Guide to Statistical and Mathematical Analysis.


SEE ALSO:
ace, avas, supsmu.

EXAMPLES:
x1 <- rnorm(100) ; x2 <- rnorm(100) ; eps <- rnorm(100, 0, .1)
x <- matrix(c(x1, x2), 100, 2)
y <- x1*x2 + eps
# Set up a matrix of predictor values.
xpred <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), 4, 2, byrow=T)
# Use ppreg with unit weights for both the observations and
# the response, and a 2 term regression model (picked from 3 terms).
a <- ppreg(x, y, 2, 3, xpred=xpred)

# Plot the function values versus their abscissas, to look for structure.
matplot(a$z, a$zhat)