Principal Components Analysis

DESCRIPTION:
Returns an object of class "princomp" containing the standard deviations of the principal components, the loadings, and, optionally, the scores.

USAGE:
princomp(x, data=NULL, covlist=NULL, scores=T, cor=F,
         na.action=na.fail, subset=T)

REQUIRED ARGUMENTS:
At least one of x, data or covlist must be given.

OPTIONAL ARGUMENTS:
x:
a matrix, data frame or formula. If a matrix, the columns should correspond to variables and the rows to observations. If a formula, no variables may appear on the left (response) side.
data:
a data frame or matrix. This is usually used only when x is a formula, though it may be used instead of x.
covlist:
a list of the form returned by cov.wt or cov.mve. Components must include center and cov. A cor component will not be used; however, an n.obs component will be used if present.
scores:
logical value or an integer. If scores is TRUE, then a matrix of the scores for all of the components is returned. If scores is a number k, then scores and loadings are returned for the first k components only. If scores is FALSE, then no scores are computed.
cor:
logical flag: if TRUE, then the principal components are based on the correlation matrix rather than the covariance matrix. That is, the variables are scaled to have unit variance.
na.action:
function to handle missing values. The default, na.fail, causes an error if missing values are found.
subset:
the subset of the observations to use.
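The relation between covlist, cor and the scale component can be sketched outside S-PLUS. The Python below illustrates the idea under stated assumptions and is not the S-PLUS implementation. The helper pca_from_cov is hypothetical. It handles only two variables, so the eigen-decomposition has a closed form, and it works from a covariance matrix alone, as a covlist does. When cor is true it divides the covariances by the square roots of the diagonal, so the components come from the correlation matrix.

```python
import math

# Hypothetical helper (not part of S-PLUS): principal components of two
# variables computed from a covariance matrix alone, as when only the
# center and cov components of a covlist are available.
def pca_from_cov(cov, cor=False):
    (a, b), (_, c) = cov
    if cor:
        # scale = square roots of the diagonal of cov; rescale to a correlation matrix
        scale = [math.sqrt(a), math.sqrt(c)]
        a, b, c = 1.0, b / (scale[0] * scale[1]), 1.0
    else:
        scale = [1.0, 1.0]  # no scaling when cor is false
    # Closed-form eigen-decomposition of the symmetric matrix [[a, b], [b, c]]
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    variances = [mid + half_gap, mid - half_gap]  # decreasing order
    theta = 0.5 * math.atan2(2 * b, a - c)        # angle of the first eigenvector
    # Column k of loadings is the unit eigenvector of component k
    loadings = [[math.cos(theta), -math.sin(theta)],
                [math.sin(theta), math.cos(theta)]]
    sdev = [math.sqrt(max(v, 0.0)) for v in variances]
    return sdev, loadings, scale

sdev, loadings, scale = pca_from_cov([[4.0, 1.2], [1.2, 1.0]], cor=True)
print(sdev, scale)
```

With cor=True the example matrix has correlation 1.2/(2*1) = 0.6, so the component variances are 1.6 and 0.4. This sketch needs neither center nor n.obs to obtain sdev and loadings.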

VALUE:
an object of class "princomp" which is a list with components:
sdev:
vector of standard deviations of the principal components.
loadings:
matrix of class "loadings", with orthonormal columns, giving the loadings. The first column holds the coefficients of the linear combination of the columns of x that defines the first principal component, and so on. If argument scores was numeric, then this matrix contains that number of columns.
n.obs:
the number of observations on which the estimates are based. This may not be present if covlist was used.
scores:
the scores of some or all of the principal components for the observations.
center:
vector of centers for the variables.
scale:
vector of numbers by which the variables are scaled. These are all 1 if cor is FALSE. If cor is TRUE, then the scales are the square roots of the diagonal of the cov component of covlist, if covlist is given, and otherwise the standard deviations of the input data variables.
terms:
the terms object of the formula. This is present only if a formula was used.
call:
an image of the call to princomp.
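The following two-variable sketch shows how the components of the returned list fit together. It is Python, not S-PLUS code, and the function name princomp2 and its dictionary keys are illustrative only. It assumes that the covariances and standard deviations use divisor n. It also assumes that the scores are the centered, scaled data multiplied by the loadings.

```python
import math

# Hypothetical two-variable sketch (not the S-PLUS code) of how the sdev,
# loadings, center, scale and scores components relate to one another.
# Assumption: covariances and standard deviations use divisor n.
def princomp2(rows, cor=False):
    n = len(rows)
    center = [sum(r[j] for r in rows) / n for j in range(2)]
    dev = [[r[j] - center[j] for j in range(2)] for r in rows]
    if cor:
        scale = [math.sqrt(sum(d[j] ** 2 for d in dev) / n) for j in range(2)]
    else:
        scale = [1.0, 1.0]
    z = [[d[j] / scale[j] for j in range(2)] for d in dev]  # centered, scaled data
    a = sum(v[0] * v[0] for v in z) / n
    b = sum(v[0] * v[1] for v in z) / n
    c = sum(v[1] * v[1] for v in z) / n
    # Closed-form eigen-decomposition of [[a, b], [b, c]]
    mid, half_gap = (a + c) / 2, math.hypot((a - c) / 2, b)
    sdev = [math.sqrt(mid + half_gap), math.sqrt(max(mid - half_gap, 0.0))]
    theta = 0.5 * math.atan2(2 * b, a - c)
    loadings = [[math.cos(theta), -math.sin(theta)],
                [math.sin(theta), math.cos(theta)]]
    # Scores: centered, scaled data multiplied by the loadings matrix
    scores = [[v[0] * loadings[0][k] + v[1] * loadings[1][k] for k in range(2)]
              for v in z]
    return {"sdev": sdev, "loadings": loadings, "n.obs": n,
            "scores": scores, "center": center, "scale": scale}

fit = princomp2([[1, 2], [2, 3], [3, 5], [4, 4], [5, 6]], cor=True)
print(fit["sdev"], fit["center"], fit["scale"])
```

Under these assumptions, each column of scores has mean zero and a variance equal to the square of the matching sdev entry. The columns are also uncorrelated.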

DETAILS:
If cor is TRUE, then the data, if supplied, are standardized using the values in the scale component.

BACKGROUND:
Principal component analysis defines a rotation of the variables of x. The first derived variable (a linear combination of the variables whose coefficient vector has unit length) is chosen to have the maximum standard deviation. The second maximizes the standard deviation among all such combinations that are uncorrelated with the first, and so on.
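The maximization can be checked numerically. This Python sketch uses a small hypothetical data set and scans unit directions in the plane, keeping the one with the largest projected variance. It then compares that direction with the leading eigenvector of the covariance matrix. The sketch assumes divisor n for the covariances. It also confirms that the projection on the perpendicular direction is uncorrelated with the first.

```python
import math

# Illustration (hypothetical data) of the defining property: the first
# principal direction maximizes the variance of the projected data.
rows = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0), (5.0, 6.0)]
n = len(rows)
mx = sum(x for x, _ in rows) / n
my = sum(y for _, y in rows) / n

def projected_variance(theta):
    """Variance (divisor n) of the data projected onto the unit vector at angle theta."""
    proj = [(x - mx) * math.cos(theta) + (y - my) * math.sin(theta) for x, y in rows]
    return sum(p * p for p in proj) / n

# Brute-force search over directions in [0, pi)
steps = 20000
best = max((k * math.pi / steps for k in range(steps)), key=projected_variance)

# Closed form: angle of the leading eigenvector of the covariance matrix
a = sum((x - mx) ** 2 for x, _ in rows) / n
b = sum((x - mx) * (y - my) for x, y in rows) / n
c = sum((y - my) ** 2 for _, y in rows) / n
theta_star = (0.5 * math.atan2(2 * b, a - c)) % math.pi

# Projections on the best direction and on the direction perpendicular to it
u = [(x - mx) * math.cos(best) + (y - my) * math.sin(best) for x, y in rows]
w = [-(x - mx) * math.sin(best) + (y - my) * math.cos(best) for x, y in rows]
cov_uw = sum(p * q for p, q in zip(u, w)) / n

print(best, theta_star, projected_variance(best), cov_uw)
```

For these five points the two variances are equal and the covariance is 1.8. Both approaches therefore give the direction at angle pi/4, with projected variance 3.8.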

Principal component analysis is often used as a data reduction technique, sometimes in conjunction with regression. If the variables are not all in the same units, it is advisable to scale the columns of the input (for example, by setting cor=T) before performing the analysis. Otherwise, a variable with a large variance relative to the others will dominate the first principal component.
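The effect of units can be seen with a hypothetical covariance matrix. The Python sketch below uses two variables with made-up standard deviations. The first principal component of the covariance matrix points almost entirely along the variable recorded in grams. After conversion to a correlation matrix, which is what cor=T uses, both variables receive equal weight.

```python
import math

def first_loading(cov):
    """Unit vector of the first principal component of a 2x2 covariance matrix."""
    (a, b), (_, c) = cov
    theta = 0.5 * math.atan2(2 * b, a - c)
    return (math.cos(theta), math.sin(theta))

def to_correlation(cov):
    """Divide the covariance by the product of the standard deviations."""
    (a, b), (_, c) = cov
    r = b / math.sqrt(a * c)
    return [[1.0, r], [r, 1.0]]

# Hypothetical variables: height in metres (sd 0.1) and weight in grams
# (sd 10000), with correlation 0.5, so the covariance is 0.5 * 0.1 * 10000 = 500.
cov = [[0.01, 500.0], [500.0, 1.0e8]]

print(first_loading(cov))                  # dominated by the weight variable
print(first_loading(to_correlation(cov)))  # both variables weighted equally
```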


REFERENCES:
Many multivariate statistics books (and some regression texts) include a discussion of principal components. Below are a few examples:

Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis, Methods and Applications. Wiley, New York.

Johnson, R. A. and Wichern, D. W. (1982). Applied Multivariate Statistical Analysis. Prentice-Hall, Englewood Cliffs, New Jersey.

Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London.


SEE ALSO:
princomp.object, loadings, biplot, screeplot, plot.loadings, svd, cancor.

EXAMPLES:
princomp(prim4)

# use a robust estimate of the covariances and scale the variables
prim4.pcr <- princomp(prim4, covlist=cov.mve(prim4), cor=T)

# plot the component variances and the loadings, then print the loadings, hiding small ones
screeplot(prim4.pcr)
plot(loadings(prim4.pcr))
print(loadings(prim4.pcr), cutoff=.5)

princomp(~pre.mean + post.mean + pre.dev + post.dev, data=wafer)