Canonical Correlation Analysis

DESCRIPTION:
Finds linear relationships between two groups of multivariate data. By default, the columns of each data set are centered by subtracting their means.

USAGE:
cancor(x, y, xcenter=<<see below>>, ycenter=<<see below>>)

REQUIRED ARGUMENTS:
x,y:
two matrices of data. The number of rows (which represent observations) must be the same in each. Missing values are not accepted.

OPTIONAL ARGUMENTS:
xcenter:
controls the centering applied to the columns of x before computing the canonical analysis. If TRUE or if the argument is missing, the column means are subtracted. If FALSE, no centering is done. If the argument is a numeric vector, its values are subtracted from the corresponding columns.
ycenter:
controls the centering of the columns of y analogously to xcenter for the columns of x.
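
The three centering modes can be compared side by side. The sketch below is written for R, whose cancor() is assumed here to follow the same argument conventions as the function described in this file. The simulated matrices and the object names (cc.mean, cc.none, cc.med, x.med) are illustrative only.

```r
# Simulated data: 20 observations, 3 x-variables and 2 y-variables
set.seed(1)
x <- matrix(rnorm(60), ncol = 3)
y <- matrix(rnorm(40), ncol = 2)

# Default: column means are subtracted from both x and y
cc.mean <- cancor(x, y)

# No centering at all
cc.none <- cancor(x, y, xcenter = FALSE, ycenter = FALSE)

# Numeric centering: subtract the column medians of x, leave y as is
x.med <- apply(x, 2, median)
cc.med <- cancor(x, y, xcenter = x.med, ycenter = FALSE)

# The values actually subtracted are returned with the result
cc.mean$xcenter   # column means of x
cc.none$xcenter   # zeros (in R) when no centering is requested
cc.med$xcenter    # column medians of x
```

Inspecting the returned xcenter and ycenter components is a quick way to confirm which centering was actually applied.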

VALUE:
list representing the canonical correlation analysis:
cor:
vector of the canonical correlations, i.e., the correlations between corresponding pairs of canonical variables.
xcoef:
matrix of coefficients defining the linear combinations of the columns of x. The first column of xcoef gives the linear combination of the columns of x corresponding to the first canonical correlation, etc.
ycoef:
matrix like xcoef, but originating from y, i.e., the first column of ycoef gives the linear combination of the columns of y corresponding to the first canonical correlation, etc.
xcenter:
vector of values subtracted from the columns of x.
ycenter:
vector of values subtracted from the columns of y.
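
As a rough illustration of how these components fit together, the following R sketch rebuilds the canonical variables from the returned coefficients and centering values and checks them against cor. It assumes R's cancor() returns the same components as listed above. The data are simulated and the names xc, yc, u, v, and pair.cor are hypothetical.

```r
set.seed(2)
x <- matrix(rnorm(150), ncol = 3)   # 50 observations, 3 variables
y <- matrix(rnorm(100), ncol = 2)   # 50 observations, 2 variables
cc <- cancor(x, y)

# Center with the returned values, then form the canonical variables
xc <- sweep(x, 2, cc$xcenter)       # subtracts xcenter from each column
yc <- sweep(y, 2, cc$ycenter)
u <- xc %*% cc$xcoef
v <- yc %*% cc$ycoef

# Correlations of matching columns of u and v should reproduce cc$cor
k <- length(cc$cor)
pair.cor <- sapply(1:k, function(i) cor(u[, i], v[, i]))
rbind(returned = cc$cor, recomputed = pair.cor)
```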

BACKGROUND:
Canonical correlation seeks a linear combination of one set of variables and a linear combination of a second set of variables such that the correlation between the two combinations is maximized. It is similar to regression, which seeks a linear combination of a set of variables that maximizes the correlation with a single (response) variable.
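
The link to regression can be checked numerically. In the R sketch below (simulated data, hypothetical object names, and R's cancor() and lm() assumed as stand-ins for the functions of this system), y has a single column, so the one canonical correlation should coincide with the multiple correlation coefficient from a least-squares fit of y on x.

```r
set.seed(3)
x <- matrix(rnorm(120), ncol = 3)                # 40 observations, 3 predictors
y <- drop(x %*% c(1, -0.5, 0.25)) + rnorm(40)    # single response, plain vector

# With one y variable there is a single canonical correlation
cc <- cancor(x, y)

# Multiple correlation coefficient from least-squares regression of y on x
fit <- lm(y ~ x)
R <- sqrt(summary(fit)$r.squared)

c(canonical = cc$cor, regression = R)
```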

The second and higher canonical correlations find linear combinations that maximize the correlation subject to being uncorrelated with previous canonical variables. The number of canonical correlations is the minimum of the number of variables in the two sets.
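
One way to write this down, using notation that is not part of the original text: let X and Y be the centered data matrices with p and q columns, and let a_k and b_k be the coefficient vectors of the k-th pair of canonical variables. The statement that there are min(p, q) canonical correlations assumes that X and Y both have full column rank.

```latex
\begin{aligned}
(a_k, b_k) &= \arg\max_{a,\,b}\ \operatorname{corr}(Xa,\ Yb)
  \quad\text{subject to}\quad
  \operatorname{corr}(Xa,\ Xa_j) = 0,\ \
  \operatorname{corr}(Yb,\ Yb_j) = 0
  \quad\text{for } j < k, \\
\rho_k &= \operatorname{corr}(Xa_k,\ Yb_k),
  \qquad k = 1, \dots, \min(p, q), \\
\rho_1 &\ge \rho_2 \ge \dots \ge \rho_{\min(p,q)} \ge 0 .
\end{aligned}
```

For k = 1 there are no constraints, which gives the first canonical correlation described above.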


REFERENCES:
Many multivariate statistics books have discussions of canonical correlation. Examples include:

Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis, Methods and Applications. Wiley, New York.

Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London.


SEE ALSO:
lsfit, prcomp.

EXAMPLES:
#canonical decomposition with column means swept out
cancor(x, y)

#canonical decomposition with column medians of x subtracted out, y as is
cancor(x, y, apply(x, 2, median), F)

#plot the first pair of canonical variables for two subsets of the columns of evap.x
soil <- evap.x[,1:3]
air <- evap.x[,-1:-3]
cc.airsoil <- cancor(air, soil)
can.air <- air %*% cc.airsoil$xcoef
can.soil <- soil %*% cc.airsoil$ycoef
plot(can.air[,1], can.soil[,1], xlab="first air canonical variable", ylab="first soil canonical variable")

par(mfrow=c(2, 1))
barplot(cc.airsoil$xcoef[,1], ylab="first air loadings", names=dimnames(air)[[2]], density=20)
barplot(cc.airsoil$ycoef[,1], ylab="first soil loadings", names=dimnames(soil)[[2]], density=20, space=1.4)