Fit an Analysis of Variance Model

DESCRIPTION:
Returns an object of class "aov", "aovlist" or "maov" that contains the analysis of variance for the specified model.

USAGE:
aov(formula, data = <<see below>>, projections = F, qr = F,
    contrasts = NULL, ...)

REQUIRED ARGUMENTS:
formula:
formula or terms describing the model.

OPTIONAL ARGUMENTS:
data:
if supplied, a data frame in which the objects named in the formula are to be found. If data is omitted, the current search list is used to find the objects in formula; frequently, a data frame will have been attached.
projections:
logical flag: if TRUE, the result will include a projections component, i.e., the result of a call to proj. This adds substantially to the size of the returned object (a matrix with as many rows as observations and as many columns as there are terms in the model), but if you plan to use the projections, it is more efficient to compute them during the fit rather than by calling proj later.
qr:
logical flag: should the orthogonal decomposition be returned? See lm.fit.qr. If you can't imagine why you would need this, you don't.
contrasts:
a list of contrasts to be used for some or all of the factors appearing as variables in the model formula. The names of the list should be the names of the corresponding variables, and the elements should be either contrast-type matrices (matrices with as many rows as levels of the factor and with columns linearly independent of each other and of a column of ones) or functions that compute such contrast matrices.
...:
arguments to be passed to lm. In particular, the argument na.action can be a function that filters missing values from a data frame, and subset can be a vector for selecting observations (rows) from a data frame.

VALUE:
an object describing the fit. There are two cases:

if there is no Error term in the model, the object is of class "aov" (or "maov" for multiple response models). This class inherits from the class of linear models, class "lm" ("mlm") and has the following components:

coefficients:
the coefficients of the least squares fit of the response(s) on the model matrix. The column names of the matrix of coefficients are the names of the single-degree-of-freedom effects (the linearly independent columns of the model matrix).
residuals:
the residuals from the fit.
fitted.values:
the fitted values for the model.
effects:
orthogonal, single-degree-of-freedom effects. Note that these are always an orthogonal transformation of the response. These effects do not depend on any property of the design (e.g., balance) for their orthogonality. Caution: these are NOT what are called the effects in 2^k designs; effects of that type are defined as twice the coefficients.
rank:
the computed rank (number of estimable effects) for the model.
assign:
the list of assignments of coefficients (and effects) to the terms in the model. The names of this list are the names of the terms. The ith component of the list is the vector saying which coefficients correspond to the ith term. It may be of length zero if there were no estimable effects for the term.
R:
part of the decomposition of the design matrix (the R of the QR decomposition by default). This is of little direct interest to the general user.
terms:
an object of mode "expression" and class "terms" summarizing the formula. This is used by various methods, but not typically of direct relevance to users.
call:
an image of the call that produced the object, but with the arguments all named and with the actual formula included as the formula argument.

if there is an Error term in the model, then the object returned by aov has class "aovlist" and is a list of aov objects of the form above (without call or terms components), one for each stratum. This list has attributes call and terms as described above.


DETAILS:
The aov function fits analysis of variance models, typically from designed experiments. Usually, the variables on the right-hand side are of class factor or ordered, that is, they are categorical. Numeric variables are also allowed on the right-hand side of the formula. Because aov computes sequential sums of squares, the order in which numeric variables appear generally affects the interpretation of the model.

Use the summary function on the output of aov to see the anova table for the model.

FORMULAS. A plus sign (+) separates terms in the formula. Specify an interaction with a colon; for example, A:B is the interaction between factor A and factor B. The * operator gives the interaction plus the main effects, so A*B*C expands to three main effects, three two-factor interactions and one three-factor interaction. The term B %in% A means that B is nested within A; and A/B expands to A + B %in% A. Terms may be subtracted from the model if they are specified elsewhere in the formula, e.g., A*B*C - B:C contains only two two-factor interactions. The precedence of these operators follows the usual S Language precedence.
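
The expansion rules above can be checked without fitting anything by asking terms for the term labels of a formula. This is a minimal sketch, assuming an R-compatible S dialect in which terms stores a "term.labels" attribute; the names y, A, B and C are hypothetical placeholders, not objects from this help file.

```r
# List the terms a model formula expands to.
# terms() is applied to the formula alone, so no data are needed.
# y, A, B and C are hypothetical variable names used only for illustration.
full    <- attr(terms(y ~ A * B * C), "term.labels")
reduced <- attr(terms(y ~ A * B * C - B:C), "term.labels")

print(full)     # three main effects, three two-factor and one three-factor interaction
print(reduced)  # the same terms with B:C removed
```

Inspecting the labels this way is a quick check that a long formula specifies the intended terms before the model is fitted.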

UNBALANCED MODELS. If effects are not orthogonal, then the order of the terms in the model matters. For example, A*B will give different sums of squares than B*A if the data are unbalanced. The aov function produces sequential sums of squares (Type I in the notation of SAS GLM).
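
The effect of term order can be seen on a small unbalanced data set. The sketch below assumes an R-compatible S dialect; the data frame d and its values are invented purely for illustration and do not come from any data set named in this help file.

```r
# Hypothetical unbalanced two-factor layout: the four cells hold 2, 1, 1 and 4 observations.
d <- data.frame(
  A = factor(c("a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b2", "b1", "b2", "b2", "b2", "b2")),
  y = c(10, 12, 15, 11, 18, 17, 19, 20)
)

# Same model, terms entered in a different order.
fit.ab <- aov(y ~ A * B, data = d)
fit.ba <- aov(y ~ B * A, data = d)

# Sequential sums of squares: each term is adjusted only for the terms before it,
# so the lines for A and B differ between the two tables.
print(summary(fit.ab))
print(summary(fit.ba))
```

The total sum of squares and the residual line agree between the two fits; only the way the model sum of squares is divided among the main effects changes.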

MULTIPLE STRATA. The formula may optionally specify a special blocking or error structure by including a term that calls the special function Error. For example, response ~ time * concentration + Error(blocks) specifies that the factor blocks defines an error stratum. The resulting model will include two error strata, blocks and Within. In the case of multiple error strata, aov fits a separate model for each stratum. The response is projected onto each term in the error model, and these projections are then used to fit separate models. There must be only one Error term in a formula; however, there may be more than one term inside the Error function.

For example, the Error term for a split-plot design would be Error(plots), while the Error term for a split-split-plot design would be Error(plots + subplots). The order of the terms inside Error is important. See Heiberger (1989) for more on error strata.

REPEATED MEASURES. Using an error stratum is also the way to produce a univariate analysis of a repeated measures design. The appropriate Error term for a design in which repeated measurements are taken on each "subject" would be Error(subject).
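
A repeated measures fit of this kind might look like the following sketch. It assumes an R-compatible S dialect; the data frame rm.data, its variables subject, time and score, and all values are hypothetical and serve only to show the shape of the call.

```r
# Hypothetical repeated measures data: 4 subjects, each measured at 3 times.
rm.data <- data.frame(
  subject = factor(rep(1:4, each = 3)),
  time    = factor(rep(c("t1", "t2", "t3"), times = 4)),
  score   = c(5, 7, 9, 4, 6, 9, 6, 7, 10, 5, 8, 11)
)

# subject defines an error stratum; time is tested within subjects.
rm.aov <- aov(score ~ time + Error(subject), data = rm.data)

print(names(rm.aov))    # one component per stratum
print(summary(rm.aov))  # one anova table per stratum
```

Because the formula contains an Error term, the result is an "aovlist" object, and summary prints a separate table for each stratum.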


REFERENCES:
Books on the analysis of variance include:

Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978). Statistics for Experimenters. New York: Wiley.

Daniel, C. (1976). Applications of Statistics to Industrial Experimentation. New York: Wiley.

Heiberger, R. M. (1989). Computation for the Analysis of Designed Experiments. New York: Wiley.

Hicks, C. R. (1982). Fundamental Concepts in the Design of Experiments. Third Edition. New York: Holt, Rinehart and Winston.

Scheffé, H. (1959). The Analysis of Variance. New York: Wiley.


SEE ALSO:
alias, aov.object, design.table, fac.design, factor, friedman.test, kruskal.test, lm, model.matrix, oa.design, ordered, plot.design, plot.factor, proj, raov, summary, varcomp.

EXAMPLES:
# fit main effects and 2 factor interactions
cat.aov2 <- aov(Yield ~ .^2, catalyst)
summary(cat.aov2) # look at anova table

gun.aov <- aov(Rounds ~ Method + Physique/Team, gun)

aov(Yield ~ Temp * Pressure + Method) # uses an attached data frame
aov(Yield ~ Temp * Pressure + Method, exp1, na.action=na.omit)

attach(guayule)
# split plot design
aov(plants ~ variety * treatment + Error(flats))

tgaov <- aov(plants ~ variety * treatment + Error(flats), guayule, contrasts = list(treatment = contr.treatment))