Apply a Function to a Ragged Array

DESCRIPTION:
Applies a function to each cell of a ragged array, that is, to each group of values that share the same combination of levels across one or more categories.

USAGE:
tapply(X, INDICES, FUN=<<see below>>, ..., simplify=T)

REQUIRED ARGUMENTS:
X:
vector of data to be grouped by INDICES. Missing values (NAs) are allowed if FUN accepts them.
INDICES:
list whose components are interpreted as categories, each of the same length as X. Together, the levels of the categories determine the cell of a multi-way array to which each observation of X belongs. Missing values (NAs) are allowed. The names of INDICES are used as the names of the dimnames of the result. If a vector is given, it is treated as a list with one component.
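A short sketch of how INDICES shapes the result, written in R, whose tapply follows this interface closely (small details may differ from S-PLUS). The data and the category names sex and grp are made up for illustration. A named list of two categories gives a two-way table whose dimnames carry the list names; a plain vector is treated as a one-component list.

```r
# Made-up data: six observations classified by two categories.
x <- c(1, 2, 3, 4, 5, 6)
g <- list(sex = c("F", "M", "F", "M", "F", "M"),
          grp = c("a", "a", "b", "b", "b", "a"))

# Two categories give a two-way table; the names of the list
# become the names of the dimnames.
m <- tapply(x, g, sum)
print(m)
print(names(dimnames(m)))   # "sex" "grp"

# A plain vector is treated as a list with one component.
v <- tapply(x, g$sex, sum)
print(v)                    # F = 9, M = 12
```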

OPTIONAL ARGUMENTS:
FUN:
function, or character string giving the name of the function, to be applied to each cell. If FUN is omitted, tapply returns a vector that can be used to subscript the multi-way array that tapply normally produces. This vector is useful for computing residuals; see the EXAMPLES section.
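A sketch in R (where FUN defaults to NULL, which has the same effect as omitting it here) of the index vector returned when FUN is omitted, using made-up data. Each observation gets the position of its cell in the multi-way array, counted in column-major order, so subscripting the full result with these positions picks out each observation's cell.

```r
x  <- c(10, 20, 30, 40)
f1 <- c("c", "a", "b", "a")   # levels a, b, c -> 3 rows
f2 <- c("v", "u", "v", "v")   # levels u, v    -> 2 columns

pos <- tapply(x, list(f1, f2))   # FUN omitted: one position per observation
print(pos)                       # 6 1 5 4

# The positions index the 3 x 2 result in column-major order,
# so subscripting the table with them recovers each observation's cell.
m <- tapply(x, list(f1, f2), sum)
print(m[pos])                    # 10 20 30 40 (one observation per cell here)
```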
...:
optional arguments to be given to each invocation of FUN.
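Extra arguments are passed unchanged to every call of FUN. A minimal R sketch with made-up data: passing na.rm = TRUE through tapply lets mean drop the missing values in each group, which also illustrates the note under X that NAs are fine as long as FUN can handle them.

```r
x <- c(1, NA, 3, 4, NA, 8)
g <- c("a", "a", "a", "b", "b", "b")

with_na <- tapply(x, g, mean)                 # NA in every group
dropped <- tapply(x, g, mean, na.rm = TRUE)   # na.rm is passed on to mean()
print(with_na)
print(dropped)                                # a = 2, b = 6
```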
simplify:
If FALSE, tapply always returns an array of mode list. If TRUE (the default) and FUN always returns a scalar, tapply returns an array with the mode of the scalar; if that array would be one-dimensional, the dimension is removed to make it a vector. (simplify is ignored if FUN is not supplied.)
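The effect of simplify can be sketched in R with made-up data. One caveat: R keeps a one-dimensional simplified result as a 1-d array rather than dropping the dimension, so this shows only the mode of the result, not the dimension removal described above.

```r
x <- c(5, 1, 4, 2)
g <- c("a", "b", "a", "b")

s_true  <- tapply(x, g, sum)                     # numeric: a = 9, b = 3
s_false <- tapply(x, g, sum, simplify = FALSE)   # array of mode list
print(is.numeric(s_true))    # TRUE
print(is.list(s_false))      # TRUE
print(s_false[[1]])          # 9, the value for cell "a"
```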

VALUE:
If FUN is missing, a vector of indices is returned, one for each observation in X. Each index gives the position of that observation's cell in the array that would be returned if FUN were supplied.

When FUN is present, tapply calls FUN for each cell that has any data in it. If FUN returns a single atomic value for each cell (e.g. mean or var), then tapply returns a multi-way array containing the values. The array has the same number of dimensions as INDICES has components; the number of levels in a dimension is the number of levels in the corresponding component of INDICES. The result is a vector if INDICES has only one component and simplify is TRUE.
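A sketch in R, with made-up data, of how the shape of the result follows the levels of the categories rather than the combinations actually observed. This page does not say what cells without data contain; in R they are filled with NA, which is what the comments below assume.

```r
x  <- c(1, 2, 3)
f1 <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))  # level "c" never occurs
f2 <- factor(c("u", "v", "u"))

m <- tapply(x, list(f1, f2), sum)
print(dim(m))    # 3 2: one row per level of f1, one column per level of f2
print(m)         # cells (b, v), (c, u) and (c, v) have no data and are NA in R
```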

If FUN does not return a single atomic value, tapply returns an array of mode "list", whose components are the values of the individual calls to FUN. Another way of saying this is that the result is a list that has a dim attribute (this prints as a list, but you can subscript it like an array).
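When FUN returns more than one value per cell, the result is a list with a dim attribute. A minimal R sketch with made-up data, using range, which returns the minimum and maximum of each group:

```r
x <- c(3, 9, 1, 7)
g <- c("a", "a", "b", "b")

r <- tapply(x, g, range)   # range() returns two values, so no simplification
print(is.list(r))          # TRUE
print(dim(r))              # 2
print(r[[2]])              # 1 7, the range for cell "b"
```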


DETAILS:
Evaluates a function, FUN, on data values that correspond to each cell of a multi-way array.
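Conceptually, for a single category this amounts to splitting X by the category and applying FUN to each piece. The R sketch below, with made-up data, compares tapply with split() followed by sapply(); it illustrates the idea only and is not a description of how S-PLUS implements tapply.

```r
x <- c(2, 4, 6, 8, 10)
g <- c("p", "q", "p", "q", "q")

via_tapply <- tapply(x, g, mean)
via_split  <- sapply(split(x, g), mean)   # group first, then apply
print(via_tapply)     # p: 4, q: about 7.33
print(via_split)
```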

SEE ALSO:
table returns an array of the counts of occurrences in each cell. loglin performs a log-linear analysis on an array.

apply is used to apply a function to sections of an array; lapply and sapply apply a function to a list.


EXAMPLES:
tapply(income, list(cut(age, 5), gender), mean)
# 5 by 2 matrix of the mean income for each age-gender combination
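The example above assumes that income, age and gender already exist. A self-contained R version with randomly generated stand-in data (the sizes and distributions are arbitrary choices, not real figures):

```r
set.seed(1)                                   # reproducible made-up data
n      <- 200
age    <- runif(n, 20, 70)
gender <- sample(c("female", "male"), n, replace = TRUE)
income <- rnorm(n, mean = 40000, sd = 8000)

m <- tapply(income, list(cut(age, 5), gender), mean)
print(dim(m))        # 5 2
print(round(m))
```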

# generate mean republican votes for regions of the U.S.
# category that gives the region for each observation
region <- state.region[row(votes.repub)]
election <- category(votes.year)[col(votes.repub)]
mn <- tapply(votes.repub, list(region, election), mean)
round(mn, 1) # table of mean vote by region and election
positions <- tapply(votes.repub, list(region, election))
# positions is a vector of indices for mn (treated as a vector)
residuals <- votes.repub - mn[positions]
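The residual computation in the votes example depends on datasets not defined on this page. The same steps can be sketched in R on a small made-up two-way layout; because each residual is a deviation from its own cell mean, the residuals in every cell sum to zero.

```r
y  <- c(4, 6, 1, 3, 8, 2, 5, 7)
f1 <- c("a", "a", "b", "b", "a", "a", "b", "b")
f2 <- c("u", "u", "u", "u", "v", "v", "v", "v")

mn        <- tapply(y, list(f1, f2), mean)   # 2 x 2 table of cell means
positions <- tapply(y, list(f1, f2))         # cell position of each observation
residuals <- y - mn[positions]               # deviation from own cell mean
print(residuals)                             # -1 1 -1 1 3 -3 -1 1
print(tapply(residuals, list(f1, f2), sum))  # every cell sums to 0
```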