Create a Contingency Table from Factor Data

DESCRIPTION:
Create a multiway contingency table (a crosstabulation) from a collection of factors.

USAGE:
crosstabs(formula, data=sys.parent(), margin=<<see below>>,
    subset, na.action=na.fail, drop.unused.levels=T)

REQUIRED ARGUMENTS:
formula:
a formula object with the terms, separated by + operators, on the right of the ~. Each term on the right-hand side should be a factor, and will be converted to one if it is not. If there is a term to the left of the ~, it should be a vector of counts; this is useful for data that have already been tabulated. If the formula is omitted or is ~ . and the data argument is a data frame, then all the variables in data will be crosstabulated.

OPTIONAL ARGUMENTS:
data:
A data frame or frame number telling where the variables named in the formula (and in the subset argument) may be found. If a variable is not found by searching in the data frame or frame given by data, it is expected to be on the search list.
subset:
Expression telling which subset of the rows of the data should be used in the table. It can be an expression that evaluates to a logical vector, a vector of logical values, or a vector of row numbers or row names (in short, anything you would normally use to subscript the rows of a data frame). The variable names in the expression should be names in the same place supplied by the data argument; otherwise they will be looked for on the search list. All observations are included by default.
margin:
a list of (possibly empty) vectors of integers describing which marginal proportions to calculate (and print). The integers must be in the range 1 to the number of variables to be crosstabulated, and repeated values within a vector are not allowed. The names of the list are the labels to put in the legend printed with the table.

Each element of the list gives a vector of dimension numbers not to sum over when computing the denominators for the proportions of cell count to marginal total. E.g., 1 means to calculate row sums and integer(0) means to calculate the grand sum. The default for a two-way crosstabulation is list("Row%"=1, "Col%"=2, "Total%"=integer(0)) and that for a one-way table is list("Total%"=integer(0)). For higher-dimensional crosstabulations, the default results in printing the row and column proportions for each layer: list("N/RowTotal" = setdiff(i, 2), "N/ColTotal" = setdiff(i, 1), "N/Total" = integer(0)), where i is 1:number.of.factors. The margin argument here is similar to that in loglin.
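
The denominators described above can be reproduced by hand with apply and sweep. The following is a minimal sketch, not the way crosstabs itself is implemented: it assumes a plain matrix of counts (the values are copied from the Solder by Opening table in the EXAMPLES section), and the object names counts, row.den, col.den, total.den and the *.prop results are illustrative only.

```r
# Counts from the Solder by Opening example; all object names are illustrative.
counts <- matrix(c(99, 24, 15, 11, 9, 0), nrow = 2,
    dimnames = list(c("Thin", "Thick"), c("S", "M", "L")))

# Denominators: each margin element names the dimensions that are NOT summed over.
row.den <- apply(counts, 1, sum)   # element 1: keep dimension 1 (row sums)
col.den <- apply(counts, 2, sum)   # element 2: keep dimension 2 (column sums)
total.den <- sum(counts)           # element integer(0): keep nothing (grand sum)

# Proportions of cell count to each marginal total.
row.prop <- sweep(counts, 1, row.den, "/")   # N/RowTotal
col.prop <- sweep(counts, 2, col.den, "/")   # N/ColTotal
total.prop <- counts / total.den             # N/Total

round(row.prop, 3)
```

For the Thin, S cell this gives 99/123 for both the row and the column proportion and 99/158 for the total proportion, which agrees with the 0.805, 0.805, and 0.627 printed in the example output.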

na.action:
A function for handling missing values. If there are any missing values in the data to be crosstabulated, the data will be put into a data frame and passed to the function given by na.action. The default is na.fail, which issues a fatal error message describing the problem. A common alternative is na.omit, which deletes cases with NAs in any of the variables to be crosstabulated. na.include will add the level NA to each factor before crosstabulating them (the formula may also include terms like na.include(x) to do this only for certain variables).
drop.unused.levels:
If TRUE (the default) then any unused levels in factors will be omitted from the table. If FALSE, they will not be dropped and the table will contain rows or columns of zeros for those unused levels. This will cause the marginal proportions for those levels and the overall chi-squared statistic to be NA's, but may be useful for making parallel tables of similar data sets.

VALUE:
An object of class crosstabs. This is an array of counts, suitable for use in functions like loglin. It also has an attribute marginals, a list of arrays of the marginal proportions specified by the margin argument. (These arrays are stacked by the print method for crosstabs so that corresponding entries lie near each other.) It also may have an attribute na.message, giving a message that the na.action function sometimes gives when it deals with missing values in the data (e.g., na.omit will supply a na.message telling how many cases were ignored).

DETAILS:
This function provides a convenient interface to the table and tapply functions. If you wish to do more than tabulate data, say compute means or sums of crossclassified data, try tapply.
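
As a small sketch of the tapply alternative mentioned above, the following computes cell means instead of counts. The data frame pets and its Weight column are made up for illustration and do not come from any supplied data set.

```r
# Hypothetical data: two classifying variables and one numeric measurement.
pets <- data.frame(Pet = c("Dog", "Dog", "Cat", "Cat", "Cat"),
    Food = c("Wet", "Wet", "Dry", "Wet", "Dry"),
    Weight = c(20, 24, 4, 5, 6))

# Mean Weight in each Pet by Food cell; cells with no observations are NA.
cell.means <- tapply(pets$Weight, list(pets$Pet, pets$Food), mean)
cell.means
```

Replacing mean with sum gives cell totals in the same layout.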

NOTE:
The printing method, print.crosstabs, will generally add row and column totals for each two-dimensional layer of the table and will compute an overall chi-squared statistic to test independence of all the variables in the table. To omit them, call print.crosstabs directly.

BUGS:
The formula could be used to describe the marginal proportions and tests to perform but does not yet. Hence all terms should be addends in the formula.

SEE ALSO:
cut, factor, loglin, print.crosstabs, tabulate, table, tapply.

EXAMPLES:
crosstabs(~Solder+Opening, data=solder, subset = skips>10)

# produces the following output:
Call:
crosstabs( ~ Solder + Opening, data = solder, subset = skips > 10)
158 cases in table
+----------+
|N         |
|N/RowTotal|
|N/ColTotal|
|N/Total   |
+----------+
Solder |Opening
       |S      |M      |L      |RowTotl|
-------+-------+-------+-------+-------+
Thin   |99     |15     | 9     |123    |
       |0.805  |0.122  |0.073  |0.78   |
       |0.805  |0.577  |1.000  |       |
       |0.627  |0.095  |0.057  |       |
-------+-------+-------+-------+-------+
Thick  |24     |11     | 0     |35     |
       |0.686  |0.314  |0.000  |0.22   |
       |0.195  |0.423  |0.000  |       |
       |0.152  |0.070  |0.000  |       |
-------+-------+-------+-------+-------+
ColTotl|123    |26     |9      |158    |
       |0.778  |0.165  |0.057  |       |
-------+-------+-------+-------+-------+
Test for independence of all factors
        Chi^2 = 9.18309 d.f.= 2 (p=0.01013719)
        Yates' correction not used
        Some expected values are less than 5, don't trust stated p-value

# example #2

data <- data.frame(Pet=c("Dog","Dog","Cat","Cat","Cat"),
    Food=c("Wet","Wet","Dry","Wet",NA))
crosstabs(data=data, na.action=na.omit)

# produces the following output:
Call:
crosstabs(data = data, na.action = na.omit)
4 cases in table
Dropping 1 cases because of missing values
+----------+
|N         |
|N/RowTotal|
|N/ColTotal|
|N/Total   |
+----------+
Pet    |Food
       |Dry    |Wet    |RowTotl|
-------+-------+-------+-------+
Cat    |1      |1      |2      |
       |0.50   |0.50   |0.5    |
       |1.00   |0.33   |       |
       |0.25   |0.25   |       |
-------+-------+-------+-------+
Dog    |0      |2      |2      |
       |0.00   |1.00   |0.5    |
       |0.00   |0.67   |       |
       |0.00   |0.50   |       |
-------+-------+-------+-------+
ColTotl|1      |3      |4      |
       |0.25   |0.75   |       |
-------+-------+-------+-------+
Test for independence of all factors
        Chi^2 = 1.333333 d.f.= 1 (p=0.2482131)
        Yates' correction not used
        Some expected values are less than 5, don't trust stated p-value