Pearson's Chi-square Test for Count Data

DESCRIPTION:
Performs Pearson's chi-square test on a two-dimensional contingency table.

USAGE:
chisq.test(x, y=NULL, correct=T)

REQUIRED ARGUMENTS:
x:
either a factor, a category object or a two-dimensional contingency table in matrix form. If x is a matrix, it must have at least two rows and two columns, all elements must be non-negative, and NAs or Infs are not allowed. The elements of matrix x should be whole numbers, as the test is based on counts; however, since all computations are carried out to double precision accuracy where possible, the storage mode of matrix x will be coerced to "double". For restrictions on x when it is a factor or a category object, see argument y.

OPTIONAL ARGUMENTS:
y:
factor or category object. If x is a matrix, y is ignored. If x is a factor or category object, y is required and must have the same length as x. Both factor/category objects must have at least two levels. NAs in the category index vectors are allowed, but pairs (x[i],y[i]) containing these will be removed. Each element of the index vectors of x and y should give the membership of that observation in one of the groups present in the levels attributes; an NA in an index vector means that the observation is not in one of the groups listed for that object. Infs have no meaning as indices, and should not be present.

Conversely, if x or y is not a factor/category object (and x is not a matrix), it will be coerced to one implicitly. In this case pairs (x[i],y[i]) containing NAs will be removed, but not pairs with Infs. Coercion of x and y in this manner is intended for datasets of mode numeric, whose elements are typically small integers; data in the form of character vectors should first be made into factor or category objects.
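
The removal of pairs containing NAs can be pictured with the short sketch below. It is an illustration only, assuming an S or R session; the vectors and the names keep, x.ok and y.ok are hypothetical, and this is not necessarily how chisq.test proceeds internally.

```r
# Hypothetical data: small integer codes with some missing values.
x <- c(1, 2, NA, 1, 2, 2)
y <- c(1, NA, 2, 2, 1, 2)

# Keep only the pairs (x[i],y[i]) in which neither member is NA.
keep <- !is.na(x) & !is.na(y)
x.ok <- x[keep]
y.ok <- y[keep]

length(x.ok)   # number of complete pairs retained
```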

correct:
logical flag: if TRUE, Yates' continuity correction will be applied, but only for dichotomous categories (2 by 2 tables).

VALUE:
A list of class "htest", containing the following components:

statistic:
Pearson's X-squared statistic with names attribute "X-squared". See section DETAILS for a definition.
parameters:
the degrees of freedom of the asymptotic chi-square distribution associated with statistic. parameters has names attribute "df".
p.value:
the asymptotic p-value for the test.
method:
character string giving the name of the method used, including whether Yates' continuity correction was applied.
data.name:
a character string (vector of length 1) containing the actual name of the input argument x, and of y if both are factor or category objects.


NULL HYPOTHESIS:
The expected cell counts are estimated as the products of the observed marginal totals divided by the table total. These expected counts are relevant to several types of null hypothesis: statistical independence of the rows and columns, homogeneity of groups, etc. The appropriateness of the test to a particular null hypothesis and the interpretation of the results depend on the nature of the data at hand, in particular on the sampling scheme. See for example Fleiss (1981).
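
As a sketch of the computation just described, the expected counts can be formed from the marginal totals with outer. The counts are those of the table shown under EXAMPLES; the object names obs, row.tot, col.tot, n and expected are chosen here for illustration only.

```r
# Observed 2 by 2 table (the counts from the EXAMPLES section).
obs <- matrix(c(13, 20, 13, 14), nrow = 2,
              dimnames = list(c("A", "B"), c("No", "Yes")))

row.tot <- apply(obs, 1, sum)   # observed row totals
col.tot <- apply(obs, 2, sum)   # observed column totals
n <- sum(obs)                   # table total

# Expected count in cell (i,j): row.tot[i] * col.tot[j] / n
expected <- outer(row.tot, col.tot) / n
expected
```

The expected table has the same marginal totals as the observed one, which gives a quick check on the arithmetic.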


TEST:
The returned p.value should be interpreted carefully. Its validity depends heavily on the assumption that the expected cell counts are at least moderately large; a minimum size of five is often quoted as a rule of thumb. Even when cell counts are adequate, the chi-square distribution is only a large-sample approximation to the true distribution of X-squared under the null hypothesis.
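
One way to apply the rule of thumb before trusting the p-value is to look at the expected counts directly. The following is a minimal sketch, assuming an S or R session; the sparse table and the name small are hypothetical, and the threshold of 5 is only the conventional figure quoted above.

```r
# Hypothetical sparse 2 by 3 table of counts.
obs <- matrix(c(2, 9, 3, 12, 1, 8), nrow = 2)
expected <- outer(apply(obs, 1, sum), apply(obs, 2, sum)) / sum(obs)

# Flag cells whose expected count falls below the rule-of-thumb minimum of 5.
small <- expected < 5
any(small)    # a TRUE result suggests the asymptotic p-value deserves caution
sum(small)    # number of cells affected
```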

Indiscriminate use of chisq.test with arbitrary count data is to be discouraged. The null hypothesis (i.e., probability model), sampling scheme and sizes of the counts all have bearing on the meaningfulness of the test, and some thought should be given to these.


DETAILS:
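Pearson's chi-square statistic is the sum, over all cells of the table, of (O - E)^2 / E, where O is the observed count and E the expected count for the cell under the null hypothesis. When Yates' continuity correction is applied, 0.5 is subtracted from each |O - E| before squaring. See the hardcopy help-file for the full algebraic definition.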
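
For readers without the hardcopy at hand, the sketch below computes the statistic by hand using the usual textbook definition, the sum over cells of (observed - expected)^2 / expected. The helper pearson.x2 is hypothetical and not part of chisq.test; its Yates branch uses the classical form (subtract 0.5 from each absolute deviation), which is assumed here and may differ in detail from what a particular implementation does.

```r
# Hypothetical helper, not part of chisq.test itself.
pearson.x2 <- function(obs, correct = FALSE) {
    expected <- outer(apply(obs, 1, sum), apply(obs, 2, sum)) / sum(obs)
    dev <- abs(obs - expected)
    # Yates' correction (2 by 2 tables only): shrink each |O - E| by 0.5.
    if (correct && all(dim(obs) == 2)) dev <- dev - 0.5
    sum(dev^2 / expected)
}

obs <- matrix(c(13, 20, 13, 14), nrow = 2)   # counts from the EXAMPLES section
pearson.x2(obs)                   # uncorrected statistic
pearson.x2(obs, correct = TRUE)   # with Yates' continuity correction
```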

The degrees of freedom (returned component parameters) are given by the product (R-1)*(C-1), where R is the number of rows and C the number of columns of the contingency table.
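
The degrees of freedom and the asymptotic p-value can be reproduced by hand as in the sketch below, which assumes an S or R session. The 3 by 4 table is made up for illustration, and the names x2, df and p.value are local variables here, not the components of the returned list.

```r
# Hypothetical 3 by 4 table of counts.
obs <- matrix(c(12, 15, 9, 20, 18, 11, 14, 16, 22, 10, 13, 17), nrow = 3)
expected <- outer(apply(obs, 1, sum), apply(obs, 2, sum)) / sum(obs)
x2 <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (R-1)*(C-1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Asymptotic p-value: upper tail of the chi-square distribution with df
# degrees of freedom.
p.value <- 1 - pchisq(x2, df)
```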


REFERENCES:
Fienberg, S. E. (1983). The Analysis of Cross-Classified Categorical Data, 2nd ed. Cambridge, Mass.: The MIT Press.

Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley.

Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.


SEE ALSO:
fisher.test, mantelhaen.test, mcnemar.test, category, cut, table.

EXAMPLES:
table(x,y)                   # x and y are factor objects.
#     No Yes
#   A 13  13
#   B 20  14
chisq.test(x,y)
chisq.test(table(x,y))       # same thing as chisq.test(x,y)