Fisher's Exact Test for Count Data

DESCRIPTION:
Performs a Fisher's exact test on a two-dimensional contingency table.

USAGE:
fisher.test(x, y=NULL, node.stack.dim=1001, value.stack.dim=10000,
            hybrid=F)

REQUIRED ARGUMENTS:
x:
either a factor or category object, or a two-dimensional contingency table in matrix form. If x is a matrix, each dimension must be at least 2 and at most 10, all elements must be non-negative, and NAs and Infs are not allowed. The elements of x should be whole numbers, since the test is based on counts; the storage mode of x is coerced to "integer". For restrictions on x when it is a factor or category object, see argument y.

OPTIONAL ARGUMENTS:
y:
factor or category object. If x is a matrix, y is ignored. If x is a factor or category object, y is required and must have the same length as x. Each object must have at least 2 and at most 10 levels. NAs in the category index vectors are allowed, but pairs (x[i],y[i]) containing them are removed. Each element of the index vectors of x and y should give the membership of that observation in one of the groups named in the levels attribute; an NA in an index vector means that the observation is in none of the groups listed for that factor or category object. Infs have no meaning as indices and should not be present.

If both x and y are supplied and either one is not a factor or category object (and x is not a matrix), it is coerced to one implicitly. In this case pairs (x[i],y[i]) containing NAs are removed, but pairs containing Infs are not. This coercion is intended for data of mode numeric whose elements are typically small integers; data in the form of character vectors should first be made into factor or category objects.

node.stack.dim:
Dimension of a stack for storing nodes corresponding to possible subtables.
value.stack.dim:
Dimension of a stack for storing different function values corresponding to nodes.
hybrid:
logical flag: if TRUE, a hybrid algorithm is used. This involves an approximation. See Mehta and Patel (1986).

VALUE:
A list of class "htest", containing the following components:

p.value:
the p-value for the test.
alternative:
always "two.sided".
method:
character string giving the name of the method used.
data.name:
a character string (vector of length 1) containing the actual name of the input argument x, and of y if both are factor or category objects.


NULL HYPOTHESIS:
Fisher's exact test is typically used to test the null hypothesis of independence between the row and column variables of the table. Certain types of homogeneity, for example homogeneity of proportions in a k by 2 table, are equivalent to the independence hypothesis. See the literature references for examples.


TEST:
Unlike many tests for categorical data, whose test statistics have a known distribution only asymptotically, Fisher's exact test does not require the cell counts to be large. Because the test conditions on the marginal totals, however, it is important that this conditioning have a meaningful interpretation under the sampling scheme governing data collection.


DETAILS:
Treating the table marginal totals as fixed, the conditional probability P* associated with the given table may be obtained using the hypergeometric distribution. This probability may be obtained for any table having the same marginals. The p-value for the test is the sum of the probabilities P for all tables with P <= P*.
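
As a rough illustration of this definition (not the network algorithm that fisher.test itself uses), the p-value for a 2 by 2 table can be sketched by enumerating every table that shares the observed marginals. The counts are those of the Fleiss table shown under EXAMPLES. The variable names are chosen for illustration, the code assumes that choose is available (as it is in R), and the small tolerance in the final comparison is an added assumption to guard against rounding error.

```r
# Observed table (Fleiss, p. 25): rows A, A bar; columns B, B bar
obs <- matrix(c(2, 4, 3, 0), nrow = 2)
r1 <- sum(obs[1, ])      # first row total
r2 <- sum(obs[2, ])      # second row total
c1 <- sum(obs[, 1])      # first column total
n <- sum(obs)            # grand total

# With the marginals fixed, a 2 by 2 table is determined by its [1,1] cell
a <- max(0, c1 - r2):min(r1, c1)

# Hypergeometric probability of each table with these marginals
p <- choose(r1, a) * choose(r2, c1 - a) / choose(n, c1)

# P* is the probability of the observed table
pstar <- p[a == obs[1, 1]]

# Sum the probabilities of all tables no more probable than the observed one
# (the factor 1 + 1e-7 is an assumed tolerance, not part of the definition)
p.value <- sum(p[p <= pstar * (1 + 1e-7)])
```

Here four tables share the observed marginals, with probabilities 10/84, 40/84, 30/84 and 4/84. The observed table has P* = 10/84, so the p-value is 10/84 + 4/84 = 1/6.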

The algorithm used in fisher.test is based on theory from Mehta and Patel (1983, 1986) and Joe (1985, 1988). It involves a network algorithm together with matrix majorization results to find the maximum and minimum of a certain objective function at each node in the network that is processed. See Joe (1988).


WARNING:
The total number of counts in the cross-classification table cannot be greater than 200.

REFERENCES:
(a) Statistical Theory

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1980). Discrete Multivariate Analysis: Theory and Practice, Cambridge, Mass.: The MIT Press.

Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley.

Zar, J. H. (1984). Biostatistical Analysis, 2nd ed. Englewood Cliffs: Prentice-Hall.

(b) Computer Algorithm

Joe, H. (1985). An Ordering of Dependence for Contingency Tables. Linear Algebra and its Applications 70, 89-103.

Joe, H. (1988). Extreme probabilities for contingency tables under row and column independence with application to Fisher's exact test. Communications in Statistics A, Theory and Methods 17, 3677-3685.

Mehta, C. R. and Patel, N. R. (1983). A network algorithm for performing Fisher's exact test in r*c contingency tables. Journal of the American Statistical Association 78, 427-434.

Mehta, C. R. and Patel, N. R. (1986). Algorithm 643. FEXACT: A Fortran subroutine for Fisher's exact test on unordered r*c contingency tables. ACM Transactions on Mathematical Software 12, 154-161.

Mehta, C. R. and Patel, N. R. (1986). A hybrid algorithm for Fisher's exact test in unordered r*c contingency tables. Communications in Statistics A, Theory and Methods 15, 387-404.


SEE ALSO:
chisq.test, mantelhaen.test, mcnemar.test, category, cut, table, Hypergeometric.

EXAMPLES:
x       # x and y are category objects
[1] 1 1 2 1 2 1 1 2 2
attr(, "levels"):
[1] "A"     "A bar"
table(x,y)    # table from Fleiss, p. 25
      B B bar
    A 2     3
A bar 4     0
fisher.test(x,y)
fisher.test(table(x,y))       # same thing
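
When the counts are already tabulated, the table can also be supplied directly as a matrix, as described under argument x. A minimal sketch using the counts from the Fleiss table above; the names tab and result are chosen for illustration:

```r
# matrix() fills column by column:
# (A, B) = 2, (A bar, B) = 4, (A, B bar) = 3, (A bar, B bar) = 0
tab <- matrix(c(2, 4, 3, 0), nrow = 2)
result <- fisher.test(tab)
result$p.value     # the p.value component of the returned "htest" list
```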