cov.mve(x, cor=F, print=T, popsize=<<see below>>, mutate.prob=c(0.15,0.2,0.2,0.2), random.n=<<see below>>, births.n=<<see below>>, stock=list(), maxslen=<<see below>>, stockprob=<<see below>>, nkeep=1)
If print is TRUE, then a message is printed.
A genetic algorithm, described in Burns (1992), is used. Each individual solution is defined by a set of observation numbers, which corresponds to the least squares solution using just those observations. A stock of popsize individuals is produced by random sampling; then random.n further random samples are taken, and the best solutions are saved in the stock. During the genetic phase, two parents are picked, which produce an offspring containing a sample of the observations from the two parents. The best two of the three are retained in the stock. The best of all of the solutions found is used to compute the coefficients and the residuals. The standard random sampling algorithm can be obtained by setting popsize to one, maxslen to p+1 and births.n to zero.
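As a concrete illustration, the pure random-sampling algorithm mentioned above can be requested as follows (a sketch assuming a data matrix x with p columns; argument names are as in the usage line):

```r
# Fall back to ordinary random resampling: a stock of one individual,
# subsamples of the minimal size p+1, and no genetic births.
p <- ncol(x)                           # number of variables in x
cov.mve(x, popsize = 1, maxslen = p + 1, births.n = 0)
```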
The mutate.prob argument controls the mutation of the offspring. The length of the offspring is initially set to the length of the first parent. With the probabilities given by the last three elements of mutate.prob, this length is reduced by one, increased by one, or replaced by a length uniformly distributed between p+1 and maxslen. The other type of mutation that can occur is for one of the observations of the offspring to be changed to an observation picked at random from among all of the observations; the probability of this mutation is given by the first element of mutate.prob.
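The length mutation can be sketched as the following hypothetical helper (this is not part of cov.mve; the function name and the uniform integer draw are illustrative only):

```r
# Illustrative sketch of the length mutation: with the last three
# probabilities of mutate.prob the offspring length is decreased by one,
# increased by one, or redrawn uniformly on p+1, ..., maxslen.
# With the remaining probability the length is left unchanged.
mutate.len <- function(len, mutate.prob, p, maxslen) {
  u <- runif(1)
  probs <- cumsum(mutate.prob[2:4])
  if (u < probs[1]) len <- max(len - 1, p + 1)
  else if (u < probs[2]) len <- min(len + 1, maxslen)
  else if (u < probs[3]) len <- floor(runif(1, p + 1, maxslen + 1))
  len
}
```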
It is suggested that the number of observations be at least five times the number of variables. When there are fewer observations than this, there is not enough information to accurately determine if outliers exist.
The minimum volume ellipsoid is not allowed to have zero volume: singular covariance matrices from subsamples are ignored (except for being counted). If your data have a covariance matrix that is singular, cov.mve will fail because all of the covariance matrices of the subsamples will be singular. In this case you will need to give cov.mve transformed data that do not have a singular covariance matrix, perhaps by using prcomp and deleting the components with zero variance.
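One way to carry out the suggested workaround is sketched below (the prcomp component names sdev and x follow the usual conventions, and the tolerance is an illustrative choice, not a recommendation):

```r
# Rotate the data with principal components, drop directions with
# (numerically) zero variance, then run cov.mve on the reduced data.
pc <- prcomp(x)
keep <- pc$sdev > 1e-7 * pc$sdev[1]    # illustrative tolerance
x.reduced <- pc$x[, keep, drop = FALSE]
cov.mve(x.reduced)
```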
Although the minimum volume ellipsoid covariance estimate has a very high breakdown point, it is inefficient. More efficiency can be attained, while retaining the high breakdown point, by computing a weighted covariance estimate with weights based on the minimum volume ellipsoid estimate; such an estimate is what cov.mve returns. The Mahalanobis distance of each observation (computed using a scaling of the minimum volume ellipsoid covariance estimate) is compared to the 0.975 quantile of the chi-squared distribution; observations with distances smaller than this are given weight 1 and the others are given weight 0. The cov.wt function is then used with these weights. This was proposed in Rousseeuw and van Zomeren (1990).
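The reweighting step can be sketched as follows (an illustrative reconstruction, not the internal code; the raw estimate's components are assumed to be named center and cov, and the scaling step is omitted):

```r
# Weighted estimate based on a raw MVE fit: observations whose squared
# Mahalanobis distance exceeds the chi-squared 0.975 quantile get weight 0.
raw <- cov.mve(x)                       # raw minimum volume ellipsoid fit
d2 <- mahalanobis(x, raw$center, raw$cov)
wts <- as.numeric(d2 <= qchisq(0.975, ncol(x)))
cov.wt(x, wt = wts)
```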
Lopuhaa, H. P. and Rousseeuw, P. J. (1991). Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Annals of Statistics 19, 229-248.
Rousseeuw, P. J. (1991). A diagnostic plot for regression outliers and leverage points. Computational Statistics and Data Analysis 11, 127-129.
Rousseeuw, P. J. and van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points (with discussion). Journal of the American Statistical Association 85, 633-651.
Woodruff, D. L. and Rocke, D. M. (1991). Computation of minimum volume ellipsoid estimates using heuristic search. Manuscript.
fr.cov1 <- cov.mve(freeny.x)
cov.mve(freeny.x, stock=fr.cov1$stock, births.n=1000)