Least Trimmed Squares Robust Regression

DESCRIPTION:
Returns a regression estimate that minimizes the sum of the smallest half of the squared residuals.

USAGE:
ltsreg(x, y, intercept = T, popsize = <<see below>>,
       mutate.prob = c(0.15,0.2,0.2,0.2), random.n = <<see below>>,
       births.n = <<see below>>, stock = list(),
       maxslen = <<see below>>, stockprob = <<see below>>,
       nkeep = 1, quan = <<see below>>,
       singular.ok = F, qr.out = F)

REQUIRED ARGUMENTS:
x:
vector or matrix of the explanatory variables. Columns represent variables and rows represent observations. Missing values (NAs) are not accepted.
y:
vector of the observations on the response. Missing values (NAs) are not accepted.

OPTIONAL ARGUMENTS:
intercept:
logical flag: if TRUE, an intercept is fit.
popsize:
the population size of the genetic stock. The default is 10 times the number of parameters fit.
mutate.prob:
length 4 vector of mutation probabilities for offspring. The first element is the probability of a mutation to one observation in the offspring. The second through fourth elements give the probability that the length of the offspring will be one shorter than the mother, one longer than the mother or a random length, respectively.
random.n:
the number of random samples taken after the stock is filled. The default is 50 times the number of parameters fit.
births.n:
the number of genetic births. The default is 50 times p plus 15 times p squared, where p is the rank of x (including the intercept, if any). This default allows reasonably accurate estimation for p up to at least twenty. Consider doubling it if you want to ensure very accurate minimization of the objective. If p is much larger than twenty, the default number of births may be insufficient (not much is known in this case).
stock:
a list of vectors of observation numbers to be included in the stock. This is typically the stock component of the output of a previous call to the function.
maxslen:
the maximum number of observations (including duplicates) in a member of the stock. The default is p if (n-p)/2 is less than p, where n is the number of observations, and it is the minimum of trunc((n-p)/2) and 5*p otherwise.
stockprob:
vector of cumulative probabilities that a member of the stock will be chosen as a parent. The ith element corresponds to the individual with the ith lowest objective. The default is cumsum((2 * (popsize:1))/popsize/(popsize + 1)).
nkeep:
the number of individuals in the stock to keep in the output.
quan:
the number of data points to include in the sum for the objective. The default is floor(n/2) + floor((p+1)/2), where n is the number of observations and p is the rank of x (including the intercept, if any).
singular.ok:
logical flag: if FALSE, an error is signaled if x (plus the intercept) is found to be singular.
qr.out:
logical flag: if TRUE, then a list representing the QR decomposition of x is returned.
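
The defaults described above can be collected in a short sketch. The helper lts.defaults is hypothetical (it is not part of ltsreg) and simply restates the documented formulas, taking n as the number of observations and p as the rank of x including the intercept:

```r
# Hypothetical helper: restates the documented defaults of ltsreg.
# n = number of observations, p = rank of x (including the intercept, if any).
lts.defaults <- function(n, p) {
    popsize <- 10 * p
    # p when (n-p)/2 < p, otherwise the smaller of trunc((n-p)/2) and 5*p
    maxslen <- if ((n - p)/2 < p) p else min(trunc((n - p)/2), 5 * p)
    list(popsize = popsize,
         random.n = 50 * p,
         births.n = 50 * p + 15 * p^2,
         maxslen = maxslen,
         quan = floor(n/2) + floor((p + 1)/2),
         # cumulative parent-selection probabilities; the last element is 1
         stockprob = cumsum((2 * (popsize:1))/popsize/(popsize + 1)))
}

d <- lts.defaults(n = 39, p = 5)
d$births.n   # 50*5 + 15*25
d$quan       # floor(39/2) + floor(6/2)
```

Because stockprob weights decrease linearly with rank, the individual with the lowest objective is the most likely to be chosen as a parent.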

VALUE:
a list describing the regression. Note that this is an approximation to the true solution based on a randomized algorithm, so the same call will give (slightly) different answers each time. The components of the returned list are:
coefficients:
vector of coefficients. If singular.ok is TRUE and x is singular, then some of the elements will be NA.
residuals:
object like y of the residuals from the regression.
fitted.values:
object like y of the fitted values.
objective:
vector of the objective for each component of the output stock. These are in increasing order; the first solution is used to compute the residuals and coefficients.
stock:
list of length nkeep containing vectors of observation numbers that define fits.
births.n:
the number of genetic births that were performed.
qr:
list that is the result of qr on x. This is only present when qr.out is TRUE.

SIDE EFFECTS:
creates the dataset .Random.seed if it does not already exist; otherwise its value is updated.

DETAILS:
Least trimmed squares regression has a breakdown point of nearly 50% and the usual rate of convergence (least median of squares regression has a slower rate). The objective that least trimmed squares minimizes is the sum of the q smallest squared residuals, where q is floor(n/2) + floor((p + 1)/2).
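
As a minimal illustration of the objective just described, the following sketch sums the q smallest squared residuals. The function name lts.objective and the residual values are made up for the example; ltsreg computes its objective internally.

```r
# Hypothetical helper: sum of the q smallest squared residuals.
lts.objective <- function(resid, q) {
    sum(sort(resid^2)[1:q])
}

# Made-up residuals; the large last value is trimmed away when q = 3.
res <- c(0.5, -1, 2, -0.1, 10)
obj <- lts.objective(res, 3)   # 0.01 + 0.25 + 1
```

Only the ordering of the squared residuals matters, so a gross outlier contributes nothing to the objective as long as it falls outside the q smallest.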

A genetic algorithm, described in Burns (1992), is used. Individual solutions are defined by a set of observation numbers, which corresponds to a least squares fit with the specified observations. A stock of popsize individuals is produced by random sampling, then a number of random samples are taken and the best solutions are saved in the stock. During the genetic phase, two parents are picked which produce an offspring that contains a sample of the observations from the parents. The best two out of the three are retained in the stock. The best of all of the solutions found is used to compute the coefficients and the residuals. The standard random sampling algorithm can be used by setting popsize to one, maxslen to p and births.n to zero.
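
The idea of an individual, and of the random sampling phase, can be sketched as follows. This is not the code used by ltsreg: eval.individual and random.search are hypothetical helpers, the least squares fit is done with lsfit, and x is assumed to be a full-rank matrix fit with an intercept. The genetic phase (parents and offspring) is not shown.

```r
# Hypothetical helper: fit least squares to the observations in obs,
# then score the fit by the q smallest squared residuals over ALL data.
eval.individual <- function(x, y, obs, q) {
    fit <- lsfit(x[obs, , drop = FALSE], y[obs])
    resid <- y - as.vector(cbind(1, x) %*% fit$coef)
    list(coef = fit$coef, objective = sum(sort(resid^2)[1:q]), obs = obs)
}

# Hypothetical helper: plain random sampling, keeping the best of nsamp
# individuals of length p (assumed here to be ncol(x) + 1).
random.search <- function(x, y, q, nsamp, p = ncol(x) + 1) {
    best <- NULL
    for (i in 1:nsamp) {
        cand <- eval.individual(x, y, sample(nrow(x), p), q)
        if (is.null(best) || cand$objective < best$objective) best <- cand
    }
    best
}

# Made-up data: an exact line with two gross outliers (observations 9, 10).
x <- matrix(1:10, ncol = 1)
y <- 2 + 3 * (1:10)
y[9:10] <- c(100, -50)
q <- floor(10/2) + floor((2 + 1)/2)          # 6
good <- eval.individual(x, y, c(1, 2), q)    # fit through two clean points
bad <- eval.individual(x, y, c(9, 10), q)    # fit through the two outliers
best <- random.search(x, y, q, nsamp = 50)
```

An individual drawn entirely from the clean observations reproduces the line and has an objective of essentially zero, while one built on the outliers scores badly; the search only has to find one clean individual.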

The mutate.prob argument controls the mutation of the offspring. The length of the offspring is initially set to be the length of the first parent. This length is reduced by one, increased by one, or given a length uniformly distributed between p and maxslen according to the last three probabilities in mutate.prob. The other type of mutation that can occur is for one of the observations of the offspring to be changed to an observation picked at random from among all of the observations; the probability of this mutation is specified by the first element of mutate.prob.
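
The two kinds of mutation can be sketched as below. The helpers mutate.length and mutate.obs are hypothetical, and two details are assumptions that the text above does not settle: the three length mutations are treated as mutually exclusive, and the resulting length is clamped to lie between p and maxslen.

```r
# Hypothetical helper: mutate the offspring length using elements 2 to 4
# of mutate.prob (one shorter, one longer, or a random length).
# ASSUMPTION: the three cases are mutually exclusive and the result is
# clamped to the range p..maxslen.
mutate.length <- function(len, p, maxslen,
                          mutate.prob = c(0.15, 0.2, 0.2, 0.2)) {
    u <- runif(1)
    cum <- cumsum(mutate.prob[2:4])
    if (u < cum[1]) {
        len <- len - 1
    } else if (u < cum[2]) {
        len <- len + 1
    } else if (u < cum[3]) {
        # uniform on p, p+1, ..., maxslen
        len <- p + floor(runif(1) * (maxslen - p + 1))
    }
    min(max(len, p), maxslen)
}

# Hypothetical helper: with probability prob (the first element of
# mutate.prob), replace one observation by one drawn from all n.
mutate.obs <- function(obs, n, prob = 0.15) {
    if (runif(1) < prob) {
        i <- 1 + floor(runif(1) * length(obs))
        obs[i] <- 1 + floor(runif(1) * n)
    }
    obs
}

shorter <- mutate.length(5, p = 3, maxslen = 10, mutate.prob = c(0, 1, 0, 0))
longer <- mutate.length(5, p = 3, maxslen = 10, mutate.prob = c(0, 0, 1, 0))
```

With the default mutate.prob, the offspring keeps the length of the first parent with probability 0.4 under these assumptions.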


REFERENCES:
Burns, P. J. (1992). A Genetic Algorithm for Robust Regression Estimation. (Statsci Technical Note).

Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. Wiley, New York.


NOTE:
This function deprecates lmsreg. Least trimmed squares is a (statistically) more efficient objective, and the objective is computed much more efficiently than in lmsreg. The only advantages of lmsreg are that it allows multiple responses and that it deletes observations with missing values.

SEE ALSO:
lm, lm.fit.qr, lmsreg, rreg.

EXAMPLES:
ltsreg(freeny.x, freeny.y)

# keep the four best individuals, then continue from that stock
fr.lts1 <- ltsreg(freeny.x, freeny.y, nkeep = 4)
ltsreg(freeny.x, freeny.y, stock = fr.lts1$stock, births.n = 1000)