M-Estimates of Regression

DESCRIPTION:
Performs M-estimation of a single response regressed on one or more explanatory variables. Several weighting (psi) functions are available, as well as other options.

USAGE:
rreg(x, y, w=rep(1,nrow(x)), int=T, init=lsfit(x, y, w, int=F)$coef,
    method=wt.default, wx, iter=20, acc=10*.Machine$single.eps^0.5,
    test.vec="resid")

REQUIRED ARGUMENTS:
x:
vector or matrix of the explanatory variable(s). Columns represent variables and rows represent observations. This should not include a column of 1s for the intercept unless int=FALSE. Missing values (NAs) are allowed.
y:
vector of the response variable, to be regressed on x. Missing values (NAs) are allowed.

OPTIONAL ARGUMENTS:
w:
initial weights for robustness; for example, w may be the weights computed from residuals in previous iterations of rreg. The argument wx should be used for weights that are to remain constant from iteration to iteration.
int:
logical flag: should an intercept term be included in the regression?
init:
vector of initial coefficient values. This must have the same number of elements as there are columns in x.
method:
function that returns a vector of weights given the scaled residuals. The default, wt.default, iterates the Huber estimate to convergence and then applies the bisquare. See the examples below for a way to supply different values for the tuning constants where applicable.
wx:
weighting vector (for intrinsic weights, not the weights used in the iterative fit). Missing values (NAs) are allowed.
iter:
maximum number of iterations.
acc:
convergence tolerance.
test.vec:
character string specifying the convergence criterion: the relative change in the residuals ("resid"), in the coefficients ("coef"), or in the weights ("w"). If test.vec=NULL, the orthogonality of the residuals to x is tested instead.

VALUE:
list with the following components:
coef:
vector of coefficients in the final fit.
resid:
vector of final residuals.
fitted.values:
vector of the fitted values.
w:
vector of final weights in the iteration, excluding the influence of wx.
int:
logical flag telling whether an intercept was fitted.
conv:
vector of the value of the convergence criterion at each iteration.
status:
character string: "converged" if the iterations converged normally, or else a message describing abnormal convergence.

DETAILS:
The routine uses iteratively reweighted least squares to approximate the robust fit, with residuals from the current fit passed through a weighting function to give weights for the next iteration. There are several possible weighting functions, and you are free to create your own. The weight functions that are available are: wt.andrews, wt.bisquare, wt.cauchy, wt.default, wt.fair, wt.hampel, wt.huber, wt.logistic, wt.median, wt.talworth, wt.welsch.
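The IRLS scheme just described can be sketched outside S-PLUS. The following Python fragment is an illustration only, not the rreg implementation: it carries out the loop for a single explanatory variable with an intercept, using the Huber weight function and the Gaussian-consistent MAD of the residuals (MAD divided by 0.6745) as the scale. The names huber_wt, wls_line and irls are invented for the sketch.

```python
import statistics

def huber_wt(u, c=1.345):
    # Huber weights: 1 for abs(u) < c, c/abs(u) otherwise
    return 1.0 if abs(u) < c else c / abs(u)

def wls_line(x, y, w):
    # closed-form weighted least squares for the line y = a + b*x
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

def irls(x, y, max_iter=20, acc=1e-6):
    w = [1.0] * len(x)              # start from ordinary least squares
    a, b = wls_line(x, y, w)
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Gaussian-consistent MAD of the residuals as the scale estimate
        s = statistics.median(abs(r) for r in resid) / 0.6745
        if s == 0:
            break                   # exact fit; nothing left to reweight
        w = [huber_wt(r / s) for r in resid]
        a_new, b_new = wls_line(x, y, w)
        done = abs(a_new - a) + abs(b_new - b) < acc  # crude "coef"-style test
        a, b = a_new, b_new
        if done:
            break
    return a, b, w
```

With nine points on a line and one gross outlier, the loop recovers coefficients close to the true line while the outlier receives a small weight.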

Here we describe the calculations for some of the weight functions available. The vector u below is the vector of residuals divided by the (Gaussian consistent) MAD of the residuals. c is a "tuning" constant with default values as indicated for each method. (See the examples for suggestions on how to override these defaults.)

andrews The weighting function is sin(u/c)/(u/c) for abs(u) <= pi*c and 0 otherwise. The default c is 1.339.

bisquare The weight function is (1 - (u/c)^2)^2 if abs(u) <= c and 0 otherwise. The default is c=4.685.

cauchy The weight function is 1/(1 + (u/c)^2) with c=2.385 the default.

fair The weight function is 1/(1 + abs(u/c))^2, with c=1.4 the default.

hampel This has three tuning constants, a, b and c. The weight function is 1 if abs(u)<=a; it is a/abs(u) for a<abs(u)<=b; it is (a*((c-abs(u))/(c-b)))/abs(u) for b<abs(u)<=c; and it is 0 otherwise. The defaults for a, b and c are 2, 4 and 8.

huber The weight function is 1 for abs(u) < c and c/abs(u) otherwise. The default is c=1.345.

logistic The weight function is tanh(u/c)/(u/c) with c=1.205 the default.

talworth If abs(u) <= c, the weight function is 1, otherwise it is 0. c=2.795 is the default.

welsch The weight function is exp(-2*(abs(u/(2*c))^2)) with a default c of 2.985.
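As a cross-check of the formulas above, here is a transcription of three of them into Python; the names wt_huber, wt_bisquare and wt_hampel echo, but are not, the S-PLUS functions, and u is a single scaled residual rather than a vector.

```python
def wt_huber(u, c=1.345):
    # 1 for abs(u) < c, c/abs(u) otherwise
    return 1.0 if abs(u) < c else c / abs(u)

def wt_bisquare(u, c=4.685):
    # (1 - (u/c)^2)^2 for abs(u) <= c, 0 otherwise
    return (1 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0

def wt_hampel(u, a=2.0, b=4.0, c=8.0):
    # three-part redescending weight with tuning constants a < b < c
    au = abs(u)
    if au <= a:
        return 1.0
    if au <= b:
        return a / au
    if au <= c:
        return (a * (c - au) / (c - b)) / au
    return 0.0
```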

To learn more about the weighting functions, look at them; they are generally implemented in only a couple of lines of S-PLUS code.

The function rreg, which is described in Heiberger and Becker (1992), is modeled on the algorithm described in Coleman et al. (1980).


BACKGROUND:
These regression estimates are useful when there are outliers in the response. Least squares regression is very sensitive to the assumption that the errors have a Gaussian distribution, so robust techniques are often of value. The M-estimates that rreg produces are susceptible to high leverage points, however (see hat). In technical terms: M-estimates do not have bounded influence.

The ltsreg function is an alternative; it has the highest possible breakdown (almost half of the observations, both x and y, can be moved arbitrarily while the estimate changes only a finite amount), but it is somewhat inefficient. Two current ways to get high-breakdown, high-efficiency regression are to use rreg with weights based on the hat diagonal, or to use least squares with weights based on the size of the residuals from the least trimmed squares regression. Research on this question is currently very active; more definitive solutions can be expected in the future.
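The hat-diagonal weighting mentioned above can be made concrete. For a straight-line fit with an intercept the leverage of observation i has the closed form h_i = 1/n + (x_i - xbar)^2/Sxx, and the leverages sum to the number of fitted parameters. The Python sketch below is an illustration only: the weight sqrt(1 - h_i) is one plausible leverage-based choice, not a formula prescribed here, and the resulting vector could play the role of the fixed weights wx.

```python
def hat_diagonal(x):
    # leverages for a straight-line fit with intercept:
    # h_i = 1/n + (x_i - xbar)^2 / Sxx
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

x = [1, 2, 3, 4, 5, 20]              # one high-leverage point at x = 20
h = hat_diagonal(x)
wx = [(1 - hi) ** 0.5 for hi in h]   # illustrative leverage-based weights
```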


REFERENCES:
Coleman, D., Holland, P., Kaden, N., Klema, V., and Peters, S. C., (1980). A system of subroutines for iteratively re-weighted least-squares computations. ACM Transactions on Mathematical Software 6, 327-336.

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics, The Approach Based on Influence Functions. Wiley, New York.

Heiberger, R. and Becker, R. A. (1992). Design of an S function for Robust Regression Using Iteratively Reweighted Least Squares. Journal of Computational and Graphical Statistics. 1, 181-196.


SEE ALSO:
ltsreg, l1fit, lm.

EXAMPLES:
rreg(stack.x, stack.loss, method=wt.huber) # use a Huber weight function

# use a different value for the tuning constant, say c=1.2
rreg(stack.x, stack.loss, method=function(u) wt.huber(u, c=1.2))

# give different values for several tuning constants
hamp.139 <- function(r) wt.hampel(r, a=0.1, b=0.3, c=0.9)
rreg(stack.x, stack.loss, method=hamp.139)

# use the weights from a short Talworth fit as starting weights for a bisquare fit
reg1 <- rreg(x, y, method=wt.talworth, iter=3)
rreg(x, y, w=reg1$w, method=wt.bisquare, iter=3)