Hat Diagonal Regression Diagnostic

DESCRIPTION:
Returns the diagonal of the "hat" matrix for a least squares regression.

USAGE:
hat(x, intercept=T)

REQUIRED ARGUMENTS:
x:
matrix of explanatory variables in the regression model y=xb+e, or the QR decomposition of such a matrix. Missing values are not accepted.

OPTIONAL ARGUMENTS:
intercept:
logical flag: if TRUE, an intercept term is included in the regression model. This argument is ignored if x is a QR object; in that case any intercept column must already be part of the matrix that was decomposed.
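
For instance, the following two calls should give the same hat diagonals (a sketch using the freeny.x matrix from the EXAMPLES below; the second call supplies the intercept column by hand before taking the QR decomposition):

hat(freeny.x)                       # intercept column added internally
hat(qr(cbind(1, freeny.x)))         # intercept column supplied before qr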

VALUE:
vector with one value for each row of x. These values are the diagonal elements of the least-squares projection matrix H. (Fitted values for a regression of y on x are H %*% y.) Large values of these diagonal elements correspond to points with high leverage.
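
To make the connection concrete, the following sketch (using the freeny.x matrix from the EXAMPLES below) forms H explicitly and compares its diagonal with the output of hat; forming H this way is for illustration only and is not how hat computes the result:

X <- cbind(1, freeny.x)                   # model matrix with an intercept column
H <- X %*% solve(t(X) %*% X) %*% t(X)     # the projection ("hat") matrix
range(diag(H) - hat(freeny.x))            # differences are essentially zero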

BACKGROUND:
The diagonals of the hat matrix indicate the amount of leverage (influence) that observations have in a least squares regression. Note that this is independent of the value of y. Observations that have large hat diagonals have more say about the location of the regression line; an observation with a hat diagonal close to 1 will have a residual close to 0 no matter what value the response for that observation takes.
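
The effect can be seen with a small artificial example (not part of this help file): a single wildly extreme x value dominates the fit, so its residual remains small no matter how far its response lies from the pattern of the other points.

x <- matrix(c(1:10, 100))           # the 11th point is far from the others
y <- c(1:10, 0)                     # its response is 100 below the trend y = x
hat(x)[11]                          # close to 1
lsfit(x, y)$residuals[11]           # small: the fit is pulled through this point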

When an intercept is included (the default), the hat diagonals lie between 1/n and 1; without an intercept they lie between 0 and 1. Their average value is p/n, where p is the number of variables, i.e., the number of columns of x (plus 1 if intercept=T), and n is the number of observations (the number of rows of x). Belsley, Kuh and Welsch (1980) suggest that points with a hat diagonal greater than 2p/n be considered high leverage points, though they note that when p is small this rule labels too many points as leverage points. Another rule of thumb is to consider any point with a hat diagonal greater than .2 (or .5) as having high leverage. If p is large relative to n, then all points can be "high leverage" points.
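
As an illustration of these rules, here is a short sketch using the freeny.x matrix from the EXAMPLES below (p counts the intercept, since intercept=T is the default):

h <- hat(freeny.x)
p <- ncol(freeny.x) + 1             # columns of x plus one for the intercept
n <- nrow(freeny.x)
mean(h)                             # equals p/n
which(h > 2 * p / n)                # observations flagged by the 2p/n rule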

By the way, it is called the "hat" matrix because, in statistical jargon, multiplying a response vector y by this matrix "puts a hat on y": the result is the vector of fitted values, conventionally written y-hat.


REFERENCES:
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. Wiley, New York.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, New York.


SEE ALSO:
qr, lsfit, lmsreg, ls.diag, lm.influence.

EXAMPLES:
h <- hat(freeny.x)
plot(h, xlab="index number", ylab="hat diagonal")
abline(h=2*(ncol(freeny.x)+1)/nrow(freeny.x))   # 2p/n cutoff, counting the intercept in p