Generalized Linear Models via Maximum Likelihood

DESCRIPTION:
Fits a generalized linear model of the form E(y) = f(Xb) via maximum likelihood. Estimation is performed for errors from one of five probability distributions, and there are several choices for f.

USAGE:
glim(x, y, n, error = "gaussian", link = "identity",
     wt = <<see below>>, resid = "none", init = <<see below>>,
     intercept = T, scale = <<see below>>, offset = 0,
     sequence = c(0, ncol(x)), eps = 0.0001, iter.max = 10)

REQUIRED ARGUMENTS:
x:
vector or matrix of explanatory variables, each column represents a variable, and each row represents an observation. Only full rank matrices are allowed.
Missing values (NA) are allowed.
y:
vector containing the response variable. The length of y must equal the number of rows of x.
Missing values (NA) are allowed.

OPTIONAL ARGUMENTS:
n:
required when error is "binomial", but ignored otherwise. n is the vector of denominators for the proportion (and y is the number of "successes" out of n trials). This must have the same length as y.
Missing values (NA) are allowed.
error:
a character string specifying the error structure for the model. Possible values are "gaussian", "binomial", "poisson", "gamma" and "inverse gaussian". Only enough of the string to determine a unique match is required.
link:
a character string specifying the link function, which is the inverse of f above. Possible values are "identity", "log", "logit", "probit", "sqrt", "inverse", and "loglog". Let g be the inverse of f. Definitions of g(y) are y, log(y), log(y/(1-y)), qpnorm(y), sqrt(y), 1/y, and log(-log(1-y)), respectively. Only enough of the string to determine a unique match is required.
wt:
vector of case weights. This must have the same length as y.
Missing values (NA) are allowed. By default unweighted estimation is performed (all weights are unity).
resid:
a character string specifying the type of residual desired. Possible values are "none", "pearson", and "deviance". The first character is sufficient.
init:
vector of initial values for the model coefficients. The length must be ncol(x) + 1 if intercept is TRUE, and ncol(x) otherwise. An initial estimate using least squares is computed when this is not given.
intercept:
if TRUE, a constant (intercept) term is included in the model.
scale:
the scale factor for the model. This may be either a numeric constant or a string specifying the estimation method, either "deviance" or "pearson". The default value is 1 for "binomial" and "poisson" error structures, and "pearson" for the others. The "deviance" and "pearson" scale estimates are equivalent to summing the squares of the respective residuals and dividing by the degrees of freedom.
offset:
in certain cases an explanatory variable with a fixed coefficient needs to be included in the model. This is modeled as E(y) = f(Xb + offset), where b is the vector of estimated coefficients, and the variables with fixed coefficients have been left out of the X matrix and summed into offset.
sequence:
integer vector giving the sequence of models to be fit. It must be in increasing order. A value of 1:3, for instance, would cause models to be fit using the first column of x, then the first 2 columns, then the first 3 columns; all of these would include an intercept term if intercept=TRUE. When intercept = TRUE, the first model in the default is a fit of only the mean or mean+offset term.
eps:
iteration will stop when the relative change in log likelihood is less than eps or after iter.max iterations. With newdev equal to the current deviance and olddev equal to the deviance from the last iteration, iteration is stopped if abs((olddev-newdev)/(newdev+1)) < eps.
iter.max:
maximum number of iterations.

VALUE:
a list with the following components:
coef:
the vector of estimated coefficients for the final model that was fit.
var:
estimated variance matrix of coef (based on the expected Fisher information matrix). It has been adjusted using the value of scale.
deviance:
a vector of deviance values for the sequence of models. The deviance is not divided by the scale estimate (otherwise a Gaussian model would always have a deviance of the number of observations minus the number of parameters being fit). The weights for the observations are used.
df:
a vector of degrees of freedom corresponding to the values in deviance. This takes account of any observations with a zero weight.
scale:
the scale value of the final model, either as estimated or the fixed value that was input.
resid:
vector of deviance or Pearson standardized residuals for the final model. The residuals have not been standardized (multiplied) by the wt vector. This is different than the GLIM program distributed by NAG, which multiplies the residuals given here by the square roots of the weights. If the weights represent one over the variance, then the NAG residuals all have the same variance. However, it does not return any information for an observation with zero weight. The behavior here is more consistent if the weights were integers representing a count for each observation; the returned residual is the same as would be obtained from replicating the observations and using a weight of 1. For deviance residuals, the overall deviance will be scale * sum(wt * resid^2). This component is not returned if the resid argument is "none".
intercept:
logical value stating whether or not an intercept was fit.

DETAILS:
An observation is considered missing if there are missing values for that observation in any of x, y, n, or wt. Two important choices when using glim are the probability distribution used to model the errors and the "link" function that describes the expected value of the response in terms of the covariates. Of the error distributions the Poisson and Binomial distributions are discrete valued, the Gamma and Inverse Gaussian take on positive real values, and the Gaussian assumes all real values. The logit, probit and complementary log-log link functions are typically used when modeling proportions (with Binomial errors). The log link function is used in conjunction with Poisson errors as well as in other cases.

Two different types of residuals may be returned, Pearson residuals and deviance residuals. Pearson residuals are the difference between the observation and the fit divided by the square root of the estimated variance for the observation. For Poisson errors the sum of the squared Pearson residuals is the Pearson Chi-squared goodness-of-fit statistic. The deviance residual is defined as the square root of the deviance for each observation with the sign determined by the sign of the observation minus the fit. (The sum of the squares of the deviance residuals will equal the deviance for the final model in unweighted cases.)

Denote the predicted value f(xb+offset) by mu. The deviance for each point is computed as: (y-mu)^2 for the Gaussian, 2*(y*log(ifelse(y==0,1,y/mu)) - (y-mu)) for the Poisson, 2*n*(y*log(ifelse(y==0,1,y/mu)) + (1-y)*log(ifelse(y==1,1,(1-y)/(1-mu)))) for the Binomial, -2*(log(y/mu) - (y-mu)/mu ) for the Gamma, and (y-mu)^2 / (y*mu^2) for the Inverse Gaussian. The deviance component of the output is the sum of wt times the deviance vector.

An iterated reweighted least squares algorithm is used to estimate the coefficients. Observations that have missing values in x, y, n or wt are deleted from the computations; residuals for such observations will be NA.


BACKGROUND:
The name glim stands for Generalized Linear Models, that is, generalized linear regression. These models are linear in the sense that the parameters are linear in the explanatory variables (covariates). They are generalized by allowing the covariates to model a non-linear transformation of the expected value of the response, and by allowing the distribution of the errors to be non-Gaussian.

REFERENCE:
McCullagh, P. and Nelder, J. A. (1983). Generalized Linear Models. London: Chapman and Hall.

SEE ALSO:
agreg , glim.print , glm , loglin , lsfit .

EXAMPLES:
liver.x <- cbind(injection=liver.cells,
  A.exper=ifelse(levels(liver.exper)[liver.exper]=="A", 1, 0),
  B.exper=ifelse(levels(liver.exper)[liver.exper]=="B", 1, 0),
  C.exper=ifelse(levels(liver.exper)[liver.exper]=="C", 1, 0))
glim(liver.x, liver.gt[,1], error="poisson",
  link="log", intercept=F, seq=c(0,1,4))
liver.pl <- glim(liver.cells, liver.gt[,1], error="poisson",
  link="log", resid="dev")
liver.fit <- exp(cbind(1,liver.cells) %*% liver.pl$coef)
plot(liver.fit, liver.pl$resid, log="x", type="n")
text(liver.fit, liver.pl$resid, levels(liver.exper)[liver.exper])

barley.cultar <- barley.disease barley.cultar[] <- 1:8 barley.cult <- NULL for (i in 1:8) barley.cult <- cbind(barley.cult, ifelse(barley.cultar==i, 1, 0)) dimnames(barley.cult) <- list(NULL, dimnames(barley.disease)[[1]])

# change 0 to NA in binomial denominator barley.exp <- as.vector(ifelse(barley.exposed==0, NA, barley.exposed)) glim(barley.cult, as.vector(barley.disease), n=barley.exp, error="binom", link="logit", intercept=F)