S-PLUS help

Create a Data Frame by Reading a Table

DESCRIPTION:: Reads in a file in table format and creates a data frame with the same number of rows as there are lines in the file, and the same number of variables as there are fields in the file.

USAGE:

read.table(file, header=<<see below>>, sep, row.names, col.names,
       as.is=F, na.strings="NA", skip=0)

REQUIRED ARGUMENTS:

file:: character string naming the text file from which to read the data. The file should contain one line per row of the table. The fields may be separated by the character in sep, or the file may be fixed format with the fields starting at fixed points within each row.

OPTIONAL ARGUMENTS:

header:: logical flag: if TRUE, then the first line of the file is used as the variable names of the resulting data frame. The default is FALSE, unless the first line of the file has one fewer field than the second line, in which case the default is TRUE.
sep:: the field separator (single character), often "\t" for tab. If omitted, any amount of white space (blanks or tabs) can separate fields. To read fixed format files, make sep a numeric vector giving the initial columns of the fields.
row.names:: optional specification of the row names for the data frame. If provided, it can give the actual row names, as a vector of length equal to the number of rows, or it can be a single number or character string. In the latter case, the argument indicates which variable in the data frame to use as row names (the variable will then be dropped from the frame). If row.names is missing, the function will use the first nonnumeric field with no duplicates as the row names. If no such field exists, the row names are 1:nrow(x). You can force this last version, regardless of suitable fields to use as row names, by giving row.names=NULL. Row names, wherever they come from, must be unique.
col.names:: optional names for the variables. If missing, the header information, if any, is used; if all else fails, "V" and the field number are pasted together. Variable names, wherever they come from, must be unique. Variable names will be converted to syntactic names before assignment, but not if they came from an explicit col.names argument.
as.is:: control over conversions to factor objects. By default, non-numeric fields are turned into factors, except if they are used as row names. If some or all fields should be left as is (typically producing character variables), set the corresponding element of as.is to TRUE. The argument will be replicated as needed to be of length equal to the number of fields; thus, as.is=TRUE leaves all fields unconverted.
na.strings:: character vector of strings to treat as missing values. When character data are converted to a factor, the strings in na.strings are excluded from the levels of the factor, so any element equal to one of these strings becomes NA in the factor. In a numeric column, these strings are likewise converted to NA.
skip:: the number of lines in the file to skip before reading data.

VALUE:: a data frame with as many rows as the file has lines (or one less if header==T) and as many variables as the file has fields (or one less if one variable was used for row names). Fields are initially read in as character data. If all the items in a field are numeric, the corresponding variable is numeric. Otherwise, it is a factor (unordered), except as controlled by the as.is argument. All lines must have the same number of fields (except the header, which can have one less if the first field is to be used for row names).

DETAILS:: This function should be compared to scan; read.table tries much harder to interpret the input data automatically, figuring out the number of variables and whether fields are numeric. It also produces a more structured object as output. The price for this, aside from read.table being somewhat slower, is that the input data must themselves be more regular and that read.table decides what to do with each field, except for the use of the as.is argument. With scan, input lines do not need to correspond to one complete set of fields, and the user decides what mode each field should have. Overall, read.table will usually be the easy way to construct data frames from tables. If it doesn't do what you want, consider the functions scan, make.fields, or count.fields, as well as text-editing tools and languages outside S-PLUS.

SEE ALSO:: scan, make.fields, count.fields.

EXAMPLES:

# Example 1: Fields are in fixed columns separated by variable white space.
# Fields have internal white space.  First two lines of file "cars":
#                      Price   Country Reliability Mileage   Type
#      Acura Integra 4 11950     Japan Much better      NA  Small
# Give sep argument a vector of column numbers.
# First line has same no. of fields as all the rest, so
# extract row labels using scan, then set header explicitly:
cars.names <- scan("cars", what = "", flush = TRUE, widths = 30,
     skip = 1, strip = TRUE)
cars <- read.table("cars", header = TRUE, row.names = cars.names,
          sep = c(30, 36, 46, 58, 66))
# Example 2:  Fields are separated by ~ character; header defaults
# to TRUE, and first character field is automatically assigned to
# the row labels.  First two lines of file:
# ~Price~Country~Reliability~Mileage~Type
#       Acura Integra 4~11950~Japan~Much better~NA~Small
cars <- read.table("cars.tab", sep = "~")
# Example 3:  Fields are separated by variable white space.
# There is no internal white space, so you need not specify sep.
# Use na.strings to specify string used for missing data.
# First three lines of file:
# Price                           Country Reliability     Mileage Type
# Acura_Integra_4         11950   Japan   Excellent       N/A     Small
# Dodge_Colt_4            6851    Japan   N/A     N/A     Small
cars <- read.table("cars.na", na.strings = "N/A")
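# Example 4 (an illustrative sketch, not part of the original examples;
# the file name "cars.csv" and its contents are made up here):
# Fields are separated by commas.  The header line is skipped and
# replaced by explicit col.names.  as.is leaves Country unconverted,
# and row.names=NULL forces row names 1:nrow(x), so Model stays a
# variable instead of being used for the row labels.
# First write a small file to read:
cat("Model,Price,Country",
    "Acura Integra 4,11950,Japan",
    "Dodge Colt 4,6851,Japan",
    "", sep = "\n", file = "cars.csv")
# count.fields should report the same number of fields on every line:
count.fields("cars.csv", sep = ",")
cars <- read.table("cars.csv", header = FALSE, sep = ",", skip = 1,
          row.names = NULL, col.names = c("Model", "Price", "Country"),
          as.is = c(FALSE, FALSE, TRUE))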