scan(file="", what=numeric(), n=<<see below>>, sep=<<see below>>, multi.line=F, flush=F, append=F, skip=0, widths=NULL, strip.white=<<see below>>)
If what is a list, then each record is considered to have length(what) fields and the mode of each field is the mode of the corresponding component in what. When widths is given as a vector of length greater than one, what must be a list of the same length as widths.
Any numeric field containing the characters NA will be returned as a missing value. If the field separator (the sep argument) is given and the field is empty, the returned value will be an NA for a numeric or complex field, and a "" for a character field.
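The empty-field rule can be seen with a one-line comma-separated file. This is a small sketch in S, checked against R. It assumes tempfile(), cat() and unlink() for creating and removing the scratch file, and the sample line is made up.

```r
# Write a made-up comma-separated line with an empty middle field.
tf <- tempfile()
cat("1,,3\n", file = tf)

# Read as numeric (the default mode): the empty field becomes NA.
nums <- scan(tf, sep = ",")

# Read as character: the empty field becomes "".
chars <- scan(tf, what = "", sep = ",")

unlink(tf)  # remove the scratch file
```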
The main use for separators is to allow white space inside character fields. For example, suppose a numeric field is followed by a tab, with text filling out the rest of the line. Then z <- scan(myfile, list(pop=0, city=""), sep="\t") allows blanks in the city name. With no separator, arbitrary white space can be included in a field by quoting the whole string. With a separator, quotes are not used; to include the separator character in a string, escape it with a preceding backslash.
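A self-contained version of the tab-separated example follows. It is a sketch in S, checked against R. The file contents (populations and city names) are made up for illustration, and tempfile(), cat() and unlink() are used only to create and remove the scratch file.

```r
# Made-up data: a number, a tab, then a city name that contains a blank.
tf <- tempfile()
cat("100\tNew York\n200\tSan Jose\n", file = tf)

# Each record has two fields: pop (numeric) and city (character).
# Because sep = "\t", the blank inside "New York" stays part of the field.
z <- scan(tf, what = list(pop = 0, city = ""), sep = "\t")

unlink(tf)
```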
Fields of mode "logical" cannot be read directly: read them as character and convert them by expressions like x=="T". Any field that cannot be interpreted according to the mode(s) supplied to scan will cause an error.
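The character-then-compare workaround looks like this. It is a minimal sketch in S with made-up data, checked against R. R's own scan can also read logical fields directly, but the sketch follows the advice above.

```r
# Made-up logical data written as T/F codes.
tf <- tempfile()
cat("T F T\n", file = tf)

# Read the codes as character strings, then compare to "T".
flags <- scan(tf, what = "") == "T"

unlink(tf)
```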
The reading of numeric data in scan is done by means of C scan formats, rather than by the rules of the S-PLUS parser (the function parse). Exponential notation must use "e"; numbers that use "d" or other letters will be read incorrectly. Convert such data from "d" notation to "e" notation beforehand, for instance with the sed utility in UNIX.
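As an alternative to sed, the conversion can also be done inside S. This sketch reads the fields as character, rewrites "d"/"D" to "e", and converts to numeric. It assumes an S dialect that provides gsub() (R does), and that the file holds only numeric fields, since every "d" is replaced. The sample values are made up.

```r
# Made-up Fortran-style numbers using "d" exponents.
tf <- tempfile()
cat("1.5d3 2.0D-2\n", file = tf)

fields <- scan(tf, what = "")              # read as character
x <- as.numeric(gsub("[dD]", "e", fields)) # "1.5d3" -> "1.5e3", then numeric

unlink(tf)
```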
As it reads more and more records, scan keeps allocating more space to accommodate the growing vectors. If you pass a what argument that is exactly the size of the expected result, S-PLUS uses that space and does not have to perform further memory allocations. This can save significant memory when reading large data files.
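The size-hint idiom amounts to passing a vector of the expected length as what. This is a sketch only: the memory saving described above is an S-PLUS behavior. In R, where the code was checked, our understanding is that only the mode of what matters, and the nmax argument plays the size-hint role. The five-value file is made up.

```r
# Made-up file holding exactly five numbers.
tf <- tempfile()
cat(1:5, "\n", file = tf)

# what has the same mode and length as the expected result.
x <- scan(tf, what = numeric(5))

unlink(tf)
```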
The make.fields function preprocesses files that have fixed-format fields, inserting a separator after each field; it can be used as a separate step instead of the widths argument. The advantage of using widths is that you don't need to create any temporary files.
The read.table function reads data from a file and returns a data frame. It is often a better choice than scan if the data are in a regular table format with rows of equal length.
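For a regular table with a header line, read.table is a one-call alternative to scan. This is a small sketch in S, checked against R. The table contents are made up, and header = TRUE tells read.table that the first line holds the column names.

```r
# Made-up regular table: a header line plus two rows of equal length.
tf <- tempfile()
cat("name x y\n", "a 1 2\n", "b 3 4\n", file = tf, sep = "")

tab <- read.table(tf, header = TRUE)  # returns a data frame

unlink(tf)
```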
count.fields reports how many fields are in each line of a file. This is useful for determining whether read.table is appropriate or, when using scan to return a list, whether the number of fields in each line is a proper multiple of length(what).
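Both checks can be made from the counts. The following sketch is in S and was checked against R. The file, with one line holding two records, is made up, and the names rec.what, regular and fits.scan are illustrative only.

```r
# Made-up file: the middle line holds two 3-field records.
tf <- tempfile()
cat("a 1 2\n", "b 3 4 c 5 6\n", "d 7 8\n", file = tf, sep = "")

counts <- count.fields(tf)                        # fields per line: 3 6 3

# read.table needs rows of equal length; here they differ.
regular <- all(counts == counts[1])

# scan with a list what needs each line to hold whole records.
rec.what <- list(name = "", 0, 0)
fits.scan <- all(counts %% length(rec.what) == 0)

if (fits.scan) z <- scan(tf, what = rec.what)    # reads 4 records

unlink(tf)
```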
readline is another function that accepts data interactively.
num <- scan() # read numeric values from standard input

# read a label & two numeric fields, make a matrix
z <- scan("myfile", list(name="", 0, 0))
mat <- cbind(z[[2]], z[[3]])
dimnames(mat) <- list(z$name, c("X", "Y"))

# read in a vector of character data
personnel <- scan("person", what="")

ff <- scan("myfile", what=list(NULL, name="", data=0, NULL),
           multi.line=T, sep="\t")
# creates a list with two NULL components, a character component
# and a numeric component. Fields are separated by tabs.
ff <- ff[sapply(ff, length) > 0] # delete NULL components

scan("myfile", single(0), skip=5)
# save in single precision, skip the first five lines of the file

# example of reading fixed format file using the widths argument
# and of using the strip.white argument
# blanks are read as NA for numeric fields
# assignment can be suppressed for a field using NULL in the what argument
# for this example, the file 'dfile' contains the following lines:
#   01giraffe.9346H01-04
#   88donkey .1220M00-15
#   77ant         L04-04
#   20gerbil .1220L01-12
#   22swallow.2333L01-03
#   12lemming     L01-23
mydf.what <- list(code=0, name="", x=0, s="", n1=0, NULL, n2=0)
mydf.widths <- c(2, 7, 5, 1, 2, 1, 2)
# note: strip.white defaults to TRUE if widths specified
# could also use strip.white = c(F, T, F, F, F, F, F)
mydf <- scan("dfile", what=mydf.what, widths=mydf.widths)
mydf
# this produces the following output:
# $code:
# [1]  1 88 77 20 22 12
# $name:
# [1] "giraffe" "donkey"  "ant"     "gerbil"  "swallow" "lemming"
# $x:
# [1] 0.9346 0.1220     NA 0.1220 0.2333     NA
# $s:
# [1] "H" "M" "L" "L" "L" "L"
# $n1:
# [1] 1 0 4 1 1 1
# [[6]]:
# NULL
# $n2:
# [1]  4 15  4 12  3 23

# now with strip.white argument:
mydf <- scan("dfile", what=mydf.what, widths=mydf.widths, strip.white=F)
mydf$name
# this produces a list just like the one above, except the columns are
# not stripped:
# [1] "giraffe" "donkey " "ant    " "gerbil " "swallow" "lemming"