MATH 150 STATISTICAL DATA ANALYSIS
T. LENGYEL
S-PLUS tips on how to retrieve data via the Internet
I suggest that you visit the following site:
http://lib.stat.cmu.edu/
which is the official site of STATLIB.
You will find the most interesting datasets in the DATA AND STORY LIBRARY
(DASL) at http://lib.stat.cmu.edu/DASL/ and in the STATLIB DATASETS ARCHIVE
(DATASETS) at http://lib.stat.cmu.edu/datasets/.
You can search the DASL database via http://lib.stat.cmu.edu/cgi-bin/iform?DASL,
and you can also search it by various categories. To download a dataset, I
suggest the following steps:
- View the dataset in your Internet browser.
Using copy and paste, transfer the data portion of your dataset (including
the header information, if any) into an editor (Notepad, Write, WordPerfect,
Microsoft Word, etc.) and save it from within the editor. Please pay
attention to the saving options! You might wish to keep the original
structure of the dataset (it might use single spaces or tabs to separate the
fields of a record, etc.). You might also want to use a dataset stored in an
Excel worksheet; the good news is that S-PLUS can read Excel worksheets as
well. I usually save as an ASCII (plain text) file with tab delimiters. SAVE
YOUR DATA FILES (just to be on the safe side...)!
Make sure that every entry in a given field has the same form; in
particular, a multi-word name must stay a single entry. Otherwise, with
whitespace delimiters, a name such as Occidental College is read as two
separate fields. Underscore characters help here (use them in your primary
dataset, not in S-PLUS): e.g., in the college dataset use Occidental_College
rather than Occidental College, and Harvey_Mudd_College rather than Harvey
Mudd College. Don't leave unwanted spaces in names, and remove any extra
lines at the beginning of the file (except the single line with the variable
names, i.e., the header information for the S-PLUS variables).
- Use the File->Import Data->From file
menu option of S-PLUS 8 to import the dataset for use within S-PLUS.
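The file preparation in the first step above can also be scripted rather than done by hand in an editor. Below is a minimal sketch in Python (an assumption: Python is not part of S-PLUS, and any scripting language would do). The helper clean_dataset and the placeholder input are hypothetical. It assumes you know which line holds the variable names, that those names are single words, and that each record is a name (possibly several words) followed by one value per remaining variable. It drops the lines above the header and any blank lines, joins the words of each name with underscores, and separates the fields with tabs.

```python
def clean_dataset(raw_text, header_index):
    """Prepare text copied from a browser for import as a tab-delimited ASCII file.

    Hypothetical helper. Assumes line `header_index` (0-based) holds single-word
    variable names, and every later non-blank line is a name (possibly several
    words) followed by one value per remaining variable.
    """
    lines = raw_text.splitlines()
    header = lines[header_index].split()
    n_values = len(header) - 1  # fields that follow the name in each record
    cleaned = ["\t".join(header)]  # lines above the header are dropped
    for line in lines[header_index + 1:]:
        tokens = line.split()  # split on any run of spaces or tabs
        if not tokens:
            continue  # skip blank lines
        split_at = len(tokens) - n_values
        if split_at < 1:
            raise ValueError(f"too few fields in line: {line!r}")
        # Join the words of the name with underscores:
        # "Harvey Mudd College" -> "Harvey_Mudd_College"
        name = "_".join(tokens[:split_at])
        cleaned.append("\t".join([name] + tokens[split_at:]))
    return "\n".join(cleaned) + "\n"


# Placeholder input (made-up numbers, not real data): a title line, a blank
# line, the header, and two records whose names contain spaces.
raw = "Placeholder title line\n\nName x y\nOccidental College 1 2\nHarvey Mudd College 3 4\n"
print(clean_dataset(raw, header_index=2), end="")
```

To save the result, write the returned string to a plain-text file (e.g., with Python's built-in open in "w" mode) and import that file via File->Import Data->From file. The sketch assumes no missing values: in a record with a missing value, the last word of the name would be shifted into the data fields, so compare the output with the original before importing.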

In the Import From File window, identify your data input file in From File
Name:, choose the File Format: (e.g., Excel Worksheet (xl?) or ASCII file -
whitespace delim (asc; dat; txt; prn)), enter the name the dataset will have
in S-PLUS under To Data set, set Rounding: (if applicable), and choose Update
Preview. An Excel-like table will pop up showing how your dataset will look.
If necessary, you can repeatedly make changes in your original file (e.g.,
your Excel worksheet), save it, and choose Update Preview again to see
whether your problems have been resolved. Once the preview looks right, you
are ready to use your dataset from within S-PLUS.
For the dataset name, use a standard S-PLUS name (no spaces), i.e., the kind
of name you use for naming datasets, variables, and functions. (Although I
believe you will hardly need this option: you can check the appropriate boxes
and change the default settings for conversion options and parameters under
the Options tab of the Import From File window. For example, the default
delimiter is currently set to “comma”.)
It is worth practicing this with various settings. Let me know if you still
have problems after trying as many options as you can think of. Check your
dataset by simply typing its name in S-PLUS. If it shows column labels such
as Col1, Col2, ..., then either your file had no header information
(meaningful variable names in the first row of your data file) or the header
was not read in during import. Another way to test your dataset is to issue
the function call “plot(name_of_your_dataset)” or something like
“l9pairs(name_of_your_dataset)”. (Please be aware that “plot” might not do
exactly what you expect, plotting might be time consuming, and l9pairs will
take forever if you have too many columns/variables.)
P.S. Read this to share material via the shared student directory.
P.S. Read this to change the color mapping (this is an S-PLUS 8 issue).
Last modified by tl, 04/29/2008