MATH 150              STATISTICAL DATA ANALYSIS          T. LENGYEL 
S-PLUS tips on how to retrieve data via the Internet

I suggest that you visit the following site: http://lib.stat.cmu.edu/ which is the official site of STATLIB.

You will find the most interesting datasets in the DATA AND STORY LIBRARY (DASL) at http://lib.stat.cmu.edu/DASL/ and within the STATLIB---DATASETS ARCHIVE (DATASETS) at http://lib.stat.cmu.edu/datasets/
You can search the database via http://lib.stat.cmu.edu/cgi-bin/iform?DASL

You can search the database by various categories. To download a dataset, I suggest the following steps:
 

  1. View the dataset in your Internet browser. Using copy and paste, transfer the data portion of your dataset (including the header info, if any) into an editor (Notepad, Write, WordPerfect, Microsoft Word, etc.) and save it from within the editor. Please pay attention to the saving options! You might wish to keep the original structure of the dataset (it might use simple spaces or tabs to separate the fields of a record, etc.). You might also want to use a dataset stored in an Excel worksheet; the good news is that S-PLUS will be able to read it in. I usually use the ASCII file saving option with tab delimiters. SAVE YOUR DATA FILES (just to be on the safe side...)!

Make sure that every entry in any given field has the same form. The use of “underscore” characters might help here (in your primary dataset and not in S-PLUS): e.g., in the college dataset use Occidental_College rather than Occidental College, and Harvey_Mudd_College rather than Harvey Mudd College. Don't leave unwanted spaces in names, and remove any extra lines at the beginning (except the single line with the variable names, i.e., the header info for the S-PLUS variables).
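The clean-up described above can also be done with a short script instead of by hand. The following Python sketch is only one possible approach and is not part of S-PLUS: the function name clean_pasted_table is hypothetical, and it assumes that fields in the pasted text are separated by a tab or by two or more spaces, so that a single space can only occur inside a name. Blank cells are not handled.

```python
import re


def clean_pasted_table(text):
    """Return tab-delimited text with underscores inside multi-word entries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines, including extra lines at the top
        # Assumption: a tab or a run of 2+ whitespace characters separates fields.
        fields = re.split(r"\s{2,}|\t", line)
        # Any whitespace still inside a field belongs to a name: use underscores.
        fields = [re.sub(r"\s+", "_", field.strip()) for field in fields]
        rows.append("\t".join(fields))
    return "".join(row + "\n" for row in rows)


if __name__ == "__main__":
    # Illustrative sample only; the real data comes from your browser.
    pasted = (
        "\n"
        "College            Founded\n"
        "Occidental College\t1887\n"
        "Harvey Mudd College   1955\n"
    )
    print(clean_pasted_table(pasted), end="")
```

The result can be saved as a plain text file and imported with the ASCII file option. If your dataset separates fields with single spaces, this assumption fails and the names have to be fixed by hand.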

  2. Use the File->Import Data->From file menu option of S-PLUS 8 to import the dataset for use within S-PLUS.

Identify your data input file in From File Name:, choose the File Format: (e.g., Excel Worksheet (xl?) or ASCII file - whitespace delim (asc; dat; txt; prn)), enter the name that the dataset will have in S-PLUS under To Data set, set Rounding: (if applicable), and choose Update Preview. An Excel-like table will pop up first to show how your dataset will look; after that, you will be ready to use your dataset from within S-PLUS. If necessary, you can repeatedly make changes in your original Excel worksheet and then (after saving) use Update Preview to see whether your problems have been resolved.

Give the dataset a standard S-PLUS name of the kind you use for datasets, variables, and functions. (I believe that you will hardly ever need this option, but anyway: you can check the appropriate boxes and change the default settings for conversion options and parameters under the Options tab in the Import From File window. The default delimiter, for example, is set to “comma” right now.)

It is worth practicing this with various settings. Let me know if you still have problems after trying as many options as you can think of. Check your dataset by simply typing in the name of the dataset. If it has column labels such as Col1, Col2, ..., etc., then either your file had no header info (meaningful variable names in the first row of the data file) or the header was not read in during import. Another way to test your dataset is to issue the function call "plot(name_of_your_dataset)" or something like "l9pairs(name_of_your_dataset)". (Please be aware that "plot" might not do exactly what you expect, that plotting might be time consuming, and that l9pairs will take forever if you have too many columns/variables.)
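Some of these problems can be caught before the import. The following Python sketch is again only an illustration and not an S-PLUS feature: the function names looks_like_header and rows_with_wrong_field_count are hypothetical, and the file is assumed to be tab-delimited. The first function guesses whether the first line holds variable names (no empty or purely numeric entries); the second lists the lines whose number of fields differs from that of the first non-blank line.

```python
def looks_like_header(first_line, sep="\t"):
    """Guess whether a line holds variable names rather than data values."""

    def is_number(entry):
        try:
            float(entry)
            return True
        except ValueError:
            return False

    fields = [field.strip() for field in first_line.rstrip("\n").split(sep)]
    # A header is assumed to have no empty and no purely numeric entries.
    return all(field != "" and not is_number(field) for field in fields)


def rows_with_wrong_field_count(text, sep="\t"):
    """Return 1-based line numbers whose field count differs from the first row's."""
    expected = None
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        count = len(line.split(sep))
        if expected is None:
            expected = count  # the first non-blank line sets the pattern
        elif count != expected:
            bad.append(number)
    return bad
```

If looks_like_header returns False for the first line of your file, expect column labels such as Col1, Col2, ... after the import. Any line reported by rows_with_wrong_field_count is worth inspecting in the editor before you try again.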

P.S. read this to share material via the shared student directory
P.S. read this to change the color mapping (this is an S-PLUS8 issue)

last modified by tl, 04/29/2008