MATH 150    STATISTICAL DATA ANALYSIS    T. LENGYEL
 S-PLUS tips on how to retrieve data via the Internet
 
I suggest that you visit the official site of STATLIB: http://lib.stat.cmu.edu/
You will find the most interesting datasets 
in the  DATA AND STORY LIBRARY (DASL)  at  http://lib.stat.cmu.edu/DASL/ 
and in the STATLIB---DATASETS ARCHIVE (DATASETS) at  http://lib.stat.cmu.edu/datasets/. 
You can search the database by various categories via  http://lib.stat.cmu.edu/cgi-bin/iform?DASL. 
To download a dataset, I suggest the following steps:
 
  
	- View the dataset in your Internet browser. 
	Using copy and paste, transfer the data portion of the dataset 
	(including the header info, if any) into an editor (Notepad, Write, 
	WordPerfect, Microsoft Word, etc.) and save it from within the editor. 
	Please pay attention to the saving options! You might wish to keep the 
	original structure of the dataset (it might use simple spaces or tabs to 
	separate the fields of a record, etc.). You might instead want to use a 
	dataset stored in an Excel worksheet; the good news is that S-PLUS is able 
	to read those in as well. I usually use the ASCII file saving option with 
	tab delimiters. SAVE YOUR DATA FILES (just to be on the safe side...)!
 
 
Make sure that every entry in any given field 
is formatted the same way. Underscore characters might help here (in your 
primary dataset, not in S-PLUS): e.g., in the college dataset use 
Occidental_College rather than Occidental College, and Harvey_Mudd_College 
rather than Harvey Mudd College, since a space inside a name would otherwise 
be read as a field separator in a whitespace-delimited file. Don't leave 
unwanted spaces in names, and remove any extra lines at the beginning (except 
the single line with the variable names, i.e., the header info for the S-PLUS 
variables). 
	- Use the File->Import Data->From file 
	menu option of S-PLUS 8 to import the dataset for use within S-PLUS.
	
 
 
  
In the Import From File window, identify your data input file under 
From File Name:, choose the File Format:  (e.g., Excel Worksheet (xl?) 
 or  ASCII file - whitespace delim (asc; dat; txt; prn)), enter the name that 
the dataset will have in S-PLUS under To Data set, and set 
Rounding: (if applicable). Then choose Update Preview: an Excel-like table 
will pop up showing how your dataset will look. If necessary, you can 
repeatedly make changes in your original Excel worksheet and then (after 
saving) use Update Preview again to see if your problems got resolved. After 
that you will be ready to use your dataset from within S-PLUS. 
For the dataset name, use a standard S-PLUS name, i.e., one that follows the 
rules you use for naming datasets, variables, and functions. (You will hardly 
need this, but anyway: you can check the appropriate boxes and 
change the default settings for the conversion options and parameters under the 
Options tab in the Import From File window. The default delimiter, 
for example, is currently set to “comma”.) 
It is worth practicing this with various 
settings. Let me know if you still have problems after trying as many options as 
you can think of. Check your dataset by simply typing in the name of the 
dataset. If it has column labels such as Col1, Col2, etc., then either your 
file did not have header info (with meaningful variable names in the first row) 
or the header was not read in during import. Another 
way to test your dataset is to issue the function call “plot(name_of_your_dataset)” 
or something like “l9pairs(name_of_your_dataset)”. 
(Please be aware that “plot” might not do exactly what you expect, that 
plotting might be time consuming, and that l9pairs will take forever if you have 
too many columns/variables.) 
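The clean-up advice in step 1 (underscores inside names, no extra lines, tab delimiters) can also be done with a small script before importing. Below is a minimal sketch in Python (not S-PLUS), assuming that the fields in the copied text are separated by tabs or by two or more spaces, so that single spaces occur only inside names. The function names, the column names, and the numbers in the sample rows are made up for illustration only.

```python
import re

# Assumption: fields are separated by a tab or by a run of two or more
# whitespace characters; a single space is taken to be part of a name.
FIELD_SEPARATOR = re.compile(r"\s*\t\s*|\s{2,}")


def clean_rows(raw_text):
    """Split copied text into rows of fields, dropping blank lines and
    replacing spaces inside an entry by underscores."""
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # remove extra empty lines
        fields = FIELD_SEPARATOR.split(line)
        rows.append([re.sub(r"\s+", "_", field.strip()) for field in fields])
    return rows


def rows_with_wrong_field_count(rows):
    """Return the 1-based positions of rows whose number of fields differs
    from that of the first row (the header line)."""
    expected = len(rows[0])
    return [i for i, row in enumerate(rows, start=1) if len(row) != expected]


def to_tab_delimited(rows):
    """Join the cleaned rows into tab-delimited text, one record per line."""
    return "\n".join("\t".join(row) for row in rows) + "\n"


# Hypothetical sample: the column names and numbers are placeholders.
sample = (
    "\n"
    "College  Students  Faculty\n"
    "Occidental College  100  10\n"
    "Harvey Mudd College\t200\t20\n"
)

rows = clean_rows(sample)
bad_rows = rows_with_wrong_field_count(rows)
cleaned_text = to_tab_delimited(rows)
print(cleaned_text)
```

The cleaned text can then be saved to a plain text file and imported as described above. A non-empty result from rows_with_wrong_field_count points to lines that still need manual attention, for example a record with a missing entry.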
P.S. read this to share material via the shared student directory 
P.S. read this to change the color mapping (this is an S-PLUS 8 issue) 
last modified by tl, 04/29/2008