2:45 p.m.-3:40 p.m. 
Fowler  307
         
Lab: Tu: 3:05 p.m.-4:30 p.m.
Fowler  307

Mathematics 150 [Spring 2022]
Statistical Data Analysis

A VERY TENTATIVE SCHEDULE

Office Hours

Tamas Lengyel office: Fowler 322 phone: x2516
e-mail: lengyel@oxy.edu
class e-mail list: math150-L@oxy.edu

Text: see syllabus


Coverage:  see syllabus

Homework:  Homework will be due on specified dates (see the calendar below).

Late Assignments: No late work will be accepted.

Class Participation: mandatory.

Academic Honesty:  Please refer to the Student Handbook, which describes the procedures for handling cases of academic dishonesty. You may not consult anyone but me during exams and quizzes.  While working on your problem sets and homework, you may consult with other students or me, but your written work must represent your own understanding.  You must cite sources of ideas, including discussions with other students. Resources:

http://www.oxy.edu/student-handbook/academic-ethics/academic-ethics
http://www.oxy.edu/student-handbook/academic-ethics/academic-misconduct

Tests and Quizzes:  There are three exams and potentially some quizzes tentatively scheduled on the calendar.  No make-up exams or late homework. In case of illness you must notify me beforehand.

Course Grade:

Homework, Quizzes, Presentations 3/8
Tests 1/4 (1/12 each)
Final Exam 3/8

More General Information and the Grading Scale (please follow the link)


Week Beginning (on Sunday) Monday Tuesday Wednesday Friday                          
Jan 23, W1

(REMOTE SESSION VIA ZOOM)

First day: intro
intro, mean (avg.) vs. median,
concepts, statistical inferences and analysis, frequencies
Lab 1:  intro to S-PLUS; more intro,
lottery related oddities
more intro; numerical measures of central tendency, spread of data, 
3 data sets--dotdiagram, 
histogram (S2.6),
Dynkin coupling
Jan 30, W2

Class 4: comparing distributions:
location and shape,
skewness, quantiles (S2.2), UQ, LQ

Lab 1-2:  intro to S-PLUS;
NJ Pick-it lottery (l9lab2n2);
l9.0(x) --numerical measures of central tendency and spread of data;
also read about
1. horse race betting and how to beat the system (local copy/listen to the podcast (local copy));
2. the Rolldown option of the (Cash) Winfall lottery in Michigan and Massachusetts: article 1 (local copy), article 2 (local copy) and article 3 (local copy);
3. card counting and the movie 21 about MIT students (local copy)
mean(x), median(x),
summary(x), range(x);
quantile(x), quantile(x, ) (or better yet: l9quantile(x, ));
qplot  (S2.2), boxplots (S2.5)
symmetry plot (S2.8)
Hw #1 due,
opt. properties of the mean and median; variance, 
stem-and-leaf displays (S2.7), 34N vs. 34S parallel
Feb 6, W3 opt. properties of the mean and median; variance, more on (empirical) qqplot (S3.2) Lab 2-3: bdtest();
NJ Pick-it lottery (l9lab2n2)--but be careful with those lotteries (local copy);
l9.0(x);
data manipulation, notched boxplots;
l9quantile (or Q), l9IQR (or IQRQ),
l9echo(l9qqplotplus(cl,cn),side=4),
l9drop1(), l9echouse(),
RUNS: l9m8(100,T), l9m9(classsize)
probability (classical), prob. trees, axioms, Venn-diagrams Class 9:  conditional probability, smoking vs. cancer (Take 1)

 

Feb 13, W4 Hw #2 due, more on cond. prob.:  Bayes' Theorem, contingency tables, smoking vs. cancer (Take 2) Lab 3-4: review earlier labs+geysers
34N vs. 34S parallel and more
review, independence, bottleneck of a problem (scanner vs. chips) more on bottleneck (scanner vs. chips)
Feb 20, W5

President's Day

Lab 4-5: review Lab4, lsfit (also l1fit) vs. lowess, superimposing graphs (via par(new=T)), functions: paste, cut, split, l980, screenfix, row.names cond. prob.: Bayes' Theorem; more on bottleneck, the effect of high false positive test probability (for rare diseases),
manufacturer's claim: P(false positive)=P("+"|-)  vs. customer's interest: P(false alarm)=P(-|"+"), note: manufacturer's claim: P(false negative)=P("-"|+)  vs. customer's interest: P(false pass)=P(+|"-");
facts about breast cancer (local copy in pdf  format), more facts (local copy), most recent developments (local copy),

a very good article: "A ‘99% Accurate’ Antibody Test", 05/02/20, Saturday (local copy):
sensitivity=P(valid positive)=1-P(false negative)=P("+"|+) and
specificity=P(valid negative)=1-P(false positive)=P("-"|-)

the case of bad reporting:
exhibit #1 [on Study: Mammograms Lead to Many False-Positive Results] (local copy) and exhibit #2 [on False positive results prompt pregnancy test recall] (local copy),

the case of (almost!) good reporting: How Well Can Dogs Detect Cancer, Parade, 09/29/19, Sunday (local copy)

on prostate cancer (local copy);
Too many medical tests may harm, not help, older patients (local copy), see interview too [false alarm vs. false positive--re: breast cancer testing (again)-the case of bad reporting #2]

...and when people use different terms yet cannot tell the difference between the notions of false positive and false alarm (local copy)
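
A small numerical illustration of the false positive vs. false alarm distinction above may help. The sketch below is in Python rather than S-PLUS, and the prevalence, sensitivity, and specificity values are hypothetical, chosen only to show how a "99% accurate" test for a rare condition can still give a large P(-|"+"). The function name is an assumption made for this example, not a course function.

```python
def false_alarm_probability(prevalence, sensitivity, specificity):
    """Return P(-|"+"): the chance that a positive result belongs to a healthy person."""
    p_pos_given_sick = sensitivity            # P("+"|+)
    p_pos_given_healthy = 1.0 - specificity   # P("+"|-), the false positive rate
    # Total probability of a positive result
    p_pos = prevalence * p_pos_given_sick + (1.0 - prevalence) * p_pos_given_healthy
    # Bayes' Theorem
    return (1.0 - prevalence) * p_pos_given_healthy / p_pos


# Hypothetical numbers: 1% prevalence, 99% sensitivity, 99% specificity
alarm = false_alarm_probability(prevalence=0.01, sensitivity=0.99, specificity=0.99)
print(f"P(false positive) = 0.01, P(false alarm) = {alarm:.2f}")
```

With these assumed numbers the manufacturer's figure P("+"|-) is 1%, while the customer's figure P(-|"+") comes out at about one half.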

 

Class 14: Hw #3 due, more on independence and false positive (from previous class),
an unusual application of cond. prob.,
the two dice problem: probability that the die chosen shows red when rolled again;
Let's make a deal (the 3-door problem),
review
Feb 27, W6 scatter plots, a robust fit: l1fit, i.e., LAD, properties of the l1fit line (e.g., the special LAD lines), l1fit vs. lsfit; lmsreg; corr. coeff. r Lab 5-6: matches: birthdays and birth months--a one-liner in S-PLUS;
l9echo without name+timestamp and in smaller font size;
col=0:
remove objects printed in any color (except those made by identify);
draft lottery (+ leftover from previous labs), l9lab6id2(...)
more on lsfit: invariance, transformations, reading graphs, Q&A Hw #4 due,
a non-robust fit: more on lsfit, |r|<=1;  review: properties of the lsfit line, factor/response variables, strip median (an example: l9sx, better yet: stripmedian(x,y,#)), Q&A
March 6, Spring break

Spring break

Spring break

Spring break
March 13 [Daylight Saving Time starts at 2:00 a.m.], W7

Class 18: Hw #5 due, lowess, review of other problems,
lowess

Lab 6-7 (see Lab 8, too): reading graphs to guess r,
l9.allfit(): to UR panel of Fig 4.2 on p.78: lsfit/l1fit/lmsreg/lowess,
lsfit/cor.test, l1fit and lmsreg--more on robustness, strip median: l9sx() and stripmedian(x,y,#),
multiplots, matplot, cbind, text,
l9sh(ozy,ozs) vs. l9sh(ozy,ozs,F);
l9fig3.7() and l92h(x,y);
l9dev.off()

Class 19:

Exam 1

TESTING ON-LINE TECHNOLOGY;
more on correlation/robustness -- (GMDA) Sections 4.5-8, Q1-Q4, local sharpening (l9sh)
(exam comments/solutions)

March 20, (W8)
 
draftsman and casement displays -- Sections 5.1-5.3 (S5.3 in particular);
l9.animal,
symbols, l9pairs (potentially combined with cbind);
Lab 7-8: passing parameters: l9lab6id2, matplot with cbind, symbolic plots: pairs, l9pairs (with panel programming), cased,
symbols, l9sh,
preview: l9pairsym, l9pairsmat, l9multiw, l9cased, panel programming, barley, l9trellis
random variables -- (Durrett) Sections 1.4, 2.2 Hw #6 due,

Class 23:  S-PLUS demos to illustrate 2 TRELLIS one-liners:
   l9m2 (or l9m222) and l9m3:
* with option panel= panel.lowess2 to add lsfit/l1fit/lowess if 2/3 or more points in the panel--used with coplots and xyplots;
(warning: all panel=panel.lowess2id/ids/idp/all settings require f99=f99 or f99= value between 0 and 1)
March 27, W9
* [use it with xyplot] with further improved option panel= panel.lowess2id (which shows the index of the observation) or panel=panel.lowess2idp or panel=panel.lowess2ids  (with id= specified in the xyplot call):
* panel=panel.lowess2idp is used to interactively IDENTIFY objects or
* panel=panel.lowess2ids to add specified IDs (given in ids=) to the proper panels;

Note that id= must be an ABSOLUTE reference and NOT a reference relative to data=, and it must match data: if you drop some lines of data= then you should explicitly drop them from id= too; or use option subset= to select a subset of data!!!;

* see also the option panel= panel.lowess2idalll and panel.lowess2idall to add all names(?) [w/cex= and pch= options]--used with xyplots (and coplots (e.g., in l9m11--so don't use it!!!));

panel=panel.lowess2cor adds correlations (only if significantly different from 0) to coplots and xyplots (no l1fit though)

panel=panel.superpose adds different symbols (see pch=c(.,.,...)) to different categories identified by group= to dotplots (i.e., scatterplots) and xyplots; no line/curve fitting though, see barley and managers problems 

Lab 8-9: plot, pairs vs. l9pairs, cased vs. l9cased (use casedd though), symbols,
panel programming: l9pairs, 
l9pairsym, l9pairsmat,
l9trellis, l9trellis2, l9trellis3, l9multiw, l9m1, l9m2, l9m222, l9m3, l9m4("one-liners")
trellis functions: dotplot, histogram, stripplot, xyplot, coplot, 
DeMere, 
edit functions
more on random variables, DeMéré's problem (local copy), calc. with binomial distr. -- (Durrett) Sections 2.2, Exercise 1.7.33 Hw #7 due,

S-PLUS:
l9pairs.demo() with panel=panel.pairs.lowess2ids (needs id= and ids=) and f99= to control the size of vertical strip neighborhoods in lowess;

identifying countries by panel programming;
Trellis user manuals;
panel.lowess2ids (cf. graph);
more on random variables

the (Newton-)Pepys problem (local copy) --  (Durrett) Sections 2.2, Exercise 1.7.34

l9.demere() and l9.pepys()
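
For reference, the binomial calculations behind the DeMéré and (Newton-)Pepys problems can be sketched as follows. This is a Python sketch using the classical statements of the two problems; it is not the course's l9.demere() or l9.pepys(), and the helper name at_least is an assumption made for this example.

```python
from math import comb


def at_least(k, n, p):
    """P(at least k successes in n independent trials, each with success probability p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


# DeMere: at least one six in 4 rolls of one die
# vs. at least one double six in 24 rolls of two dice
demere_one_die = at_least(1, 4, 1 / 6)
demere_two_dice = at_least(1, 24, 1 / 36)

# Newton-Pepys: at least 1 six with 6 dice, at least 2 with 12, at least 3 with 18
pepys = [at_least(k, 6 * k, 1 / 6) for k in (1, 2, 3)]

print(f"DeMere: {demere_one_die:.4f} vs. {demere_two_dice:.4f}")
print("Pepys:", [round(q, 4) for q in pepys])
```

The first DeMéré bet is slightly favorable and the second slightly unfavorable, while the Pepys probabilities decrease as the number of dice grows.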
Apr 3, W10 more on fair games, and not so fair games (roulette: (American) roulette, how to play it?, Casablanca (movie clip), variance, standard deviation, and skewness of random variables --  (Durrett) Sections 1.5-6), back to DeMéré;

Q&A, more on appr. by Poisson --  (Durrett) Sections 2.2-3, homework,
the max of the binomial and Poisson distributions
Lab 9-10: l9lab9, l9nice(n,p), l9n(n,p): Poisson/normal appr. to binomial, l9demere, l9binom, dbinom, pbinom, qbinom, rbinom,

trellis functions: coplot and xyplot with panel.lowess, panel.lowess2, panel.lowess2id or panel.lowess2idp (if dimnames(data)[[1]] fails),
panel.lowess2ids, panel.lowess2idall (wow!)

l9m22() and l9m222() (panel.lowess2idp vs. panel.lowess2ids; you can print these graphs in landscape mode, after a 125% rescaling)
The rich get richer (fair games) + prop. of exp. value + Wald's eq. (local copy) --  (Durrett) Section 4.5;


family planning (China's one child policy [from Wikipedia] and update [from The Atlantic Monthly] from fall of 2013)  --  (Durrett) Example 1.24; update from the Washington Post from spring of 2019; and a strange follow up article
Hw #8 due,

Q&A: Binomial vs. Poisson distribution vs. table-based calculations, most likely values, (more family planning for gender equality/balance) --  (Durrett) Sections 2.2-3;

fair games, natural fluctuations in fair games (Gambler's Ruin -- (Durrett) Section 4.5): why fairness and equal opportunity are different concepts; also see Gambler's Ruin vs. Kelly's Betting in horse race betting
Apr 10, W11  

Class 30:

Exam 2

(see help with calculators)  

 

Lab 10-11: l9lab10, l9trellis, l9trellis2, l9n(n,p), sapply, normal distribution, inverse problems,

(to Labs 12-14): l9m6(),
confidence intervals, t-test, normal probability plots, power normal transformation, l9qqnorm

review: l9m22() and l9m222() 

exam solutions,

normal distributions --  (Durrett) Sections 6.4-5, inverse problems and normal distr. w/S-PLUS, CLT, normal appr. to binomial

(a little fun: Riemann sums->integrals or see local version)
for instance, in "Graph the Riemann sum of " enter 1/Sqrt[2Pi] Exp[-x^2/2] and in "as x goes from" enter e.g., -1 "to" 1 then check "Estimated and Actual Areas"

(in case of concerns, please see me regarding your final project; team memberships are due)
 

Hw #9 due,

grading on the curve, more on CLT,
the IQ distribution ~ N[μ=100, σ=15],
SAT  (range: 0-1600, ?~N[μ=1055, σ=200]?) breakdown state by state (local copy),
ACT (range 1-36, ?~N[μ=21, σ=5.4]?) (local copy) and ACT percentiles (local copy);
normal appr. to binomial,
conf. intervals, hypothesis testing,  t-test, review, Q&A;

[for a good textbook/resource with statistical examples/exercises/coverage, see the class homepage (cf. last item under "Texts, people and calculators")]

Apr 17, W12 Hw #10 due,
p-value, more on CIs and hypothesis testing: t.test,
public opinion poll of ≈1,100 people to estimate the preference rate within ±3% with 95% reliability (cf. election polls: example 1 and example 2 (?), methodology);
how about a different voting system? exhibit 1 and exhibit 2 (with a little more background); and a similar, historic system: exhibit 3 and exhibit 4
the power of your vote--
when your vote does matter: a vote with a tie (see local version or in pdf);
when your vote would have mattered: random drawing scheduled to break tie in disputed house race (local copy) and its outcome (local copy);
Swing vote (2018) movie
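
The ±3% figure for a poll of ≈1,100 people mentioned above can be checked with the usual normal approximation to the binomial. The sketch below is in Python; it assumes simple random sampling and the worst case p = 0.5, and the function names are assumptions made for this example rather than course functions.

```python
from math import ceil, sqrt

Z95 = 1.96  # approximate 97.5th percentile of the standard normal distribution


def margin_of_error(n, p=0.5):
    """Half-width of the approximate 95% confidence interval for a proportion."""
    return Z95 * sqrt(p * (1 - p) / n)


def sample_size(margin, p=0.5):
    """Smallest n whose approximate 95% margin of error is at most `margin`."""
    return ceil(Z95**2 * p * (1 - p) / margin**2)


print(f"margin for n = 1100: {margin_of_error(1100):.4f}")
print(f"n needed for a 3% margin: {sample_size(0.03)}")
```

Under these assumptions, a sample of about 1,100 gives a margin just under 3%, which is consistent with the figure quoted for election polls.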

Lab #12
(Founders Day)

Exam #3

Class 35: hypothesis testing, power normal transformations: l9thp, l9qqnorm--theoretical QQ-plots

l9qqnorm--theoretical QQ-plots,
power normal transformations: diagnostic tools, l9power called by l9thp,

lin. regr., lin. regr. models, more on testing

 

Hw #11 due,
more on lin. regr., (e.g., stepwise linear regr.) ???;
back to constructing boxplots: from IQR to the "effective" range of the variable in standard deviation units, hypothesis testing for equal medians;
cor.test(x,y);
l9thp(rnorm(100000,10,1)^.5,mark=50,to=3)

Apr 24, W13


sharing data with your partner(s): YOU DO: share(), {repeat: CHECK, x<-x}, detach(1); 
SHE DOES: share(), l9ret(), {repeat: CHECK, x<-x}, share(), detach(1);

 

Lab 13: to generate rolls of a die and coin flips: rsample(6,100,T) (=> l9dice(n=2,m=10000)) and
l900coinflips(), l9m9(19) to simulate the length distribution of the max run,
t.test,  l9qqnorm--theoretical QQ-plots, rnorm, 
l9thp(...,mark=, from=, to=), l9.1(x) --diagnostic tool (w/4 panels),
l9.0(x) --numerical measures of central tendency and spread of data,
l9.2(x,y) --diagnostics for simple regression
Lab 14: review

review (Last day of classes),
review
(May 1)

 Projects are due: MONDAY, May 2, 11:30 a.m.
in Fowler 322

 

FINAL EXAM
WED., May 4,
1:00-4:00 p.m.
in Fowler 307
 

Occidental's Calendar: Academic Calendar
(cf.  the Registrar's webpage with the appropriate link final exams)

The Final Exam is given in Fowler 307, from 1:00 p.m. to 4:00 p.m., on Wednesday, May 4, 2022.
Don't forget your project, which is due at 11:30 a.m. on Monday, May 2, 2022, in my office (Fowler 322). (Please note the difference from the in-class final exam time!!!)

 


last updated: 02/25/2022, by tl