Statistics 435
Nonparametric Statistical Methods
Fall 2009
William F. Christensen
Professor, Department of Statistics
219 TMCB
801-422-7057
william “at” stat “dot” byu “dot” edu
http://statistics.byu.edu/faculty/wfc
Office Hours:
Tues & Thurs 9:00-10:30 a.m.,
or by appt.
(see announcement below about
temporary time changes 11/3 – 11/12)
ANNOUNCEMENTS
11/6/2009: Note that the final exam is Mon Dec 14, 2:30 – 5:30
p.m. in Room 299 (the wrong date was previously posted).
11/6/2009: Mini-project 2 description and accompanying data set
(steps.xls) are posted below.
10/19/2009: Note that on 11/3, 11/5, 11/10, and 11/12, I will have
to change my office hours to 9-9:30 and 11-11:30. I will also try to be
available for questions on 11/4 and 11/11. Thanks in advance for your
cooperation.
9/1/2009: To save you time entering data sets by hand, the
following link may be helpful:
9/1/2009: All HW must be turned in by the due date unless you
have received permission from me. Please do not ask the grader to grade late
work unless it has my signature on it. I understand that complications will
occasionally arise, but rarely should anyone expect to get more
than one short extension during the semester.
Course Description and Objectives
In this course, we address several approaches for estimation and hypothesis
testing when no underlying data distribution is assumed. Models for categorical
data are also considered. The content of the course focuses on methods that
are applicable to a wide range of modern statistical problems. Thus, classical
rank-based nonparametric methods are discussed, but greater emphasis is placed
on the generally more flexible (but computationally intensive) tools such as
permutation tests, bootstrap methods, and curve smoothing. Objectives for the
course include:
o identifying appropriate analyses given assumptions
about the problem
o providing meaningful analysis of data using
nonparametric methods
o effectively communicating findings and conclusions
Prerequisites
Required:
Stat 336 (Statistical Methods 1), Stat 337 (Statistical Methods 2) or
concurrent enrollment
Lectures
2:30 – 3:50, TTh; 299 TMCB
Course Materials
* Textbook: Introduction to Modern Nonparametric Statistics (we’ll call it “JJH”)
by J. J. Higgins, Duxbury (Thomson)
* Supplementary textbook: Categorical Data Analysis Using the SAS System, 2nd Edition
(we’ll call it “SDK”) by M. E. Stokes, C. S. Davis, and G. G. Koch (SAS Institute, Inc.);
I have a copy you can borrow for our limited use; do not purchase unless you’d like
to have the book for future reference
* Lecture notes: Stat 435 Lecture Notes are available below. I recommend printing
2 slides to a page.
o chap0.pdf (chap0twopage.pdf)
o chap1.pdf (chap1twopage.pdf)
o chap2.pdf (chap2twopage.pdf)
o chap3.pdf (chap3twopage.pdf)
o chap4.pdf (chap4twopage.pdf)
o chap5.pdf (chap5twopage.pdf)
o chap5B.pdf (chap5Btwopage.pdf)
o chap8.pdf (chap8twopage.pdf)
o chap10.pdf (chap10twopage.pdf)
o ParametricVsNonparametricChap1-5.pdf
Grader
Scott Morris (morris.scottlee “at” gmail “dot” com)
Office Hours: M 9-10, Th 11-12 in 198 TMCB
Grading
Your semester grade will be determined as follows:
| Midterm Exam | 20% | Date & Time: Oct 27-30 in |
| Final Exam | 30% | Date & Time: Mon, Dec 14, 2:30 – 5:30 p.m. in Room 299 |
| Homework | 20% | Mostly textbook and textbook-like problems |
| Mini-Projects | 30% | Data analysis and report |
Notes on grading of homework problems:
Please make sure that when SAS or S-Plus output is included, it is clearly highlighted
and annotated. I won't sort through pages of output to verify your conclusions.
Notes on grading of mini-projects and other reports:
Your project grades will be based on two equally weighted areas:
"technical" and "exposition." Below is a description of
what is expected for each of the two areas.
* Technical
o Evidence of substantial breadth and/or depth of analysis
o Proper implementation of statistical methods
o Well-documented computer code (SAS, R, etc.) attached to back of report. (These pages
will not count against the page limit for the report.)
* Exposition: At a level appropriate for the target audience,
the report has the following qualities:
o Introduction with explanation of the problem and the important associated issues
o Motivation and justification for statistical methods being used
o Understandable interpretation and conclusions
o Professional and attractive document
- free of spelling or other writing errors
- conforming to specifications
- all included tables and figures are discussed in the text
- placing figures and tables in the body of the text instead of placing them in
appendices is strongly recommended
o Important findings summarized in a brief conclusion
o When introducing technical concepts, give both: (1) a technical
definition (formula) AND (2) an intuitive explanation of the technical
concept/statistic.
IMPORTANT NOTE: You are really writing to two audiences simultaneously. First, you are
writing to your client for the purpose of solving his problem and explaining
the solution at his level. Second, you are writing to your professor to
demonstrate your mastery of the subject. This is a difficult task. Save at
least a couple of days just for writing and revising; even masterful
analysis cannot salvage a poorly written report.
Tentative Schedule & Textbook Sections for JJH and SDK
| DATE | TOPIC & READING ASSIGNMENT (to be completed in advance) | HW due at 2:30 pm |
| 9/1 (#1) | 0. Introductory Concepts (Chapter 0 in JJH) 1. One-Sample Methods (Chapter 1 in JJH) | |
| 9/3 (#2) | 2. Two-Sample Methods (Chapter 2 in JJH) | |
| 9/8 (#3) | (continued) | 1, 2, 3, 4, 5 |
| 9/10 (#4) | (continued) | |
| 9/15 (#5) | (continued) | 6, 7, 8, 9, 10, 11 |
| 9/17 (#6) | (continued) | |
| 9/22 (#7) | 3. K-Sample Methods (Chapter 3 in JJH) | 12, 13, 14 |
| 9/24 (#8) | (continued) | |
| 9/29 (#9) | 4. Paired Comparisons and Blocked Designs (Chapter 4 in JJH) | 15, 16, 17, 18 |
| 10/1 (#10) | (continued) | |
| 10/6 (#11) | (continued) | 19, 20, 21, 22, 23, 24 |
| 10/8 (#12) | 5. Tests for Trends and Association (Chapter 5 in JJH) | |
| 10/13 (#13) | (continued) | 25, 26 |
| 10/15 (#14) | (continued) | |
| 10/20 (#15) | (continued) | 27, 28 |
| 10/22 (#16) | (continued) | |
| 10/27 | --- class cancelled --- | 29, 30, 31, 32, 33, 34 (turn in to TA: Scott Morris by 5:00 p.m.) |
| | MIDTERM EXAM 10/27 – 10/30 in the Testing Center | |
| 10/29 (#17) | 5B. Logistic Regression for Dichotomous Responses (Chapter 8 in SDK) | |
| 11/3 (#18) | (continued) | |
| 11/5 (#19) | (continued) | |
| 11/10 (#20) | (continued) | 35 |
| 11/12 (#21) | 8. Nonparametric Bootstrap Methods (Chapter 8 in JJH) | |
| 11/17 (#22) | (continued) | 36 |
| 11/19 (#23) | (continued) | |
| | ----Thanksgiving Week – No Class---- | |
| 12/1 (#24) | (continued) | -- |
| 12/3 (#25) | (continued) | 37, 38 |
| 12/8 (#26) | 10. Smoothing Methods and Robust Model Fitting (Chapter 10 in JJH) | No HW for Chapter 10 – practice problems only |
| 12/10 (#27) | (continued) | Mini-Project #2 (see Assignments below) |
| 12/14 (Mon) | FINAL EXAM 2:30 – 5:30 p.m. in classroom | |
ERRORS IN CLASS NOTES (email corrections to william@stat.byu.edu):
ASSIGNMENTS
(note: “#1.2” means problem 2 from chapter 1)
Chapter 1
1. #1.1 Calculate p-value using: (a) binomial distribution, (b) normal approximation
without continuity correction, (c) normal approximation with continuity
correction
2. Repeat #1.1 but test whether the median exam score is equal to 75.
3. #1.2
4. #1.5
5. #1.7
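As a rough illustration of the three calculations requested in problem 1, the sketch below computes an upper-tailed p-value for a test of the median in R. The scores and the hypothesized median `m0 = 70` are made up for illustration; they are not the data or the hypothesis from JJH #1.1.

```r
# Hypothetical exam scores (NOT the textbook data) and hypothesized median
scores <- c(62, 68, 71, 74, 77, 79, 81, 84, 88, 93)
m0 <- 70

n <- sum(scores != m0)   # observations not tied with m0
s <- sum(scores > m0)    # number of observations above m0

# (a) exact binomial: P(S >= s) when S ~ Binomial(n, 1/2)
p.exact <- pbinom(s - 1, n, 0.5, lower.tail = FALSE)

# (b) normal approximation without continuity correction
z.raw <- (s - n / 2) / sqrt(n / 4)
p.raw <- pnorm(z.raw, lower.tail = FALSE)

# (c) normal approximation with continuity correction
z.cc <- (s - 0.5 - n / 2) / sqrt(n / 4)
p.cc <- pnorm(z.cc, lower.tail = FALSE)

round(c(exact = p.exact, normal = p.raw, corrected = p.cc), 4)
```

With small n, the corrected approximation in (c) will usually sit closer to the exact binomial value than the uncorrected one in (b).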
Chapter 2
6. #2.1
7. #2.3
8. #2.4
9. #2.6
10. #2.7
11. #2.8 (include in your comparisons a permutation test based on medians)
12. #2.10 (but instead of using the data in Exercise 4, use the Multivariate
Analysis Exam data presented on page 2.15 of the notes and in chap2.ssc)
13. #2.12
15. Using the data on page 105 of your text (described in problem #3.2), compare
the femur loads from the 1700 lb vehicles with the femur loads from the 3700 lb
vehicles.
a. Test to see if the 3700 lb vehicles have larger loads (one-tailed test). Also, test
whether the loads are different from each other (two-tailed test). Is the
two-tailed p-value equal to 2 times the one-tailed p-value? Why or why not?
b. Repeat part a, but conduct the tests based on the ranks of the femur loads.
c. Test to see if the scale for the 1700 lb vehicles is the same as for the 3700 lb
vehicles. Which test procedure is most appropriate and why?
d. Create a plot comparing the empirical CDFs for the two
groups. Use the K-S test to evaluate whether the distributions of the scores
for the two groups are the same.
e. Comment on the p-values in parts a, c, and d, discussing the advantages and
disadvantages of using K-S to detect differences in populations.
16. Use the data from problem #2.4 to compare approaches for conducting the
Wilcoxon rank-sum test. Obtain two-sided p-values from the
permutation distribution and from the large-sample normal approximation.
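The comparison asked for in problem 16 can be sketched as follows. This is a minimal example, assuming a `combn` function with the interface of the one in current R's utils package (the course's COMBINAT.Q version may differ in details); the data are hypothetical and contain no ties.

```r
# Hypothetical two-sample data with no ties (NOT the data from #2.4)
x <- c(12.1, 14.3, 15.8, 18.2)
y <- c(16.4, 19.5, 20.7, 22.9, 25.0)

m <- length(x); n <- length(y); N <- m + n
r <- rank(c(x, y))
W.obs <- sum(r[1:m])             # rank sum for the first sample

# Permutation distribution: rank sum for every possible choice of m ranks
W.all <- combn(N, m, FUN = function(idx) sum(r[idx]))

EW <- m * (N + 1) / 2            # null mean of W
VW <- m * n * (N + 1) / 12       # null variance of W (valid without ties)

# Two-sided p-values
p.perm <- mean(abs(W.all - EW) >= abs(W.obs - EW))
p.norm <- 2 * pnorm(abs(W.obs - EW) / sqrt(VW), lower.tail = FALSE)

c(permutation = p.perm, normal = p.norm)
```

Enumerating all splits is only practical for small samples; for larger ones a random sample of permutations is the usual substitute.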
Chapter 3
17. #3.2
18. Using the data from problem #3.2, carry out the Kruskal-Wallis
test by conducting a permutation test on the ranks of the data. Compare the
p-value obtained from the permutation distribution with the p-value obtained
using the approximate chi-square distribution for the KW statistic.
19. Consider comparing all pairs of treatment groups for the data in problem #3.2.
Determine which treatments are significantly different using (a) the Bonferroni
method, (b) Fisher’s LSD method, and (c) Tukey’s HSD method.
20. #3.9
21. #3.10
22. #3.11 (do not carry out any formal tests; just provide reasoning for all
your answers)
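One way to set up the comparison described in problem 18 is sketched below. The three groups are hypothetical data without ties (not the femur loads from #3.2), `kw.stat` is a helper written for this illustration, and the permutation distribution is approximated by random relabeling, not full enumeration.

```r
# Hypothetical three-group data without ties (NOT the data from #3.2)
y <- c(4.2, 5.1, 6.3, 5.8,  6.9, 7.4, 8.1, 7.7,  5.5, 6.1, 9.0, 8.4)
g <- factor(rep(c("A", "B", "C"), each = 4))

# Kruskal-Wallis statistic computed from ranks (illustrative helper, no ties)
kw.stat <- function(r, g) {
  N <- length(r)
  ni <- tapply(r, g, length)
  Rbar <- tapply(r, g, mean)
  12 / (N * (N + 1)) * sum(ni * (Rbar - (N + 1) / 2)^2)
}

r <- rank(y)
kw.obs <- kw.stat(r, g)

# Approximate permutation distribution from random relabelings of the ranks
set.seed(435)
kw.perm <- replicate(5000, kw.stat(sample(r), g))
p.perm <- mean(kw.perm >= kw.obs - 1e-12)   # small tolerance for rounding

# Large-sample chi-square approximation with k - 1 degrees of freedom
p.chisq <- pchisq(kw.obs, df = nlevels(g) - 1, lower.tail = FALSE)

c(permutation = p.perm, chi.square = p.chisq)
```

With group sizes this small, the two p-values can differ noticeably, which is the point of the comparison.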
Chapter 4
23. Using the file reactiontime.txt, compare the reaction times of 20 subjects
before drinking alcohol (1st column) and after drinking alcohol (2nd
column).
a. Test to see if reactions are slower after drinking alcohol using a permutation
test on mean differences.
b. Address the same hypotheses using the Wilcoxon Signed-Rank test.
c. Address the same hypotheses using the Sign test.
d. Address the same hypotheses using the standard normal-theory paired t-test.
e. Compare your answers for parts a-d and discuss the relative advantages of each
of the 4 tests.
24. Using the file skulls2.txt, test the hypothesis that the median MaxBreadth is
130.5 against the hypothesis that the median MaxBreadth
is not equal to 130.5. In skulls2.txt,
the first column is “MaxBreadth”, the
second column is “BasHeight”, the third
column is “BasLength”, the fourth
column is “NasHeight”, and the fifth
column is “Period”.
a. Use the approach discussed in Chapter 1.
b. Use the approach discussed in Chapter 4 for testing the median of a symmetric
distribution.
c. Compare your answers for parts a-b and discuss the relative advantages of each
of the 2 tests.
25. Use the file marketing.txt to compare the relative sales success of 5 different
shelf heights in marketing a product. The first column is sales ($), the second
column is treatments (shelf height), and the third column is day of week (the
blocking factor).
a. Use the RBCD permutation test comparing treatment means to evaluate the null
hypothesis that the treatment means are equal.
b. Use the Friedman test to evaluate the same hypothesis.
c. Suppose that the alternative hypothesis you wanted to test a priori is
Ha: t3 <= t1 <= t4 <= t2 <= t5. Conduct this test (you may need to re-number the
treatments in order to use my code from class).
d. Compare the results of the tests in parts a and c and explain why you did or
did not get similar results.
26. The file fish.txt gives judges’ scores for cooked fish that were prepared using
each of three different methods. The judges scored each fish dish on aroma,
flavor, texture, and moisture. Use the file fish.txt to assess whether or not
judges are in agreement on the three methods. For which of the 4 variables
(aroma, flavor, texture, moisture) do the judges agree? [Note: the first column
is the method, and the following columns are aroma, flavor, texture, and
moisture. The judge/block variable can be created using “block <-
rep(1:12,3)”.]
Chapter 5
27. #5.3
28. In the 1991 General Social Survey (NORC, 1991), white subjects were asked: (1)
“Do you favor busing of black and white school children from one school
district to another?” and (2) “If your party nominated a black
person for President, would you vote for him if he were qualified for the
job?” Responses were yes/no/don’t know. The table is given below.
Is there a relationship between attitude on busing and attitude on electing a
black President? (Choose your method carefully.)
| | Yes (Pres) | No (Pres) | Don’t know (Pres) |
| Yes (busing) | 106 | 7 | 4 |
| No (busing) | 228 | 47 | 10 |
| Don’t know (busing) | 18 | 1 | 1 |
29. In the 1988 General Social Survey (NORC, 1988), white subjects were asked: (1)
“Do you support or oppose having the government pay all health care costs
of AIDS patients?” and (2) “Do you support or oppose a government
information program to promote safe sex practices, such as the use of
condoms?” The table is given below. Is there a relationship between
attitude on the funding question and the information question? Compare your
answer based on a chi-square statistic with your answer based on Fisher’s
exact test.
| | Support (info) | Oppose (info) |
| Support (funding) | 190 | 341 |
| Oppose (funding) | 17 | 73 |
30. Promotions during a three-month period among black and white
| | Promoted | Not promoted |
| Black | 0 | 22 |
| White | 10 | 42 |
31. Response to sequential chemotherapy for lung cancer is recorded for each of a
sample of male and female patients. Is there reason to believe that one of the
genders responds better to chemotherapy than the other?
| | Progressive Disease | No Change | Partial Remission | Complete Remission |
| Male | 28 | 45 | 29 | 26 |
| Female | 4 | 12 | 5 | 2 |
32. #5.9
33. #5.11
34. #5.12
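For a 2×2 table such as the one in problem 29, the mechanics of the two tests might look like the sketch below. The counts are copied from that table; the object names are arbitrary, and the choice to switch off Yates' continuity correction is an assumption made here, not something the assignment specifies.

```r
# Counts from the table in problem 29 (rows: funding, columns: info)
aids <- matrix(c(190, 341,
                  17,  73),
               nrow = 2, byrow = TRUE,
               dimnames = list(funding = c("Support", "Oppose"),
                               info    = c("Support", "Oppose")))

# Pearson chi-square test without Yates' continuity correction
chi <- chisq.test(aids, correct = FALSE)
chi$expected          # expected counts under independence

# Fisher's exact test, which conditions on both sets of margins
fis <- fisher.test(aids)

c(chi.square = chi$p.value, fisher = fis$p.value)
```

Checking `chi$expected` before trusting the chi-square approximation is a common habit; when expected counts are small (as they may be for the table in problem 30), the exact test is generally preferred.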
Chapter 5B (Logistic Regression)
35. LR #1 nparhwLogisticRegression.pdf
36. LR #2 nparhwLogisticRegression.pdf
Chapter 8 (Bootstrap)
37. BS #1 nparhwbootstrap.pdf
38. BS #2 nparhwbootstrap.pdf
Some Hints for HW Problems (use at your own risk!)
Mini-Project #1 – Due Nov 5 in class.
Mini-Project #2 – Due Dec 10 at 11:59 p.m.
DATA
TimeSeriesElectionData.csv (first column=state name,
second column=old data for Obama, third column=old
data for McCain, fourth column=sample size for old
data, fifth column=Aug 1 for Obama, sixth column=Aug
1 for McCain, seventh column=sample size for Aug 1,
etc.)
places.row: to read places into an R data frame, use
# read the row and column labels, one label per line
placemat.row <- scan("C:/xgobi/data_xgobi/places.row", what=character(), sep="\n")
placemat.col <- scan("C:/xgobi/data_xgobi/places.col", what=character(), sep="\n")
# read the data and attach the labels
placemat <- read.table("C:/xgobi/data_xgobi/places", col.names=placemat.col, row.names=placemat.row)
CLASS EXAMPLES
Splus.intro.ssc (see also the Splus Cheat Sheet available at StatLib; Splus and R are very similar.)
--
COMBINAT.Q (this collection of functions includes “combn”,
which we use extensively in the early part of the course)
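To give a sense of what combn is used for, here is a small sketch. It assumes the combn function in the utils package of current R releases as a stand-in for the COMBINAT.Q version, whose interface may differ; the file path in the comment and the observations are hypothetical.

```r
# In the course setup, COMBINAT.Q would first be read in with something like
#   source("path/to/COMBINAT.Q")   # hypothetical path; adjust to your machine
# Here we rely on the combn provided by current R (package utils) instead.

# Each column of the result is one way to choose 2 of the 4 labels
splits <- combn(4, 2)
splits
ncol(splits) == choose(4, 2)     # one column per combination

# Typical use in a permutation test: apply a statistic to every split
obs <- c(3.1, 4.7, 5.2, 6.0)     # made-up observations
diffs <- combn(4, 2, FUN = function(idx) mean(obs[idx]) - mean(obs[-idx]))
diffs                            # mean difference for each possible split
```

The vector `diffs` is the full permutation distribution of the mean difference for this tiny example, which is the pattern used for the two-sample methods early in the course.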
NOTE: make sure that you have read the COMBINAT.Q functions into your R .Rdata
directory, since "combn" does not exist in
R. You can do this by following this procedure:
(1) On the course webpage, there are links to COMBINAT.Q and chap2.R. Download
COMBINAT.Q and place it somewhere on your machine.
(2) Then submit the first line of chap2.R, which reads in the COMBINAT.Q functions.
Just make sure to change the file reference to represent the location of
COMBINAT.Q on your machine.
You should now have access to all the functions in COMBINAT.Q. To get information
on any function in COMBINAT.Q, you can just type "help(FUNCTIONNAME)", e.g.,
"help(combn)".
HONOR CODE STANDARDS
In keeping with the principles of the BYU Honor Code, students are expected to be honest
in all of their academic work. Academic honesty means, most fundamentally, that
any work you present as your own must in fact be your own work and not that of
another. Violations of this principle may result in a failing grade in the
course and additional disciplinary action by the university.
Students are also expected to adhere to the Dress and Grooming Standards. Adherence demonstrates respect for yourself and others and ensures an effective learning and working environment. It is the university’s expectation, and my own expectation in class, that each student will abide by all Honor Code standards. Please call the Honor Code Office at 422-2847 if you have questions about those standards.
PREVENTING SEXUAL DISCRIMINATION OR HARASSMENT
Title IX of the Education Amendments of 1972 prohibits sex discrimination against any
participant in an educational program or activity that receives federal funds.
The act is intended to eliminate sex discrimination in education and pertains
to admissions, academic and athletic programs, and university-sponsored
activities. Title IX also prohibits sexual harassment of students by university
employees, other students, and visitors to campus. If you encounter sexual
harassment or gender-based discrimination, please talk to your professor;
contact the Equal Employment Office at 801-422-5895 or 1-888-238-1062
(24-hours), or http://www.ethicspoint.com; or contact
the Honor Code Office at 801-422-2847.
STUDENTS WITH DISABILITIES
If you have a disability that may affect your performance in this course, you
should get in touch with the office of Services for Students with Disabilities
(1520 WSC). This office can evaluate your disability and assist the professor
in arranging for reasonable accommodations.