Statistics 435
Nonparametric Statistical Methods
Fall 2009

 

William F. Christensen
Professor, Department of Statistics
219 TMCB
801-422-7057

 

william “at” stat “dot” byu “dot” edu
http://statistics.byu.edu/faculty/wfc

 

Office Hours:
Tues & Thurs 9:00-10:30 a.m.,

or by appt.

(see announcement below about temporary time changes 11/3 – 11/12)



ANNOUNCEMENTS

 

11/6/2009: Note that the final exam is Mon Dec 14, 2:30 – 5:30 p.m. in Room 299 (the wrong date was previously posted).

 

11/6/2009: Mini-project 2 description and accompanying data set (steps.xls) are posted below.

 

10/19/2009: Note that on 11/3, 11/5, 11/10, and 11/12, I will have to change my office hours to 9-9:30 and 11-11:30. I will also try to be available for questions on 11/4 and 11/11. Thanks in advance for your cooperation.

 

9/1/2009: To save you time entering data sets by hand, the following link may be helpful:

http://www.duxbury.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&discipline_number=17&product_isbn_issn=0534387756

 

9/1/2009: All HW must be turned in by the due date unless you have received permission from me. Please do not ask the grader to grade late work unless it has my signature on it. I understand that complications will occasionally arise, but rarely should anyone expect to get more than one short extension during the semester.

 



Course Description and Objectives

In this course, we address several approaches for estimation and hypothesis testing when no underlying data distribution is assumed. Models for categorical data are also considered. The course focuses on methods that are applicable to a wide range of modern statistical problems. Thus, classical rank-based nonparametric methods are discussed, but greater emphasis is placed on the generally more flexible (but computationally intensive) tools such as permutation tests, bootstrap methods, and curve smoothing. Objectives for the course include:

o       identifying appropriate analyses given assumptions about the problem

o       providing meaningful analysis of data using nonparametric methods

o       effectively communicating findings and conclusions

 

Prerequisites

Required: Stat 336 (Statistical Methods 1), Stat 337 (Statistical Methods 2) or concurrent enrollment

 

Lectures
2:30 – 3:50, TTh; 299 TMCB

 

Course Materials
* Textbook: Introduction to Modern Nonparametric Statistics (we’ll call it “JJH”)

by J. J. Higgins, Duxbury (Thomson)

 

* Supplementary textbook: Categorical Data Analysis Using the SAS System, 2nd Edition (we’ll call it “SDK”)

by M. E. Stokes, C. S. Davis, and G. G. Koch (SAS Institute, Inc.); I have a copy you can borrow for our limited use; do not purchase unless you’d like to have the book for future reference

 

* Lecture notes: Stat 435 Lecture Notes are available below. I recommend printing 2 slides to a page.

o       chap0.pdf (chap0twopage.pdf)

o       chap1.pdf (chap1twopage.pdf)

o       chap2.pdf (chap2twopage.pdf)

o       chap3.pdf (chap3twopage.pdf)

o       chap4.pdf (chap4twopage.pdf)

o       chap5.pdf (chap5twopage.pdf)

o       chap5B.pdf (chap5Btwopage.pdf)

o       chap8.pdf (chap8twopage.pdf)

o       chap10.pdf (chap10twopage.pdf)

o       BayesianVsFreqNotes.pdf

o       ParametricVsNonparametricChap1-5.pdf

 

 

Grader

Scott Morris (morris.scottlee “at” gmail “dot” com)

Office Hours: M 9-10, Th 11-12 in 198 TMCB

 

 

Grading
Your semester grade will be determined as follows:

| Midterm Exam | 20% | Date & Time: Oct 27-30 in Testing Center (any time of day that you want; 2-hour limit) |
| Final Exam | 30% | Date & Time: Mon, Dec 14, 2:30 – 5:30 p.m. in Room 299 |
| Homework | 20% | Mostly textbook and textbook-like problems |
| Mini-Projects | 30% | Data analysis and report |

 

Notes on grading of homework problems:

 

Please make sure that when SAS or S-Plus output is included, it is clearly highlighted and annotated. I won't sort through pages of output to verify your conclusions.

 

Notes on grading of mini-projects and other reports:

 

Your project grades will be based on two equally weighted areas: "technical" and "exposition." Below is a description of what is expected for each of the two areas.

 

* Technical

o        Evidence of substantial breadth and/or depth of analysis

o        Proper implementation of statistical methods

o        Well-documented computer code (SAS, R, etc.) attached to back of report. (These pages will not count against the page limit for the report.)

* Exposition: At a level appropriate for the target audience, the report has the following qualities:

o        Introduction with explanation of the problem and the important associated issues

o        Motivation and justification for statistical methods being used

o        Understandable interpretation and conclusions

o        Professional and attractive document

-          free of spelling or other writing errors 

-          conforming to specifications 

-          all included tables and figures are discussed in the text

-          placing figures and tables in the body of the text instead of placing them in appendices is strongly recommended

o        Important findings summarized in a brief conclusion

o        When introducing technical concepts, give both: (1) a technical definition (formula) AND (2) an intuitive explanation of the technical concept/statistic.

IMPORTANT NOTE: You are really writing to two audiences simultaneously. First, you are writing to your client for the purpose of solving his problem and explaining the solution at his level. Second, you are writing to your professor to demonstrate your mastery of the subject. This is a difficult task. Save at least a couple of days just for writing and revising—even masterful analysis cannot salvage a poorly written report.

Tentative Schedule & Textbook Sections for JJH and SDK

| DATE | TOPIC & READING ASSIGNMENT (to be completed in advance) | HW due at 2:30 pm |
| 9/1 (#1) | 0. Introductory Concepts (Chapter 0 in JJH); 1. One-Sample Methods (Chapter 1 in JJH) | |
| 9/3 (#2) | 2. Two-Sample Methods (Chapter 2 in JJH) | |
| 9/8 (#3) | (continued) | 1, 2, 3, 4, 5 |
| 9/10 (#4) | (continued) | |
| 9/15 (#5) | (continued) | 6, 7, 8, 9, 10, 11 |
| 9/17 (#6) | (continued) | |
| 9/22 (#7) | 3. K-Sample Methods (Chapter 3 in JJH) | 12, 13, 14 |
| 9/24 (#8) | (continued) | |
| 9/29 (#9) | 4. Paired Comparisons and Blocked Designs (Chapter 4 in JJH) | 15, 16, 17, 18 |
| 10/1 (#10) | (continued) | |
| 10/6 (#11) | (continued) | 19, 20, 21, 22, 23, 24 |
| 10/8 (#12) | 5. Tests for Trends and Association (Chapter 5 in JJH) | |
| 10/13 (#13) | (continued) | 25, 26 |
| 10/15 (#14) | (continued) | |
| 10/20 (#15) | (continued) | 27, 28 |
| 10/22 (#16) | (continued) | |
| 10/27 | --- class cancelled --- | 29, 30, 31, 32, 33, 34 (turn in to TA: Scott Morris by 5:00 p.m.) |
| | MIDTERM EXAM 10/27 – 10/30 in the Testing Center | |
| 10/29 (#17) | 5B. Logistic Regression for Dichotomous Responses (Chapter 8 in SDK) | |
| 11/3 (#18) | (continued) | |
| 11/5 (#19) | (continued) | Mini-Project #1 |
| 11/10 (#20) | (continued) | 35 |
| 11/12 (#21) | 8. Nonparametric Bootstrap Methods (Chapter 8 in JJH) | |
| 11/17 (#22) | (continued) | 36 |
| 11/19 (#23) | (continued) | |
| | ----Thanksgiving Week – No Class---- | |
| 12/1 (#24) | (continued) | -- |
| 12/3 (#25) | (continued) | 37, 38 |
| 12/8 (#26) | 10. Smoothing Methods and Robust Model Fitting (Chapter 10 in JJH) | No HW for Chapter 10; practice problems only |
| 12/10 (#27) | (continued) | Mini-Project #2 (see Assignments below) |
| 12/14 (Mon) | FINAL EXAM 2:30 – 5:30 p.m. in classroom | |

 

 


 

 

ERRORS IN CLASS NOTES (email corrections to william@stat.byu.edu):

 

 


 

 

 

ASSIGNMENTS (note: “#1.2” means problem 2 from chapter 1)

Chapter 1

1. #1.1. Calculate the p-value using (a) the binomial distribution, (b) the normal approximation without continuity correction, and (c) the normal approximation with continuity correction.

2. Repeat #1.1 but test whether the median exam score is equal to 75.

3. #1.2

4. #1.5

5. #1.7
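
For problem 1, the three p-value calculations can be sketched in R roughly as follows. The sample size n, the count b of observations above the hypothesized median, and the one-sided alternative are made-up values for illustration, not the data from JJH.

```r
# Hypothetical inputs (not from the textbook): n non-tied observations,
# b of which exceed the hypothesized median; H1: median > hypothesized value.
n <- 20
b <- 14

# (a) Exact upper-tail p-value: B ~ Binomial(n, 0.5) under H0, P(B >= b)
p_binom <- pbinom(b - 1, size = n, prob = 0.5, lower.tail = FALSE)

# (b) Normal approximation without continuity correction
mu    <- n / 2
sigma <- sqrt(n) / 2
p_norm <- pnorm((b - mu) / sigma, lower.tail = FALSE)

# (c) Normal approximation with continuity correction (shift b by 0.5)
p_norm_cc <- pnorm((b - 0.5 - mu) / sigma, lower.tail = FALSE)

round(c(binomial = p_binom, normal = p_norm, normal_cc = p_norm_cc), 4)
```

With these made-up numbers the continuity-corrected value lands closer to the exact binomial p-value than the uncorrected one, which is the usual motivation for the correction.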

 

Chapter 2

6. #2.1

7. #2.3

8. #2.4

9. #2.6

10. #2.7

11. #2.8 (include in your comparisons a permutation test based on medians)

12. #2.10 (but instead of using the data in Exercise 4, use the Multivariate Analysis Exam data presented on page 2.15 of the notes and in chap2.ssc)

13. #2.12

14. #2.15 from the book

15. Using the data on page 105 of your text (described in problem #3.2), compare the femur loads from the 1700 lb vehicles with the femur loads from the 3700 lb vehicles.

a.       Test to see if the 3700 lb vehicles have larger loads (one-tailed test). Also, test whether the loads are different from each other (two-tailed test). Is the two-tailed p-value equal to 2 times the one-tailed p-value? Why or why not?

b.      Repeat part a, but conduct the tests based on the ranks of the femur loads.

c.       Test to see if the scale for the 1700 lb vehicles is the same as for the 3700 lb vehicles. Which test procedure is most appropriate and why?

d.      Create a plot comparing the empirical CDFs for the two groups. Use the K-S test to evaluate whether the distributions of the femur loads for the two groups are the same.

e.       Comment on the p-values in parts a, c, and d, discussing the advantages and disadvantages of using K-S to detect differences in populations.

16. Use the data from problem #2.4 to compare approaches for conducting the Wilcoxon rank-sum test. Obtain two-sided p-values from the permutation distribution and from the large-sample normal approximation.
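
The comparison asked for in problem 16 can be sketched as below. The data are made up (not the data from problem #2.4) and have no ties; combn here is the function of that name available in current versions of R, which I am assuming behaves like the COMBINAT.Q version for this purpose.

```r
# Hypothetical data (not from problem #2.4); no ties, so ranks are 1..N
x <- c(12.1, 15.3, 9.8, 14.0)
y <- c(16.2, 18.5, 15.9, 17.1, 13.2)
m <- length(x); n <- length(y); N <- m + n

r <- rank(c(x, y))
w_obs <- sum(r[1:m])                 # observed rank sum for the first sample

# Permutation distribution: rank sum for every way to assign m of the N ranks
w_all <- combn(r, m, FUN = sum)

# Null mean and standard deviation of the rank sum (no-ties formulas)
mu    <- m * (N + 1) / 2
sigma <- sqrt(m * n * (N + 1) / 12)

# Two-sided p-values: exact permutation vs. large-sample normal approximation
p_perm <- mean(abs(w_all - mu) >= abs(w_obs - mu))
p_norm <- 2 * pnorm(-abs(w_obs - mu) / sigma)

c(permutation = p_perm, normal = p_norm)
```

For small samples without ties, the permutation p-value should agree with the exact p-value reported by wilcox.test(x, y).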

 

Chapter 3

17. #3.2

18. Using the data from problem #3.2, carry out the Kruskal-Wallis test by conducting a permutation test on the ranks of the data. Compare the p-value obtained from the permutation distribution with the p-value obtained using the approximate chi-square distribution for the KW statistic.

19. Consider comparing all pairs of treatment groups for the data in problem #3.2. Determine which treatments are significantly different using (a) Bonferroni method, (b) Fisher’s LSD method, and (c) Tukey’s HSD method.

20. #3.9

21. #3.10

22. #3.11 (do not carry out any formal tests—just provide reasoning for all your answers)

 

Chapter 4

23. Using the file reactiontime.txt, compare the reaction times of 20 subjects before drinking alcohol (1st column) and after drinking alcohol (2nd column).

a.     Test to see if reactions are slower after drinking alcohol using a permutation test on mean differences.

b.     Address the same hypotheses using the Wilcoxon Signed-Rank test.

c.      Address the same hypotheses using the Sign test.

d.     Address the same hypotheses using the standard normal-theory paired t-test.

e.      Compare your answers for parts a-d and discuss the relative advantages of each of the 4 tests.

24. Using the file skulls2.txt, test the hypothesis that the median MaxBreadth is 130.5 against the hypothesis that the median MaxBreadth is not equal to 130.5. In skulls2.txt, the first column is “MaxBreadth”, the second column is “BasHeight”, the third column is “BasLength”, the fourth column is “NasHeight”, and the fifth column is “Period”.

a.     Use the approach discussed in Chapter 1.

b.     Use the approach discussed in Chapter 4 for testing the median of a symmetric distribution.

c.      Compare your answers for parts a-b and discuss the relative advantages of each of the 2 tests.

25. Use the file marketing.txt to compare the relative sales success of 5 different shelf heights in marketing a product. The first column is sales ($), the second column is treatments (shelf height), and the third column is day of week (the blocking factor).

a.     Use the RBCD permutation test comparing treatment means to evaluate the null hypothesis that the treatment means are equal.

b.     Use the Friedman test to evaluate the same hypothesis.

c.      Suppose that the alternative hypothesis you wanted to test a priori is Ha: t3 <= t1 <= t4 <= t2 <= t5. Conduct this test (you may need to re-number the treatments in order to use my code from class.)

d.     Compare the results of the tests in parts a and c and explain why you did or did not get similar results.

26. The file fish.txt gives judges’ scores for cooked fish that were prepared using each of three different methods. The judges scored each fish dish on aroma, flavor, texture, and moisture. Use the file fish.txt to assess whether or not the judges are in agreement on the three methods. For which of the 4 variables (aroma, flavor, texture, moisture) do the judges agree? [Note: the first column is the method, and the following columns are aroma, flavor, texture, and moisture. The judge/block variable can be created using “block <- rep(1:12,3)”.]
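
One possible way to set up problem 26 is sketched below, using simulated scores rather than fish.txt. It assumes the rows are sorted by method with the 12 judges in the same order within each method (which is what the block hint above implies), and it uses Kendall's coefficient of concordance W, computed from the Friedman statistic, as one measure of agreement; other approaches may be equally appropriate.

```r
set.seed(435)

# Layout implied by the hint: 3 methods x 12 judges, rows sorted by method
method <- factor(rep(1:3, each = 12))
block  <- factor(rep(1:12, 3))          # judge 1..12 within each method

# Made-up flavor scores (NOT the fish.txt data), with a method effect built in
flavor <- rnorm(36, mean = 5 + as.integer(method))

# Friedman test: ranks the methods within each judge
ft <- friedman.test(flavor, groups = method, blocks = block)

# Kendall's W = Friedman chi-square / (b * (k - 1)), b judges, k methods
b <- nlevels(block)
k <- nlevels(method)
W <- unname(ft$statistic) / (b * (k - 1))

c(chisq = unname(ft$statistic), p.value = ft$p.value, W = W)
```

W ranges from 0 (no agreement among judges) to 1 (all judges rank the methods identically), so it can be reported alongside the p-value for each of the four variables.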

 

Chapter 5

27. #5.3

28. In the 1991 General Social Survey (NORC, 1991), white subjects were asked: (1) “Do you favor busing of black and white school children from one school district to another?” and (2) “If your party nominated a black person for President, would you vote for him if he were qualified for the job?” Responses were yes/no/don’t know. The table is given below. Is there a relationship between attitude on busing and attitude on electing a black President? (Choose your method carefully.)

 

| | Yes (Pres) | No (Pres) | Don’t know (Pres) |
| Yes (busing) | 106 | 7 | 4 |
| No (busing) | 228 | 47 | 10 |
| Don’t know (busing) | 18 | 1 | 1 |

29. In the 1988 General Social Survey (NORC, 1988), white subjects were asked: (1) “Do you support or oppose having the government pay all health care costs of AIDS patients?” and (2) “Do you support or oppose a government information program to promote safe sex practices, such as the use of condoms?” The table is given below. Is there a relationship between attitude on the funding question and the information question? Compare your answer based on a chi-square statistic with your answer based on Fisher’s exact test.

 

| | Support (info) | Oppose (info) |
| Support (funding) | 190 | 341 |
| Oppose (funding) | 17 | 73 |

30. Promotions during a three-month period among black and white U.S. government computer specialists are given below. Is there reason to believe that the rate of promotions is different for the two groups of employees? Compare your answer based on a chi-square statistic with your answer based on Fisher’s exact test.

 

| | Promoted | Not promoted |
| Black | 0 | 22 |
| White | 10 | 42 |

31. Response to sequential chemotherapy for lung cancer is recorded for each of a sample of male and female patients. Is there reason to believe that one of the genders responds better to chemotherapy than the other?

 

| | Progressive Disease | No Change | Partial Remission | Complete Remission |
| Male | 28 | 45 | 29 | 26 |
| Female | 4 | 12 | 5 | 2 |

32. #5.9

33. #5.11

34. #5.12
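
For the problems above that compare a chi-square statistic with Fisher's exact test, the two calls might look like the sketch below. The counts are made up and are not taken from any of the tables above.

```r
# Hypothetical 2x2 table of counts (not from any of the problems above)
tab <- matrix(c(3, 9,
                8, 4),
              nrow = 2, byrow = TRUE,
              dimnames = list(group   = c("A", "B"),
                              outcome = c("yes", "no")))

# Pearson chi-square test without Yates' continuity correction
chisq_res <- chisq.test(tab, correct = FALSE)

# Fisher's exact test, conditional on the observed row and column totals
fisher_res <- fisher.test(tab)

# Small expected counts are a common warning sign that the chi-square
# approximation may be unreliable
chisq_res$expected

c(chisq = chisq_res$p.value, fisher = fisher_res$p.value)
```

When the two p-values disagree noticeably, as they do for this made-up table, the expected counts are a natural place to start the discussion.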

 

Chapter 5B (Logistic Regression)

35. LR #1 nparhwLogisticRegression.pdf

36. LR #2 nparhwLogisticRegression.pdf

 

Chapter 8 (Bootstrap)

37. BS #1 nparhwbootstrap.pdf

38. BS #2 nparhwbootstrap.pdf

 

 

Some Hints for HW Problems (use at your own risk!)

Chapter2&3HWHints.pdf

Chapter4HWHints.pdf

Solutions for Chapter 2

Solutions for Chapter 3

Solutions for Chapter 4

Solutions for Chapter 5

Solutions for Chapter 8

Solutions for Chapter 10

 

Mini-Project #1 – Due Nov 5 in class.

Mini-Project #2 – Due Dec 10 at 11:59 p.m.

 

 

DATA

 

stat1301scores.txt

skulls2.txt

survtime.txt

wear.txt

reagent.txt

soybean.txt

wart.txt

reactiontime.txt

marketing.txt

fish.txt

election2008.csv

TimeSeriesElectionData.csv (first column=state name, second column=old data for obama, third column=old data for mccain, fourth column=sample size for old data, fifth column=Aug 1 for obama, sixth column=Aug 1 for mccain, seventh column=sample size for Aug 1, etc.)

football.txt

seatbelt.txt

crab.txt

sleep.csv

places

places.col

places.row

To read places into an R data frame, use:

placemat.row <- scan("C:/xgobi/data_xgobi/places.row",what=character(), sep="\n")

placemat.col <- scan("C:/xgobi/data_xgobi/places.col",what=character(), sep="\n")

placemat <- read.table("C:/xgobi/data_xgobi/places", col.names=placemat.col, row.names=placemat.row)

steps.xls

 

 

CLASS EXAMPLES

Splus.intro.ssc (see also the Splus Cheat Sheet available at StatLib. Splus and R are very similar.)

Sas.intro.sas

--

COMBINAT.Q (this collection of functions includes “combn” which we use extensively in the early part of the course)

NOTE: make sure that you have read the COMBINAT.Q functions into your R workspace, since "combn" may not exist in your version of R. You can do this by following this procedure:

(1) On the course webpage, there are links to COMBINAT.Q and chap2.R. Download COMBINAT.Q and place it somewhere on your machine.

(2) Then submit the first line of chap2.R which reads in the COMBINAT.Q functions--just make sure to change the file reference to represent the location of COMBINAT.Q on your machine.

You should now have access to all the functions in COMBINAT.Q. To get information on any function in COMBINAT.Q, you can just type "help(FUNCTIONNAME)", e.g., "help(combn)"
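
As a rough sketch of the procedure above and of how combn is typically used early in the course: the path in combinat_path is hypothetical and must be changed to wherever you saved COMBINAT.Q. If the file is not found, the sketch falls back on the combn function that ships with current versions of R (in the utils package), which I am assuming behaves the same way for this purpose. The data are made up.

```r
# Hypothetical location of COMBINAT.Q; edit this for your machine
combinat_path <- "C:/stat435/COMBINAT.Q"
if (file.exists(combinat_path)) source(combinat_path)

# Made-up data for two treatment groups
x <- c(10, 14, 17)
y <- c(12, 19, 21, 25)
pooled <- c(x, y)
m <- length(x)

# Each column of idx is one way to choose which m pooled values form group 1
idx <- combn(length(pooled), m)

# Difference in means for every possible relabeling of the pooled data
diffs <- apply(idx, 2, function(i) mean(pooled[i]) - mean(pooled[-i]))

# One-sided permutation p-value (H1: group 1 mean is smaller);
# the small tolerance guards against floating-point ties
obs <- mean(x) - mean(y)
p_lower <- mean(diffs <= obs + 1e-12)
p_lower
```

Enumerating every relabeling like this is only practical for small samples; the number of columns in idx is choose(length(pooled), m).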

chap2.R

chap3.R

chap4.R

chap5.R

chap5B.sas

chap8.R

chap10.R

 


 

HONOR CODE STANDARDS

In keeping with the principles of the BYU Honor Code, students are expected to be honest in all of their academic work. Academic honesty means, most fundamentally, that any work you present as your own must in fact be your own work and not that of another. Violations of this principle may result in a failing grade in the course and additional disciplinary action by the university.

Students are also expected to adhere to the Dress and Grooming Standards. Adherence demonstrates respect for yourself and others and ensures an effective learning and working environment. It is the university’s expectation, and my own expectation in class, that each student will abide by all Honor Code standards. Please call the Honor Code Office at 422-2847 if you have questions about those standards.

 

PREVENTING SEXUAL DISCRIMINATION OR HARASSMENT

Title IX of the Education Amendments of 1972 prohibits sex discrimination against any participant in an educational program or activity that receives federal funds. The act is intended to eliminate sex discrimination in education and pertains to admissions, academic and athletic programs, and university-sponsored activities. Title IX also prohibits sexual harassment of students by university employees, other students, and visitors to campus. If you encounter sexual harassment or gender-based discrimination, please talk to your professor; contact the Equal Employment Office at 801-422-5895 or 1-888-238-1062 (24-hours), or http://www.ethicspoint.com; or contact the Honor Code Office at 801-422-2847.

 

STUDENTS WITH DISABILITIES

If you have a disability that may affect your performance in this course, you should get in touch with the office of Services for Students with Disabilities (1520 WSC). This office can evaluate your disability and assist the professor in arranging for reasonable accommodations.