#---------------------------------------------------------
# File:   MIT18_05S22_in-class19-script.txt
# Author: Jeremy Orloff
#
# MIT OpenCourseWare: https://ocw.mit.edu
# 18.05 Introduction to Probability and Statistics
# Spring 2022
# For information about citing these materials or our Terms of Use, visit:
# https://ocw.mit.edu/terms.
#
#---------------------------------------------------------
Class 19 Gallery of NHST

Jerry
  Slide 1:

  Slide 2: Announcements/Agenda (2 minutes)

  Slides 3,4: Discussion of studio 7 (5 minutes)
     Usual point: frequentist methods cannot give P(hypothesis)
     Simulating probabilities means counting occurrences
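     NOT FOR CLASS: a minimal Python sketch of what "counting occurrences" means. The event (at least 8 heads in 10 fair tosses), the trial count and the seed are illustrative assumptions, not taken from the studio. The estimate is simply the fraction of simulated trials in which the event happened.

```python
import random

def estimate_probability(event, n_trials=20000, seed=1):
    """Estimate P(event) as (number of trials where the event occurs) / n_trials."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_trials):
        if event(rng):
            count += 1
    return count / n_trials

def at_least_8_heads(rng):
    # Hypothetical event: 8 or more heads in 10 tosses of a fair coin.
    heads = sum(rng.random() < 0.5 for _ in range(10))
    return heads >= 8

estimate = estimate_probability(at_least_8_heads)
print(estimate)  # the exact probability is 56/1024, about 0.0547
```

     Note the estimate is a frequency of outcomes given a fixed model. It says nothing about P(hypothesis).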

  Slide 5: Concept question: t-test odds (5 minutes)
     Significance is not probability of hypotheses

  Slide 6: review of NHST (2 minutes)
    Quickly

  Slides 7-9: chi-square example (6 minutes)
     Note: this is called a chi-square test because, under the null hypothesis, the test stat (approximately) follows a chi-square distribution. The reason is straightforward in principle but the details are complicated, so we won't give it.
     NOT TO SAY IN CLASS: Note: G is called the likelihood ratio statistic. Strictly, G is twice the log of the likelihood ratio: with G = 2*sum(O_i*ln(O_i/E_i)), the ratio itself is exp(G/2). See the book by Rice.
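     NOT FOR CLASS: a small Python sketch comparing the two test stats. It assumes the usual definitions X2 = sum((O_i - E_i)^2 / E_i) and G = 2*sum(O_i*ln(O_i/E_i)). The counts are made up for illustration and are not the slide data.

```python
import math

def pearson_x2(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def g_statistic(observed, expected):
    """Likelihood ratio statistic G = 2 * sum of O * ln(O / E).

    Cells with O = 0 contribute 0 to the sum, so they are skipped.
    """
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical data: 100 observations in 4 cells, null hypothesis says equal probabilities.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]

x2 = pearson_x2(observed, expected)
g = g_statistic(observed, expected)
print(x2, g)  # the two statistics are typically close when the counts are not small
```

     With the expected counts fully specified by the null hypothesis, both stats would be compared to a chi-square distribution with (number of cells - 1) = 3 degrees of freedom.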


Jen
  Slides 10, 11: BQs (Khan's restaurant and genetic linkage) (Work 15 minutes, discuss 8 minutes)
     Have them do both before discussing
     Have the fast groups open the slides on MITx and do the second problem.
     DISCUSSION Khan
       Can look at which cells didn't match.
       In this case, M,S,T are the three biggest contributors to the X2 stat.
     DISCUSSION Genes-- Someone will be able to explain the biology
       (The genes are close together on the same chromosome. So they tend to be inherited together)
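     NOT FOR CLASS: one way to "look at which cells didn't match" is to list each cell's contribution (O - E)^2 / E to the X2 stat, largest first. A Python sketch follows. The day labels and counts are hypothetical, chosen only to illustrate the ranking, and are not the data from the slide.

```python
def x2_contributions(labels, observed, expected):
    """Return (label, (O - E)^2 / E) pairs sorted with the largest contribution first."""
    terms = [(lab, (o - e) ** 2 / e)
             for lab, o, e in zip(labels, observed, expected)]
    return sorted(terms, key=lambda t: t[1], reverse=True)

# Hypothetical counts for six days (both lists sum to 200).
labels = ["M", "T", "W", "R", "F", "S"]
observed = [30, 14, 34, 45, 57, 20]
expected = [20, 20, 30, 40, 60, 30]

ranked = x2_contributions(labels, observed, expected)
for label, contribution in ranked:
    print(label, round(contribution, 3))
```

     The sum of the contributions is the X2 stat itself, so the list shows directly which cells drive a rejection.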

  Slides 12,13:  the F distribution, F-test (4 minutes)
     Briefly: Can look this up. In the reading.
     All we need to know is that it is the null distribution for an F-test and has mean \approx 1.
     I found an online reference that says that when the group sizes are equal the F-test is robust to differences in the variances.

  Slide 14:  ANOVA  (Work 8 minutes, discuss 5 minutes)
     DISCUSSION: Assume: recovery times follow a normal distribution with same variance.
     After that: plug and chug
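     NOT FOR CLASS: the "plug and chug" step, sketched in Python for equal-size groups. It assumes the usual one-way ANOVA formulas: MS_between = n * (sample variance of the group means), MS_within = average of the group sample variances, F = MS_between / MS_within. The data are made up and are not the recovery times from the slide.

```python
import statistics

def anova_f(groups):
    """One-way ANOVA F statistic for m groups of equal size n.

    Returns (F, (df_between, df_within)) with df = (m - 1, m * (n - 1)).
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes all groups have the same size")
    m = len(groups)
    group_means = [statistics.mean(g) for g in groups]
    ms_between = n * statistics.variance(group_means)
    ms_within = statistics.mean([statistics.variance(g) for g in groups])
    return ms_between / ms_within, (m - 1, m * (n - 1))

# Hypothetical data: 3 groups with 3 measurements each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df = anova_f(groups)
print(f_stat, df)  # 3.0 (2, 6)
```

     The p-value would then be the right tail probability of F under the F distribution with those degrees of freedom. That lookup is left out here because the Python standard library has no F distribution.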

  Slide 15ab  Concept questions (two of them)  Multiple testing (6 minutes)
     The second question is tricky because the pairs are not independent.
     The short answer is that, even with dependence, 15 different
     comparisons give a probability of at least one rejection much greater than 0.05.
     An R simulation with normal data puts the prob of at least one rejection at about 0.36

     DON'T DO THIS IN CLASS: One low estimate is:
     There are at least 3 independent pairs of tests so
     P(at least one rejection) > 1 - pbinom(0,3,.05) = .14
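     DON'T DO THIS IN CLASS: a rough Python sketch of such a simulation. Assumptions that are not in the notes: 6 groups (so 15 pairs), 10 data points per group, standard normal data, and two-sided z-tests with known variance 1 swapped in for t-tests so that only the standard library is needed. It is not the R simulation quoted above, so the number it prints need not match 0.36 exactly.

```python
import itertools
import random
from statistics import NormalDist

def prob_at_least_one_rejection(n_groups=6, n_per_group=10, alpha=0.05,
                                n_trials=2000, seed=1):
    """Estimate P(at least one pairwise z-test rejects) when all group means are equal."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard deviation of the difference of two group means when sigma = 1.
    se = (2 / n_per_group) ** 0.5
    hits = 0
    for _ in range(n_trials):
        means = [sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
                 for _ in range(n_groups)]
        # Count the trial if any of the pairwise comparisons rejects.
        if any(abs(a - b) / se > z_crit
               for a, b in itertools.combinations(means, 2)):
            hits += 1
    return hits / n_trials

lower_bound = 1 - 0.95 ** 3         # same value as 1 - pbinom(0, 3, .05) in R
independent_value = 1 - 0.95 ** 15  # what 15 independent tests would give
estimate = prob_at_least_one_rejection()
print(lower_bound, estimate, independent_value)
```

     The estimate should land between the two reference values: above the bound from 3 independent tests, and below what 15 fully independent tests would give.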

  Slide 16ab Discussion: (2 minutes)
     There is a pause in this slide
     For the CQ: we have the F-test to test if all are the same

Only if time.
  Slide 17:  chi-square for independence (Work to end of class)