15.075J | Fall 2011 | Undergraduate

Statistical Thinking and Data Analysis

Calendar

This class will cover the following topics from the course textbook:

Tamhane, Ajit C., and Dorothy D. Dunlop. Statistics and Data Analysis: From Elementary to Intermediate. Prentice Hall, 1999. ISBN: 9780137444267.

CHAPTERS TITLES TOPICS
2 Review of Probability

  • Computing probabilities: conditional probabilities and Bayes’ rule
  • Quantiles/percentiles, CDFs, mean, median, variance, standard deviation, covariance, various distributions and what they are used for (particularly Bernoulli, binomial, multinomial, hypergeometric, Poisson, normal)
3 Collecting Data

  • Sampling terminology: convenience sampling, simple random sampling (SRS), stratified random sampling, multistage cluster sampling, 1-in-k systematic sampling
4 Summarizing and Exploring Data

  • Summarizing univariate data: numerically (sample mean, IQR, etc.) and by plotting (pie/bar/Pareto chart for categorical data, histogram, box plot, normal plot)
  • Summarizing bivariate data: Simpson’s paradox, scatter plot, sample correlation coefficient
  • Time series: moving average (MA), exponentially weighted moving average (EWMA), forecast error and mean absolute percent error (MAPE), autocorrelation coefficient
Exam 1: Chapters 2-4
5 Sampling Distributions of Statistics

  • Normal approximation to binomial distribution (which relies on the CLT), computing probabilities with chi-square distribution, t-distribution, F-distribution
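
As a rough illustration of the normal approximation named above, the sketch below compares an exact binomial probability with its normal approximation using a continuity correction. The values n = 100, p = 0.5, and k = 55 are hypothetical and chosen only for illustration; they do not come from the course materials.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a Normal(mu, sigma^2) distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binom_cdf_normal_approx(k, n, p):
    """Approximate P(X <= k) with a normal CDF and a +0.5 continuity correction."""
    mu = n * p                          # binomial mean
    sigma = math.sqrt(n * p * (1 - p))  # binomial standard deviation
    return normal_cdf(k + 0.5, mu, sigma)

# Hypothetical example values (assumptions, not from the course).
n, p, k = 100, 0.5, 55
exact = binom_cdf(k, n, p)
approx = binom_cdf_normal_approx(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The approximation tends to be closer when both np and n(1 - p) are reasonably large; a common rule of thumb asks for each to be at least about 10.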
6 Basic Concepts of Inference

  • Bias, MSE, setting up hypotheses, Type I error, Type II error, power
  • For z-test: z-scores, p-values, confidence intervals
7 Inferences for Single Samples

  • Sample size calculation for confidence intervals on z-test, sample size calculation for z-test, sample size calculation for power on z-test, t-test, chi-square test for variance
8 Inferences for Two Samples

  • QQ plots
  • Comparison of two means for independent samples design (large-sample z-test, small-sample t-test using either a pooled variance or the Welch-Satterthwaite method)
  • Comparison of two means for matched pairs design (t-test, power and sample-size calculation for power)
  • Comparison of variance using the F-test
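
For the independent samples comparison listed above, a minimal sketch of the Welch-Satterthwaite calculation might look like the following. The function name `welch_t` and the sample data are hypothetical; the sketch returns only the test statistic and the approximate degrees of freedom, and leaves the p-value to a t-table or a statistics package.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Return (t, df) for a two-sample t-test without assuming equal variances.

    df is the Welch-Satterthwaite approximation to the degrees of freedom.
    """
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # estimated variance of the first sample mean
    v2 = variance(y) / n2  # estimated variance of the second sample mean
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
t_stat, df = welch_t(x, y)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The approximate degrees of freedom always fall between min(n1, n2) - 1 and n1 + n2 - 2, the latter being the value used by the pooled-variance test.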
Exam 2: Chapters 5-8
9 Inferences for Proportions and Count Data

  • Comparison to a given proportion using large-sample z-test, sample size calculation for confidence intervals
  • Comparison of two proportions using large-sample z-test
  • Chi-square test (multinomial and goodness of fit)
10 Simple Linear Regression and Correlation

  • Computing the least squares line, computing r^2, hypothesis testing on beta_1, understanding ANOVA regression tables
  • Checking model assumptions and transforming data
11 Multiple Linear Regression

  • Understanding ANOVA regression tables, t-tests on individual regression coefficients
  • Multicollinearity
  • Logistic regression
Exam 3: Chapters 9-11
14 Nonparametric Statistical Methods

  • Comparison to a given median using: sign test, Wilcoxon signed rank test (these tests can also be used on the differences d_i for matched pairs)
  • Comparison of two distributions using the Wilcoxon rank sum test or the Mann-Whitney U (MWU) test
  • Rank correlation methods: Spearman’s rank correlation coefficient, Kendall’s tau
Exam 4: Chapters 12, 14