18.05 | Spring 2022 | Undergraduate

Introduction to Probability and Statistics

Syllabus

Course Meeting Times

Lectures: 2 sessions / week, 80 minutes / session

Studios: 1 session / week, 50 minutes / session

Prerequisites

18.02 Multivariable Calculus

Course Description

This course provides an elementary introduction to probability and statistics with applications. Topics include basic combinatorics, random variables, probability distributions, Bayesian inference, hypothesis testing, confidence intervals, and linear regression.

Broad Course Goals

  • Learn the language and core concepts of probability theory.
  • Understand basic principles of statistical inference (both Bayesian and frequentist).
  • Build a starter statistical toolbox with appreciation for both the utility and limitations of these techniques.
  • Use software and simulation to do statistics (R).
  • Become an informed consumer of statistical information.
  • Prepare for further coursework or on-the-job study.

Course Arc

  • Probability (uncertain world, perfect knowledge of the uncertainty)
    • Counting
    • Random variables, distributions, quantiles, mean, variance
    • Conditional probability, Bayes’ theorem, base rate fallacy
    • Joint distributions, covariance, correlation, independence
    • Central limit theorem
  • Statistics I: pure applied probability (data in an uncertain world, perfect knowledge of the uncertainty)
    • Bayesian inference with known priors, probability intervals
    • Conjugate priors
  • Statistics II: applied probability (data in an uncertain world, imperfect knowledge of the uncertainty)
    • Bayesian inference with unknown priors
    • Frequentist significance tests and confidence intervals
    • Resampling methods: bootstrapping
    • Linear regression
  • Computation, simulation, and visualization using R and applets will be used throughout the course.

Specific Learning Objectives

Probability

Students completing the course will

  1. be able to use basic counting techniques (multiplication rule, combinations, permutations) to compute probability and odds;
  2. be able to use R to run basic simulations of probabilistic scenarios;
  3. be able to compute conditional probabilities directly and using Bayes’ theorem, and check for independence of events;
  4. be able to set up and work with discrete random variables; in particular, to understand the Bernoulli, binomial, geometric, and Poisson distributions;
  5. be able to work with continuous random variables; in particular, to know the properties of the uniform, normal, and exponential distributions;
  6. know what expectation and variance mean and be able to compute them;
  7. understand the law of large numbers and the central limit theorem;
  8. be able to compute the covariance and correlation between jointly distributed variables; and
  9. be able to use available resources (the internet or books) to learn about and use other distributions as they arise.

Statistics

Students completing the course will

  1. be able to create and interpret scatter plots and histograms;
  2. understand the difference between probability and likelihood functions and be able to find the maximum likelihood estimate for a model parameter;
  3. be able to do Bayesian updating with discrete priors to compute posterior distributions and posterior odds;
  4. be able to do Bayesian updating with continuous priors;
  5. be able to construct estimates and predictions using the posterior distribution;
  6. be able to find credible intervals for parameter estimates;
  7. be able to use null hypothesis significance testing (NHST) to test the significance of results, and to understand and compute the p-value for these tests;
  8. be able to use specific significance tests, including z-test, t-test (one- and two-sample), and chi-squared test;
  9. be able to find confidence intervals for parameter estimates;
  10. be able to use bootstrapping to estimate confidence intervals;
  11. be able to compute and interpret simple linear regression between two variables; and
  12. be able to set up a least squares fit of data to a model.

Basic Structure of the Course

  • We will take an active learning approach similar in some respects to Technology Enabled Active Learning (TEAL).
  • You must do the reading and answer reading questions before each class.
    • In class we will assume you have done the reading. We will lecture and work problems assuming this.
    • We do not expect that you will have mastered the material on first reading. The goal is to start the process, so class will be more productive.
    • The reading questions will prepare you for the harder questions we will work during class and on the problem sets.
    • The reading questions will count toward your grade.
  • We meet three times a week.
    • Tuesday and Thursday will be a blend of lecture, concept questions and group problem solving.
    • Friday will be a “studio” day. It will involve longer problems and the use of R. You will need to have your computer available on Fridays.
  • We will use “clicker questions” in class.
    • Participation on these questions will also count toward your grade.
  • R
    • We will make frequent use of R for computation, simulation, and visualization.
    • We will teach you everything you need to know to use R as a tool in this class.
    • You will not be expected to do any hardcore computer programming.

Note to OCW Users

The interactive components of this course – the Online Reading Questions and Problem Checkers – are available on MIT’s Open Learning Library, which is free to use. You have the option to sign up and track your progress, or you can view and use the materials without enrolling.

Textbook

There will be no assigned textbook for the class. Readings will be assigned.

Problem Sets and Exams

  • Problem sets
    • Problem sets will be due most weeks, usually on Monday.
    • Problem sets will have a problem checker on the Open Learning Library (OLL). This will allow you to check many of your answers before turning in the problem sets. 
  • Exams
    • There will be two in-class midterm exams and a comprehensive final exam.
    • The midterms will be designed to take one hour, but you will have the entire 80 minutes of class to finish.
    • We will have one R-based quiz. For this quiz, you will be allowed to use the internet in any way except to communicate with other people.

Groups

For in-class problem solving you will work in groups of 3.

  • You will be able to choose your own group.
  • After a week or so, groups will be more or less permanent.
  • Groups should sit together at tables.

Collaboration

MIT has a culture of teamwork. We encourage you to work with study partners.

  • Collaboration on homework is encouraged.
  • You must write your solutions yourself, in your own words.
  • You must list all collaborators and outside sources of information.

Grading

  • Reading questions and in-class clicker questions will each count for 5% of your grade.
  • Problem sets will count for 25% of your grade. In computing your problem set average we will drop your lowest score.
  • The R studios will count for 5% of your grade.
  • The midterm exams and R quiz combined will count for 30% of your grade (12.5%, 12.5%, 5%).
  • The final exam will count for 30% of your grade.
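
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes every component is scored on a 0 to 100 scale and that the components are combined as a simple weighted sum; the syllabus states only the percentages, so the function name `course_grade`, its parameters, and the scoring scale are all hypothetical.

```python
# Weights taken from the grading list above; they sum to 100%.
WEIGHTS = {
    "reading": 0.05,    # reading questions
    "clicker": 0.05,    # in-class clicker questions
    "psets": 0.25,      # problem sets (lowest score dropped)
    "studios": 0.05,    # R studios
    "midterm1": 0.125,
    "midterm2": 0.125,
    "r_quiz": 0.05,
    "final": 0.30,
}


def pset_average(pset_scores):
    """Average of the problem set scores after dropping the single lowest one."""
    scores = sorted(pset_scores)
    # Drop the lowest score only if something remains afterwards.
    kept = scores[1:] if len(scores) > 1 else scores
    return sum(kept) / len(kept)


def course_grade(reading, clicker, pset_scores, studios,
                 midterm1, midterm2, r_quiz, final):
    """Weighted course grade on a 0-100 scale (assumed scale, see lead-in)."""
    components = {
        "reading": reading,
        "clicker": clicker,
        "psets": pset_average(pset_scores),
        "studios": studios,
        "midterm1": midterm1,
        "midterm2": midterm2,
        "r_quiz": r_quiz,
        "final": final,
    }
    return sum(WEIGHTS[name] * score for name, score in components.items())


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    grade = course_grade(
        reading=90, clicker=100, pset_scores=[80, 95, 60, 100],
        studios=100, midterm1=85, midterm2=75, r_quiz=90, final=88,
    )
    print(round(grade, 1))
```

In this example the problem set score of 60 is dropped, so the problem set average is taken over the remaining three scores.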
