14.310x | Spring 2023 | Graduate

Data Analysis for Social Scientists


Week One: Introduction

  • Introduction to the software R, with exercises and suggested web resources for learning more.
  • Introduction to the power of data and data analysis, overview of what will be covered in the course.

Week Two: Fundamentals of Probability, Random Variables, Joint Distributions, and Collecting Data

  • Basics of probability and introduction to random variables
  • Discussion of distributions and joint distributions
  • Introduction to collecting data through surveys, web scraping, and other data collection methods

Week Three: Describing Data, Joint, and Conditional Distributions of Random Variables

  • Principles and practical steps for protection of human subjects in research
  • Discussion of kernel density estimates
  • Builds on the basics from week 2 to cover joint, marginal, and conditional distributions

Week Four: Joint, Marginal, and Conditional Distributions and Functions of Random Variables

  • Similarly builds on the basics from week 2 to cover functions of random variables
  • Discussion of moments of a distribution, expectation, and variance
  • Basics of regression analysis
  • Application: using principles of probability to analyze auctions

Week Five: Special Distributions, The Sample Mean, The Central Limit Theorem, and Estimation

  • Discussion of the properties of special distributions, with several examples
  • Statistics: Introduction to the sample mean, central limit theorem, and estimation

Week Six: Assessing and Deriving Estimators, Confidence Intervals

  • Deriving and assessing estimators
  • Constructing and interpreting confidence intervals
  • Introduction to hypothesis testing

Week Seven: Causality, Analyzing Randomized Experiments, and Nonparametric Regression

  • Understanding randomization in the context of experimentation
  • Introduction to nonparametric regression techniques

Week Eight: Single and Multivariate Linear Models

  • In-depth discussion of the linear model and the multivariate linear model

Week Nine: Practical Issues in Running Regressions and Omitted Variable Bias

  • Covariates, fixed effects, and other functional forms
  • Introduction to regression discontinuity design

Week Ten: Endogeneity, Instrumental Variables, and Experimental Design

  • Understanding the problem of endogeneity; introduction to instrumental variables and two-stage least squares, with a discussion of how to assess the validity of an instrument
  • Discussion of how to design an effective experiment, followed by an example from Indonesia
  • Principles of data visualization with examples of well-crafted visual presentations of data

Week Eleven: Intro to Machine Learning and Data Visualizations

  • Introduction to the use of machine learning for prediction. Covers tuning and training. [Note: These lectures were given by a guest lecturer and are not available to OCW users.]

Course Info

  • As taught in: Spring 2023
  • Learning resource types: Lecture Notes, Lecture Videos, Problem Sets