14.310x | Spring 2023 | Graduate

# Data Analysis for Social Scientists

## Calendar

### Week One: Introduction

• Introduction to the software R, with exercises and suggested web resources for learning more.
• Introduction to the power of data and data analysis, overview of what will be covered in the course.

### Week Two: Fundamentals of Probability, Random Variables, Joint Distributions, and Collecting Data

• Basics of probability and introduction to random variables
• Discussion of distributions and joint distributions
• Introduction to collecting data through surveys, web scraping, and other data collection methods

### Week Three: Describing Data, Joint, and Conditional Distributions of Random Variables

• Principles and practical steps for protection of human subjects in research
• Discussion of kernel density estimates
• Builds on the basics from week 2 to cover joint, marginal, and conditional distributions

### Week Four: Joint, Marginal, and Conditional Distributions and Functions of Random Variables

• Similarly builds on the basics from week 2 to cover functions of random variables
• Discussion of moments of a distribution, expectation, and variance
• Basics of regression analysis
• Application: Using some principles of probability to analyze auctions

### Week Five: Special Distributions, The Sample Mean, The Central Limit Theorem, and Estimation

• Discussion of the properties of special distributions, with several examples
• Statistics: Introduction to the sample mean, central limit theorem, and estimation

### Week Six: Assessing and Deriving Estimators, Confidence Intervals

• Deriving and assessing estimators
• Constructing and interpreting confidence intervals
• Introduction to hypothesis testing

### Week Seven: Causality, Analyzing Randomized Experiments, and Nonparametric Regression

• Understanding randomization in the context of experimentation
• Introduction to nonparametric regression techniques

### Week Eight: Single and Multivariate Linear Models

• In-depth discussion of the linear model and the multivariate linear model

### Week Nine: Practical Issues in Running Regressions and Omitted Variable Bias

• Covariates, fixed effects, and other functional forms
• Introduction to regression discontinuity design

### Week Ten: Endogeneity, Instrumental Variables, and Experimental Design

• Understanding the problem of endogeneity; introduction to instrumental variables and two-stage least squares, with a discussion of how to assess the validity of an instrument
• Discussion of how to design an effective experiment, followed by an example from Indonesia
• Principles of data visualization with examples of well-crafted visual presentations of data

### Week Eleven: Intro to Machine Learning and Data Visualizations

• Introduction to the use of machine learning for prediction. Covers tuning and training. [Note: These lectures were given by a guest lecturer and are not available to OCW users.]
