#---------------------------------------------------------
# File: MIT18_05S22_in-class26-script.txt
# Author: Jeremy Orloff
#
# MIT OpenCourseWare: https://ocw.mit.edu
# 18.05 Introduction to Probability and Statistics
# Spring 2022
# For information about citing these materials or our Terms of Use, visit:
# https://ocw.mit.edu/terms.
#
#---------------------------------------------------------

Class 26 Linear Regression

Jerry
Slide 1: Intro and thanks (3 minutes)
Slide 2: Announcements/Agenda (3 minutes)
Slide 3: RQuiz (4 minutes)
Slides 4-9: Review (6 minutes)
    Don't go into lots of detail. The board question is the place for that.

Jen
Slide 10: BOARD question: Compute and set up several least squares fits (work 12 minutes, discussion 8 minutes)
    DISCUSSION
    (a) and (b): Go through the setup and the derivatives. DO NOT do the algebra to solve; just give the numbers.
    (c): Take logs and stop.
    (d): Set up the sum of squares.
Slide 11: What is linear about linear regression (2 minutes)
    This will have been pointed out in the discussion above, so we can cover it quickly.
Slides 12-13: Homoscedasticity and heteroscedasticity (3 minutes)
    Amusing names.
Slide 14: Formulas for simple linear regression (2 minutes)
    These are in the reading and on the next slide. Only highlight the warning.

Jerry
Slide 15: BOARD question: Use the formulas, MLE connection (work 10 minutes, discussion 6 minutes)
    Work: Don't let the groups do (d).
    Discussion: Do not attempt to prove or justify the formulas in any way. They can find the derivation in the reading.
    They will get all the computations, so there is no need to say anything about them except that the formulas make things easy.
    Give the warning again.
    To point out:
    (c) is a useful thing to know.
    (d) Theoretical underpinnings. Finding the MLE is just calculus, but a bit tedious.
Slide 16: Measuring the fit: R^2 (3 minutes)
    They are not responsible for this on the final.
    Don't dwell: the key point is that we have a measure of goodness of fit.
    We will talk about the fit/complexity trade-off in the demos.
Slide 17: Overfitting, demo with R (6 minutes)
    Have it cued up and ready to go. See the comment in slides.tex on this R demonstration!
    % Uses class26-prep.r. Set doOverFittingExample = 1, and from the console, up-arrow to source the file.
    This will show the data points. Then it will do one at a time: plot data, m = 1 fit, m = 2 fit, m = 9 fit.
    Each fit also prints its R^2 value, which jumps from m = 1 to m = 2 and reaches 1.0 at m = 9.
Slide 18: Outliers (6 minutes)
    Use the linear regression applet:
    Clear the data and set it to add-data mode.
    Show the best-fit line.
    Add a series of points in the first quadrant, with slope about -1.
    After 10 points, add a point in the third quadrant and watch the line jump.

*** We probably won't get to slides 19 and 20.
Slide 19: Regression to the mean
    Point them to the reading for technical details.
    Spend time on the education example.
Slide 20: Multiple linear regression
    The point is that it looks very similar to bivariate linear regression.
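For anyone who wants to check the Slide 14-15 formulas and the Slide 16-17 R^2/overfitting demo without R, here is a minimal Python sketch. It assumes the simple linear regression model y = a + b*x with the usual estimates b = s_xy/s_xx and a = ybar - b*xbar. The function names (simple_linear_regression, solve, polyfit, r_squared) and the data are hypothetical and are not taken from class26-prep.r. It uses exact Fraction arithmetic, so the degree-9 fit to 10 points gives R^2 = 1 exactly rather than approximately. It also illustrates the Slide 11 point: a polynomial in x is still linear in its coefficients, so it is fit by the same least squares normal equations.

```python
from fractions import Fraction


def simple_linear_regression(x, y):
    """Least squares fit of y = a + b*x using b = s_xy / s_xx, a = ybar - b*xbar."""
    x = [Fraction(v) for v in x]
    y = [Fraction(v) for v in y]
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # The 1/(n-1) factors in s_xy and s_xx cancel in the ratio, so plain sums suffice.
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def solve(A, rhs):
    """Solve A c = rhs exactly by Gauss-Jordan elimination (A must be nonsingular)."""
    k = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(k):
        # Exact arithmetic: any nonzero pivot works, no partial pivoting needed.
        piv = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    # M is now diagonal, so each unknown is a single division.
    return [M[i][k] / M[i][i] for i in range(k)]


def polyfit(x, y, m):
    """Degree-m least squares polynomial: solve the normal equations (X^T X) c = X^T y."""
    x = [Fraction(v) for v in x]
    y = [Fraction(v) for v in y]
    # Design matrix columns 1, x, x^2, ..., x^m: nonlinear in x, linear in c.
    X = [[xi ** j for j in range(m + 1)] for xi in x]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m + 1)] for i in range(m + 1)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m + 1)]
    return solve(XtX, Xty)  # coefficients c_0, c_1, ..., c_m


def r_squared(x, y, coeffs):
    """R^2 = 1 - SSE/SST: fraction of the variation in y explained by the fit."""
    x = [Fraction(v) for v in x]
    y = [Fraction(v) for v in y]
    ybar = sum(y) / len(y)
    fitted = [sum(c * xi ** j for j, c in enumerate(coeffs)) for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst


if __name__ == "__main__":
    # Hypothetical data (NOT the data in class26-prep.r): 10 points, roughly quadratic.
    xs = list(range(10))
    ys = [2, 1, 3, 6, 5, 9, 14, 13, 20, 24]
    a, b = simple_linear_regression(xs, ys)
    print("simple formulas: a =", float(a), " b =", float(b))
    for m in (1, 2, 9):
        r2 = r_squared(xs, ys, polyfit(xs, ys, m))
        print(f"m = {m}: R^2 = {float(r2):.4f}")
```

The closed-form a and b agree exactly with the m = 1 polynomial fit. R^2 can never decrease as m grows, because each polynomial model contains the previous one. That is why an R^2 of 1.0 at m = 9 signals overfitting rather than a good model.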