WEBVTT

00:00:04.500 --> 00:00:06.110
Towards the beginning
of this lecture,

00:00:06.110 --> 00:00:08.310
we stated that the
goal of a baseball team

00:00:08.310 --> 00:00:11.700
is to make the playoffs and
we built predictive models

00:00:11.700 --> 00:00:13.790
to achieve this goal.

00:00:13.790 --> 00:00:16.140
But why isn't the goal
of a baseball team

00:00:16.140 --> 00:00:20.180
to win the playoffs or
win the World Series?

00:00:20.180 --> 00:00:23.060
Billy Beane and Paul
DePodesta see their job

00:00:23.060 --> 00:00:26.080
as making sure the team
makes it to the playoffs,

00:00:26.080 --> 00:00:28.180
and after that,
all bets are off.

00:00:28.180 --> 00:00:32.800
The A's made it to the playoffs
four years in a row-- 2000,

00:00:32.800 --> 00:00:38.680
2001, 2002, and 2003-- but they
didn't win the World Series.

00:00:38.680 --> 00:00:39.920
Why not?

00:00:39.920 --> 00:00:43.740
In Moneyball, they say that
"over a long season luck

00:00:43.740 --> 00:00:46.350
evens out, and skill
shines through.

00:00:46.350 --> 00:00:48.450
But in a series of
three out of five,

00:00:48.450 --> 00:00:52.360
or even four out of seven,
anything can happen."

00:00:52.360 --> 00:00:55.570
In other words, the playoffs
suffer from the sample size

00:00:55.570 --> 00:00:56.620
problem.

00:00:56.620 --> 00:01:00.440
There are not enough games to
make any statistical claims.
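
NOTE
The sample size point above can be sketched with a short calculation.
This is only an illustration: it assumes each game is independent and
that the better team wins any single game with probability p. The value
p = 0.55 and the function name series_win_prob are hypothetical and do
not come from the lecture.
```python
from math import comb
def series_win_prob(p, games):
    """Chance that a team winning each game with probability p takes a best-of-`games` series."""
    need = games // 2 + 1  # wins required, e.g. 4 in a best-of-7
    # Same as winning at least `need` of `games` independent games.
    return sum(comb(games, w) * p**w * (1 - p)**(games - w)
               for w in range(need, games + 1))
# p = 0.55 is an assumed per-game edge, not a figure from the lecture.
for games in (5, 7, 161):
    print(games, round(series_win_prob(0.55, games), 3))
```
Under these assumptions, the better team's chance of winning grows with
the number of games, so a short series leaves much more room for luck
than a long one.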

00:01:00.440 --> 00:01:05.000
Let's see if we can verify
this using our data set.

00:01:05.000 --> 00:01:06.760
The number of teams
in the playoffs

00:01:06.760 --> 00:01:08.510
has changed over the years.

00:01:08.510 --> 00:01:10.840
So let's only use the
years with eight teams

00:01:10.840 --> 00:01:13.820
in the playoffs, which was the
number of teams in the playoffs

00:01:13.820 --> 00:01:17.450
in 2002, the year
Moneyball discusses.

00:01:17.450 --> 00:01:20.520
We can compute the correlation
between whether or not

00:01:20.520 --> 00:01:24.250
the team wins the World
Series-- a binary variable--

00:01:24.250 --> 00:01:26.650
and the number of
regular season wins,

00:01:26.650 --> 00:01:29.200
since we would expect
teams with more wins

00:01:29.200 --> 00:01:32.180
to be more likely to
win the World Series.

00:01:32.180 --> 00:01:37.140
This correlation is
0.03, which is very low.
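
NOTE
A minimal sketch of the correlation computation described above,
in Python. The rows below are made-up placeholder values, not the
lecture's data set, so the printed number will not match 0.03. The
tuple layout and the helper name pearson are assumptions for
illustration.
```python
from math import sqrt
def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
# Made-up rows: (regular season wins, playoff teams that year, won World Series as 0/1).
rows = [(103, 8, 0), (98, 8, 1), (95, 8, 0), (92, 8, 0), (101, 4, 1), (90, 8, 0)]
eight = [r for r in rows if r[1] == 8]  # keep only years with eight playoff teams
r = pearson([wins for wins, _, _ in eight], [won for _, _, won in eight])
print(round(r, 2))
```
With one variable coded as 0 or 1, this is the ordinary Pearson
correlation applied to a binary outcome.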

00:01:37.140 --> 00:01:40.660
So it turns out that winning
regular season games gets you

00:01:40.660 --> 00:01:43.610
to the playoffs, but in
the playoffs, there are too

00:01:43.610 --> 00:01:46.580
few games for luck to even out.

00:01:46.580 --> 00:01:49.280
Next week, we'll discuss
logistic regression,

00:01:49.280 --> 00:01:52.289
which we'll be able to use
to predict whether or not

00:01:52.289 --> 00:01:54.789
the team will win
the World Series.