WEBVTT

00:00:04.150 --> 00:00:07.010
The goal of a basketball team
is similar to that of a baseball

00:00:07.010 --> 00:00:10.300
team, making the playoffs.

00:00:10.300 --> 00:00:12.140
So how many games
does a team need

00:00:12.140 --> 00:00:15.370
to win in order to
make the playoffs?

00:00:15.370 --> 00:00:17.400
Recall that in the lecture
we found this number

00:00:17.400 --> 00:00:19.520
by looking at a graph.

00:00:19.520 --> 00:00:21.770
Here in R, let's use
the table command

00:00:21.770 --> 00:00:24.490
to figure this out for the NBA.

00:00:24.490 --> 00:00:26.240
So that's just
table(NBA$W, NBA$Playoffs).

00:00:38.460 --> 00:00:39.760
So our table pops up.

00:00:39.760 --> 00:00:44.670
Let's scroll to the top
so we see what's going on.

00:00:44.670 --> 00:00:50.350
OK so as we typed in, we've
got the number of wins

00:00:50.350 --> 00:00:52.620
here as the rows.

00:00:52.620 --> 00:00:56.950
And 0 if a team didn't make
the playoffs, 1 if a team did

00:00:56.950 --> 00:00:59.470
make the playoffs
in the columns.

00:00:59.470 --> 00:01:03.730
So for all of our
data, for example,

00:01:03.730 --> 00:01:07.940
consider all the times
that a team won 17 games.

00:01:07.940 --> 00:01:10.500
So this happened
11 times in total.

00:01:10.500 --> 00:01:13.920
And all 11 times the teams
didn't make it to the playoffs

00:01:13.920 --> 00:01:16.400
when they won 17 games.

00:01:16.400 --> 00:01:19.060
Let's scroll down and look
at a much higher number

00:01:19.060 --> 00:01:21.060
for contrast.

00:01:21.060 --> 00:01:24.060
For example, 61 wins.

00:01:24.060 --> 00:01:28.890
If a team 61 games then 10
of those times they made it

00:01:28.890 --> 00:01:31.930
to the playoffs, and
0 times they didn't.

00:01:31.930 --> 00:01:34.220
So it seems like
if you win 61 games

00:01:34.220 --> 00:01:37.229
you are definitely going
to make it to the playoffs.

00:01:37.229 --> 00:01:39.710
But I'm sure we can find
a much better threshold.

00:01:39.710 --> 00:01:42.300
Let's take a look at the table,
say around the middle section.

00:01:46.490 --> 00:01:51.180
OK, so here we can
see that a team who

00:01:51.180 --> 00:01:57.310
wins say about 35 games
or fewer almost never

00:01:57.310 --> 00:01:58.360
makes it to the playoffs.

00:01:58.360 --> 00:02:02.820
We see a lot of 0s and 1s
in this column up until 35.

00:02:02.820 --> 00:02:05.330
After 35 we start seeing
some numbers over here.

00:02:05.330 --> 00:02:08.690
So teams are starting to
make it to the playoffs.

00:02:08.690 --> 00:02:15.050
And if we scroll down, we
see that after about 45 wins,

00:02:15.050 --> 00:02:16.970
teams almost always
make it to the playoffs.

00:02:16.970 --> 00:02:22.360
We see very few 1s and 0s in
the category of not making it.

00:02:22.360 --> 00:02:24.150
So it seems like
a good goal would

00:02:24.150 --> 00:02:26.950
be to try to win about 42 games.

00:02:26.950 --> 00:02:29.260
If a team can win
about 42 games then

00:02:29.260 --> 00:02:31.680
they have a very good chance
of making it to the playoffs.

00:02:37.490 --> 00:02:40.650
So in basketball, games are
won by scoring more points

00:02:40.650 --> 00:02:43.200
than the other team.

00:02:43.200 --> 00:02:45.550
Can we use the difference
between points scored

00:02:45.550 --> 00:02:48.790
and points allowed throughout
the regular season in order

00:02:48.790 --> 00:02:52.150
to predict the number of
games that a team will win?

00:02:52.150 --> 00:02:54.180
Let's give it a try.

00:02:54.180 --> 00:02:57.210
First we add a variable that is
the difference between points

00:02:57.210 --> 00:02:59.320
scored and points allowed.

00:02:59.320 --> 00:03:00.920
Let's call this NBA$PTSdiff.

00:03:05.180 --> 00:03:10.700
And that's just the difference
between points scored,

00:03:10.700 --> 00:03:14.760
which is points, and
points allowed, which

00:03:14.760 --> 00:03:16.270
is opponent's points.

00:03:18.970 --> 00:03:22.310
All right, so we've
created a variable.

00:03:22.310 --> 00:03:23.900
Let's first make a
scatter plot to see

00:03:23.900 --> 00:03:26.730
if it looks like there's
a linear relationship

00:03:26.730 --> 00:03:28.970
between the number of
wins that a team wins

00:03:28.970 --> 00:03:31.300
and the point difference.

00:03:31.300 --> 00:03:34.740
So this is easy to do just
with the Plot command.

00:03:34.740 --> 00:03:43.850
NBA$PTSdiff and NBA$W.

00:03:43.850 --> 00:03:46.490
So our graph pops
up and it looks

00:03:46.490 --> 00:03:49.170
like there's an incredibly
strong linear relationship

00:03:49.170 --> 00:03:51.940
between these two variables.

00:03:51.940 --> 00:03:53.630
So it seems like
linear regression

00:03:53.630 --> 00:03:56.090
is going to be a good way
to predict how many wins

00:03:56.090 --> 00:03:59.960
a team will have given
the point difference.

00:03:59.960 --> 00:04:01.010
Let's try to verify this.

00:04:06.410 --> 00:04:08.400
So we're going to
have points diff

00:04:08.400 --> 00:04:11.760
as our independent
variable in our regression,

00:04:11.760 --> 00:04:15.130
and W for wins as the
dependent variable.

00:04:15.130 --> 00:04:16.940
So let's call this WinsReg.

00:04:20.060 --> 00:04:26.580
And we just use the lm command
as before progressing w

00:04:26.580 --> 00:04:33.270
on the points diff and
using the NBA data.

00:04:33.270 --> 00:04:35.070
All right, so we've
created our regression.

00:04:35.070 --> 00:04:36.440
Let's take a look
at the summary.

00:04:41.380 --> 00:04:45.450
OK, so the first
thing that we notice

00:04:45.450 --> 00:04:49.000
is that we've got very
significant variables

00:04:49.000 --> 00:04:50.180
over here.

00:04:50.180 --> 00:04:54.360
And an R squared of
0.9423, which is very high.

00:04:54.360 --> 00:04:58.060
And this is verifying
the scatter plot

00:04:58.060 --> 00:05:01.290
we saw before that there's a
very strong linear relationship

00:05:01.290 --> 00:05:03.080
between the wins and
the points difference.

00:05:06.770 --> 00:05:08.290
So let's write
down the regression

00:05:08.290 --> 00:05:10.490
equation that we found.

00:05:10.490 --> 00:05:18.290
We see that the number of
wins, W, is equal to 41.

00:05:18.290 --> 00:05:22.360
That's coming from the
coefficient estimate

00:05:22.360 --> 00:05:23.950
for the intercept.

00:05:23.950 --> 00:05:42.240
Plus 0.0326*PTSdiff.

00:05:42.240 --> 00:05:46.690
And that 0.0326 is coming
from the coefficient estimate

00:05:46.690 --> 00:05:49.720
for points difference.

00:05:49.720 --> 00:05:53.130
So we saw earlier with
the table that a team

00:05:53.130 --> 00:05:56.250
would want to win
about at least 42 games

00:05:56.250 --> 00:05:59.920
in order to have a good chance
of making it to the playoffs.

00:05:59.920 --> 00:06:03.070
So what does this mean in terms
of their points difference?

00:06:03.070 --> 00:06:04.700
Well, we can calculate it.

00:06:04.700 --> 00:06:09.830
If we want this to be
greater than or equal to 42,

00:06:09.830 --> 00:06:15.940
that means that the
points difference

00:06:15.940 --> 00:06:26.290
would need to be greater
than or equal to 42 minus 41

00:06:26.290 --> 00:06:34.120
divided by 0.0326.

00:06:34.120 --> 00:06:36.530
So if we actually
do that calculation,

00:06:36.530 --> 00:06:47.590
we see that this
is equal to 30.67.

00:06:47.590 --> 00:06:50.510
So we need to score at
least 31 more points

00:06:50.510 --> 00:06:54.950
than we allow in order
to win at least 42 games.