WEBVTT

00:00:04.500 --> 00:00:07.710
In this lecture, we'll be
using analytical models

00:00:07.710 --> 00:00:09.900
to prevent heart disease.

00:00:09.900 --> 00:00:13.050
The first step is to
identify risk factors,

00:00:13.050 --> 00:00:17.350
or the independent variables,
that we will use in our model.

00:00:17.350 --> 00:00:21.490
Then, using data, we'll create
a logistic regression model

00:00:21.490 --> 00:00:23.800
to predict heart disease.

00:00:23.800 --> 00:00:26.570
Using more data, we'll
validate our model

00:00:26.570 --> 00:00:29.520
to make sure it performs
well out of sample

00:00:29.520 --> 00:00:32.340
and on different populations
than the training set

00:00:32.340 --> 00:00:34.310
population.

00:00:34.310 --> 00:00:37.260
Lastly, we'll discuss
how medical interventions

00:00:37.260 --> 00:00:39.350
can be defined using the model.

00:00:41.930 --> 00:00:44.040
We'll be predicting
the 10-year risk

00:00:44.040 --> 00:00:47.670
of coronary heart
disease or CHD.

00:00:47.670 --> 00:00:51.600
This was the subject of
an important 1998 paper

00:00:51.600 --> 00:00:55.420
introducing what is known as
the Framingham Risk Score.

00:00:55.420 --> 00:00:58.210
This is one of the most
influential applications

00:00:58.210 --> 00:01:00.890
of the Framingham
Heart Study data.

00:01:00.890 --> 00:01:05.630
We'll use logistic regression
to create a similar model.

00:01:05.630 --> 00:01:10.170
CHD is a disease of the blood
vessels supplying the heart.

00:01:10.170 --> 00:01:12.320
This is one type of
heart disease, which

00:01:12.320 --> 00:01:17.510
has been the leading cause of
death worldwide since 1921.

00:01:17.510 --> 00:01:23.230
In 2008, $7.3 million
people died from CHD.

00:01:23.230 --> 00:01:27.500
Even though the number of deaths
due to CHD is still very high,

00:01:27.500 --> 00:01:29.480
age-adjusted death
rates have actually

00:01:29.480 --> 00:01:33.860
declined 60% since 1950.

00:01:33.860 --> 00:01:38.140
This is in part due to earlier
detection and monitoring partly

00:01:38.140 --> 00:01:40.210
because of the
Framingham Heart Study.

00:01:43.050 --> 00:01:45.920
Before building a
logistic regression model,

00:01:45.920 --> 00:01:48.570
we need to identify the
independent variables

00:01:48.570 --> 00:01:50.509
we want to use.

00:01:50.509 --> 00:01:52.530
When predicting the
risk of a disease,

00:01:52.530 --> 00:01:57.070
we want to identify what
are known as risk factors.

00:01:57.070 --> 00:01:59.020
These are the
variables that increase

00:01:59.020 --> 00:02:02.340
the chances of
developing a disease.

00:02:02.340 --> 00:02:04.480
The term risk
factors was actually

00:02:04.480 --> 00:02:07.140
coined by William
Kannell and Roy Dawber

00:02:07.140 --> 00:02:10.020
from the Framingham Heart Study.

00:02:10.020 --> 00:02:12.050
Identifying these
risk factors is

00:02:12.050 --> 00:02:14.450
the key to successful
prediction of CHD.

00:02:17.220 --> 00:02:20.140
In this lecture, we'll
focus on the risk factors

00:02:20.140 --> 00:02:23.060
that they collected data
for in the original data

00:02:23.060 --> 00:02:26.100
collection for the
Framingham Heart Study.

00:02:26.100 --> 00:02:28.320
We'll be using an
anonymized version

00:02:28.320 --> 00:02:31.690
of the original data
that was collected.

00:02:31.690 --> 00:02:35.490
This data set includes several
demographic risk factors--

00:02:35.490 --> 00:02:38.690
the sex of the patient,
male or female;

00:02:38.690 --> 00:02:43.200
the age of the patient in
years; the education level coded

00:02:43.200 --> 00:02:45.590
as either 1 for
some high school,

00:02:45.590 --> 00:02:48.900
2 for a high school
diploma or GED,

00:02:48.900 --> 00:02:51.920
3 for some college
or vocational school,

00:02:51.920 --> 00:02:55.700
and 4 for a college degree.

00:02:55.700 --> 00:02:58.680
The data set also includes
behavioral risk factors

00:02:58.680 --> 00:03:02.060
associated with
smoking-- whether or not

00:03:02.060 --> 00:03:06.120
the patient is a current smoker
and the number of cigarettes

00:03:06.120 --> 00:03:09.510
that the person smoked
on average in one day.

00:03:09.510 --> 00:03:12.930
While it is now widely
known that smoking increases

00:03:12.930 --> 00:03:14.980
the risk of heart
disease, the idea

00:03:14.980 --> 00:03:20.579
of smoking being bad for you
was a novel idea in the 1940s.

00:03:20.579 --> 00:03:24.230
Medical history risk
factors were also included.

00:03:24.230 --> 00:03:25.940
These were whether or
not the patient was

00:03:25.940 --> 00:03:29.660
on blood pressure medication,
whether or not the patient had

00:03:29.660 --> 00:03:33.260
previously had a stroke,
whether or not the patient was

00:03:33.260 --> 00:03:36.720
hypertensive, and whether or
not the patient had diabetes.

00:03:39.650 --> 00:03:42.220
Lastly, the data set
includes risk factors

00:03:42.220 --> 00:03:45.740
from the first physical
examination of the patient.

00:03:45.740 --> 00:03:49.720
The total cholesterol level,
systolic blood pressure,

00:03:49.720 --> 00:03:55.370
diastolic blood pressure, Body
Mass Index, or BMI, heart rate,

00:03:55.370 --> 00:03:59.260
and blood glucose level of
the patient were measured.

00:03:59.260 --> 00:04:02.480
In the next video, we'll
use these risk factors

00:04:02.480 --> 00:04:06.450
to see if we can predict
the 10-year risk CHD.