WEBVTT

00:00:04.500 --> 00:00:07.510
To build an analytics model,
let us discuss the variables

00:00:07.510 --> 00:00:09.850
we used.

00:00:09.850 --> 00:00:13.280
First, we used the
13,000 diagnoses.

00:00:13.280 --> 00:00:18.580
These are the diagnosis codes
that claims data use.

00:00:18.580 --> 00:00:21.970
There were also 22,000
different codes for procedures

00:00:21.970 --> 00:00:24.910
and 45,000 codes for
prescription drugs.

00:00:24.910 --> 00:00:29.380
To work with this massive
number of variables,

00:00:29.380 --> 00:00:32.150
we aggregated the
variables as follows.

00:00:32.150 --> 00:00:39.500
Out of the 13,000 diagnoses, we
defined 217 diagnosis groups.

00:00:39.500 --> 00:00:43.730
Out of the 22,000 procedures,
we aggregated the data

00:00:43.730 --> 00:00:46.410
to develop 213 procedure groups.

00:00:46.410 --> 00:00:49.330
And, finally, from 45,000
prescription drugs,

00:00:49.330 --> 00:00:54.530
we developed 189
therapeutic groups.

00:00:54.530 --> 00:00:58.620
To illustrate an example of how
we infer further information

00:00:58.620 --> 00:01:03.340
from the data, the
graph here shows

00:01:03.340 --> 00:01:08.190
on the horizontal axis, time,
and on the vertical axis,

00:01:08.190 --> 00:01:13.280
costs in thousands of dollars.

00:01:13.280 --> 00:01:23.190
So patient one is a patient
who, on a monthly basis,

00:01:23.190 --> 00:01:29.289
has costs on the order of
$10,000 to $15,000, a fairly

00:01:29.289 --> 00:01:32.570
significant cost but
fairly constant in time.

00:01:32.570 --> 00:01:37.340
Patient two also
has an annual cost

00:01:37.340 --> 00:01:39.590
of a similar size
to patient one.

00:01:39.590 --> 00:01:45.620
But in all but the third
month, the costs are almost $0.

00:01:45.620 --> 00:01:51.250
Whereas in the third month,
the cost is about $70,000.

00:01:51.250 --> 00:01:53.020
In fact, this is
additional data we

00:01:53.020 --> 00:01:59.140
defined indicating
whether the patient has

00:01:59.140 --> 00:02:01.560
a chronic or an acute condition.

00:02:01.560 --> 00:02:06.360
In addition to the initial
variables, the 217 diagnosis

00:02:06.360 --> 00:02:10.150
groups, 189 therapeutic groups,
and so forth, we also

00:02:10.150 --> 00:02:13.240
defined, in collaboration
with medical doctors,

00:02:13.240 --> 00:02:17.450
269 medically-defined rules.

00:02:17.450 --> 00:02:20.320
For example, the
first type of rule

00:02:20.320 --> 00:02:23.960
indicates the interaction
between various illnesses.

00:02:23.960 --> 00:02:26.620
For example, obesity
and depression.

00:02:36.460 --> 00:02:39.440
Then new variables
regarding interaction

00:02:39.440 --> 00:02:41.110
between diagnosis and age.

00:02:41.110 --> 00:02:44.950
For example, more than
65 years old and coronary

00:02:44.950 --> 00:02:45.610
artery disease.

00:02:50.480 --> 00:02:51.690
Noncompliance with treatment.

00:02:51.690 --> 00:02:55.930
For example, non-fulfillment
of a particular drug order.

00:02:55.930 --> 00:02:58.790
And, finally, illness severity.

00:02:58.790 --> 00:03:01.000
For example, severe
depression as

00:03:01.000 --> 00:03:02.620
opposed to regular depression.

00:03:05.520 --> 00:03:09.520
And the last set of variables
involves demographic information

00:03:09.520 --> 00:03:11.080
like gender and age.

00:03:15.300 --> 00:03:18.600
An important part
of the variables

00:03:18.600 --> 00:03:22.380
is the set of variables
related to cost.

00:03:22.380 --> 00:03:24.590
So rather than using
costs directly,

00:03:24.590 --> 00:03:31.079
we bucketed costs and considered
everyone in the group equally.

00:03:31.079 --> 00:03:34.460
So we defined five buckets.

00:03:34.460 --> 00:03:37.579
So the buckets were
partitioned in such a way

00:03:37.579 --> 00:03:45.570
that 20% of all
costs is in bucket five,

00:03:45.570 --> 00:03:49.700
20% is in bucket
four, and so forth.

00:03:52.520 --> 00:03:58.920
So the partitions were from 0
to 3,000, from 3,000 to 8,000,

00:03:58.920 --> 00:04:04.000
from 8,000 to 19,000,
from 19,000 to 55,000,

00:04:04.000 --> 00:04:06.580
and above 55,000.

00:04:06.580 --> 00:04:13.360
The number of patients
that were below 3,000

00:04:13.360 --> 00:04:22.180
was-- 78% of the patients
had costs below 3,000.

00:04:22.180 --> 00:04:26.010
Just to remind you,
we created the buckets

00:04:26.010 --> 00:04:30.980
so that the total cost in each
bucket was 20% of the total.

00:04:30.980 --> 00:04:33.840
But the number of patients
in bucket one, for example,

00:04:33.840 --> 00:04:34.670
is very high (78%).

00:04:37.170 --> 00:04:41.250
Let us interpret the
buckets medically.

00:04:41.250 --> 00:04:44.540
So this shows the
various levels of risk.

00:04:44.540 --> 00:04:50.170
Bucket one consists of patients
that have rather low risk.

00:04:50.170 --> 00:04:54.400
Bucket two has what is
called emerging risk.

00:04:54.400 --> 00:04:57.460
In bucket three,
moderate level of risk.

00:04:57.460 --> 00:04:59.230
Bucket four, high risk.

00:04:59.230 --> 00:05:01.880
And bucket five, very high risk.

00:05:01.880 --> 00:05:04.930
So from a medical perspective,
buckets two and three,

00:05:04.930 --> 00:05:07.820
the emerging and the
moderate risk patients,

00:05:07.820 --> 00:05:11.620
are candidates for
wellness programs.

00:05:11.620 --> 00:05:13.920
Whereas bucket four,
the high risk patients,

00:05:13.920 --> 00:05:16.740
are candidates for disease
management programs.

00:05:16.740 --> 00:05:20.210
And finally bucket five,
the very high risk patients,

00:05:20.210 --> 00:05:22.590
are candidates for
case management.