15.071 | Spring 2017 | Graduate

The Analytics Edge

6 Clustering

6.3 Predictive Diagnosis: Discovering Patterns for Disease Detection

Quick Question

In this class, we’ve learned many different methods for predicting outcomes. Which of the following methods are designed to predict an outcome like whether or not someone will experience a heart attack? Select all that apply.

Explanation: Logistic regression, CART, and random forest are all designed to predict whether or not someone will have a heart attack, since this is a classification problem. Linear regression would be appropriate for a problem with a continuous outcome, such as the amount of time until someone has a heart attack. In this lecture, we'll use random forest, but the other classification methods could be used too.
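To make the classification idea concrete, here is a minimal sketch of a classifier that outputs the probability of a binary outcome. It uses logistic regression fitted by plain gradient descent rather than the random forest used in the lecture, and it relies only on the Python standard library. The data (`risk_score`, `heart_attack`) and the function name `fit_logistic` are hypothetical and exist only for illustration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: one risk score per patient and a 0/1 outcome
risk_score = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
heart_attack = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(risk_score, heart_attack)

# The model returns a probability; thresholding at 0.5 gives a class label
p_low = sigmoid(w * 1.0 + b)
p_high = sigmoid(w * 4.0 + b)
predicted_class = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in risk_score]
```

The key point is the output type: a probability that is turned into a yes/no label. A linear regression would instead return an unbounded number, which suits a continuous outcome such as time until a heart attack.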


Quick Question

In the previous video, we discussed how we split the data into three groups, or buckets, according to cost.

Which bucket has the most data, in terms of number of patients?

Explanation: Cost Bucket 1 contains the most patients (see slide 7 of the previous video).

Which bucket probably has the densest data, in terms of number of claims per person?

Explanation: Cost Bucket 3 probably has the densest data, since these are the patients with the highest cost in terms of claims.
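The bucketing idea can be sketched in a few lines of Python. The thresholds in `BUCKET_UPPER_BOUNDS` and the `patients` records below are hypothetical, chosen only to mirror the pattern described above (many low-cost patients with few claims, few high-cost patients with many claims). They are not the lecture's data.

```python
from collections import defaultdict

# Hypothetical upper bounds (in dollars) for buckets 1 and 2;
# any cost at or above the last bound falls in bucket 3.
BUCKET_UPPER_BOUNDS = [2000, 10000]


def cost_bucket(annual_cost):
    """Return the 1-based cost bucket for a patient's annual cost."""
    for bucket, bound in enumerate(BUCKET_UPPER_BOUNDS, start=1):
        if annual_cost < bound:
            return bucket
    return len(BUCKET_UPPER_BOUNDS) + 1


# Hypothetical patients: (annual cost in dollars, number of claims)
patients = [
    (300, 2), (850, 4), (1200, 5), (1900, 6), (450, 3),
    (4200, 15), (7600, 22),
    (18000, 60),
]

patient_count = defaultdict(int)
claim_count = defaultdict(int)
for cost, n_claims in patients:
    bucket = cost_bucket(cost)
    patient_count[bucket] += 1
    claim_count[bucket] += n_claims

# "Density" here means claims per patient within each bucket
claims_per_patient = {
    bucket: claim_count[bucket] / patient_count[bucket]
    for bucket in patient_count
}

largest_bucket = max(patient_count, key=patient_count.get)
densest_bucket = max(claims_per_patient, key=claims_per_patient.get)
```

With this made-up data, `largest_bucket` is 1 and `densest_bucket` is 3, matching the reasoning in the explanations: the number of patients and the number of claims per patient are two different measures and can point to different buckets.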

Quick Question

K-means clustering differs from hierarchical clustering in a couple of important ways. Which of the following statements is true?

Explanation: In k-means clustering, you have to pick the number of clusters before you run the algorithm, but the computational effort needed is much less than that for hierarchical clustering (we'll see this in more detail during the recitation).
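Both points can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) in plain Python. Note that `k` is a required argument: the algorithm cannot start without it. The toy points, the function name `kmeans`, and the choice of the first `k` points as initial centroids are assumptions made for a deterministic illustration. Real implementations usually use random or smarter initialization.

```python
def kmeans(points, k, max_iter=100):
    """Cluster 2-D points into k groups; k must be chosen before running."""
    # Assumption: take the first k points as initial centroids
    centroids = [points[i] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: n * k distance evaluations per iteration
        new_assignment = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids


# Hypothetical toy data: two visibly separate groups
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
assignment, centroids = kmeans(points, k=2)

# Rough cost comparison for a hypothetical n = 10,000 points and k = 10
n, k = 10_000, 10
kmeans_distances_per_iteration = n * k
pairwise_distances = n * (n - 1) // 2  # needed up front by agglomerative clustering
```

The last two lines give a rough sense of the difference in effort: each k-means iteration compares every point with only `k` centroids, while agglomerative hierarchical clustering starts from the distances between every pair of points.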

Quick Question

As we saw in the previous video, the clusters can be used to find interesting patterns of health in addition to being used to improve predictive models. By changing the number of clusters, you can find more general or more specific patterns.

If you wanted to find more unusual patterns shared by a small number of people, would you increase or decrease the number of clusters?

Explanation: If you wanted to find more unusual patterns, you would increase the number of clusters, since the clusters would become smaller and more patterns would probably emerge.
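The arithmetic behind this is simple: with a fixed number of patients, the average cluster size is the number of patients divided by the number of clusters. The short sketch below uses a hypothetical cohort of 1,200 patients (`n_patients` is not a figure from the lecture) to show how quickly the average shrinks.

```python
# Hypothetical cohort size; not taken from the lecture data
n_patients = 1200

# Average cluster size is n / k, so it shrinks as k grows
average_size = {k: n_patients / k for k in (3, 6, 12, 24)}

for k, size in average_size.items():
    print(f"k = {k:2d}: about {size:.0f} patients per cluster on average")
```

Actual cluster sizes will vary around this average, but smaller clusters on average leave more room for a pattern shared by only a few people to form its own cluster.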
