### Video 1: Heart Attacks

The slides from all videos in this Lecture Sequence can be downloaded here: Discovering Patterns for Disease Detection (PDF - 1.4MB).

In this class, we’ve learned many different methods for predicting outcomes. Which of the following methods are designed to predict an outcome like whether or not someone will experience a heart attack? Select all that apply.

Explanation: Logistic regression, CART, and random forest are all designed to predict whether or not someone will have a heart attack, since this is a classification problem. Linear regression would be appropriate for a problem with a continuous outcome, such as the amount of time until someone has a heart attack. In this lecture, we'll use random forest, but the other methods could be used too.

In the previous video, we discussed how we split the data into three groups, or buckets, according to cost.

Which bucket has the most data, in terms of number of patients?

Which bucket probably has the densest data, in terms of number of claims per person?

Explanation: Cost Bucket 1 contains the most patients (see slide 7 of the previous video). Cost Bucket 3 probably has the densest data, since these are the patients with the highest cost in terms of claims.

K-means clustering differs from hierarchical clustering in a couple of important ways. Which of the following statements is true?

Explanation: In k-means clustering, you have to pick the number of clusters before you run the algorithm, but the computational effort needed is much less than that for hierarchical clustering (we'll see this in more detail during the recitation).
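To make the role of the cluster count concrete, here is a minimal Python sketch of k-means (Lloyd's algorithm). It is an illustration only, not the code used in the lecture: the function name `kmeans`, the toy `points`, and the choice to initialize centroids from the first `k` points are all assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=20):
    """Minimal k-means sketch; k must be supplied before the algorithm runs."""
    # Assumed initialization for illustration: the first k points are the starting centroids.
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Each pass of this sketch computes one distance per point per centroid, whereas standard agglomerative hierarchical clustering typically starts from the distances between all n(n-1)/2 pairs of points, which is one way to see why k-means tends to be cheaper on large data sets.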

As we saw in the previous video, the clusters can be used to find interesting patterns of health in addition to being used to improve predictive models. By changing the number of clusters, you can find more general or more specific patterns.

If you wanted to find more unusual patterns shared by a small number of people, would you increase or decrease the number of clusters?

Explanation: If you wanted to find more unusual patterns, you would increase the number of clusters, since the clusters would become smaller and more patterns would probably emerge.
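The point about cluster size can be checked with simple arithmetic: every observation belongs to exactly one cluster, so the average cluster size is the number of observations divided by the number of clusters. The sketch below uses a hypothetical population of 1,000 patients, and the helper name `average_cluster_size` is made up for this illustration; neither comes from the lecture data.

```python
def average_cluster_size(n_observations, k):
    """Average number of observations per cluster when all n are split into k clusters."""
    return n_observations / k

# Hypothetical population size, chosen only for illustration.
n_patients = 1000
sizes = {k: average_cluster_size(n_patients, k) for k in (5, 10, 50)}

for k, size in sizes.items():
    print(f"k={k}: about {size:.0f} patients per cluster on average")
```

Individual clusters will vary around this average, but the trend is what matters here: more clusters means smaller groups, which is what lets patterns shared by only a few people show up as their own cluster.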

Spring 2017