WEBVTT

00:00:09.930 --> 00:00:14.480
In this lecture, we'll discuss
the idea of using visualization

00:00:14.480 --> 00:00:17.970
to better understand data
and to provide insights

00:00:17.970 --> 00:00:21.240
on the problem we're addressing.

00:00:21.240 --> 00:00:22.870
Why visualization?

00:00:22.870 --> 00:00:26.450
People often say that a picture
is worth a thousand words.

00:00:26.450 --> 00:00:30.120
In the same spirit, John
Tukey, a major statistician

00:00:30.120 --> 00:00:34.290
at Princeton, wrote that
"the picture-examining eye

00:00:34.290 --> 00:00:40.610
is the best finder we have
of the wholly unanticipated."

00:00:40.610 --> 00:00:44.170
Visualizing data allows us
to discern relationships,

00:00:44.170 --> 00:00:48.940
structures, distributions,
outliers, patterns, behaviors,

00:00:48.940 --> 00:00:51.260
dependencies, and outcomes.

00:00:51.260 --> 00:00:55.980
Visualization is further useful
for initial data exploration,

00:00:55.980 --> 00:00:59.200
for interpreting models, and
for communicating results

00:00:59.200 --> 00:01:01.640
effectively.

00:01:01.640 --> 00:01:04.440
Let's look at some examples
of different modes

00:01:04.440 --> 00:01:07.380
of visualization that
illustrate these points.

00:01:07.380 --> 00:01:09.960
The figure shows
the miles per gallon

00:01:09.960 --> 00:01:13.090
of a car as a function
of the car's weight.

00:01:13.090 --> 00:01:16.320
The figure clearly illustrates
that as the weight of the car

00:01:16.320 --> 00:01:21.220
increases, the miles
per gallon decrease.

00:01:21.220 --> 00:01:26.250
Here is the same graph, but now
the colors of the points

00:01:26.250 --> 00:01:30.340
indicate the number of cylinders
in the car: red for four,

00:01:30.340 --> 00:01:34.759
green for six,
and blue for eight.

00:01:39.160 --> 00:01:42.450
Using the same data, we now
plot a regression line

00:01:42.450 --> 00:01:45.670
that captures the intuition
that as the weight of the car

00:01:45.670 --> 00:01:50.770
increases, the miles
per gallon decrease.

00:01:50.770 --> 00:01:53.190
In this plot, we'll
visualize burglaries

00:01:53.190 --> 00:01:56.770
in the city of Houston by
combining the data with geographical

00:01:56.770 --> 00:01:59.950
locations on a map.

00:01:59.950 --> 00:02:03.210
This plot illustrates,
using a heat map,

00:02:03.210 --> 00:02:07.500
the usage of rented bicycles
from the Hubway company.

00:02:07.500 --> 00:02:10.460
The horizontal axis is
the hour of the day,

00:02:10.460 --> 00:02:15.500
and the vertical axis the day
of the week, starting on Sunday.

00:02:15.500 --> 00:02:18.520
The heat map shows that
the usage increases

00:02:18.520 --> 00:02:22.910
during the morning and evening
rush hours on weekdays.

00:02:25.570 --> 00:02:28.550
The next plot helps us
visualize histograms

00:02:28.550 --> 00:02:34.250
of different categories
using the Hubway data.

00:02:34.250 --> 00:02:38.520
This plot shows US
unemployment by state.

00:02:38.520 --> 00:02:42.990
Lighter colors correspond
to lower unemployment rates,

00:02:42.990 --> 00:02:45.270
and darker
colors correspond

00:02:45.270 --> 00:02:46.810
to higher unemployment rates.

00:02:50.140 --> 00:02:54.630
The plan this week is to create
all of these visualizations.

00:02:54.630 --> 00:02:58.340
We'll see how visualizations
can be used to better understand

00:02:58.340 --> 00:03:01.870
data, communicate
information more effectively,

00:03:01.870 --> 00:03:04.670
and show the results of
analytical models.

00:03:04.670 --> 00:03:08.810
In the next video, we'll discuss
how the World Health Organization

00:03:08.810 --> 00:03:12.850
uses visualizations
effectively.