WEBVTT

00:00:04.580 --> 00:00:07.280
In this video, we'll
add the hour of the day

00:00:07.280 --> 00:00:09.640
to our line plot,
and then create

00:00:09.640 --> 00:00:13.310
an alternative visualization
using a heat map.

00:00:13.310 --> 00:00:17.870
We can do this by creating a
line for each day of the week

00:00:17.870 --> 00:00:21.570
and making the x-axis
the hour of the day.

00:00:21.570 --> 00:00:23.860
We first need to
create a counts table

00:00:23.860 --> 00:00:26.070
for the weekday, and hour.

00:00:26.070 --> 00:00:30.500
So we'll use the table function
and give as the first variable,

00:00:30.500 --> 00:00:33.170
the Weekday variable
in our data frame.

00:00:33.170 --> 00:00:37.650
and as the second variable,
the Hour variable.

00:00:37.650 --> 00:00:41.640
This table gives, for each
day of the week and each hour,

00:00:41.640 --> 00:00:45.080
the total number of motor
vehicle thefts that occurred.

00:00:45.080 --> 00:00:48.650
For example, on
Friday at 4 AM, there

00:00:48.650 --> 00:00:53.300
were 473 motor vehicle
thefts, whereas on Saturday

00:00:53.300 --> 00:00:58.860
at midnight, there were
2,050 motor vehicle thefts.

00:00:58.860 --> 00:01:01.260
Let's save this
table to a data frame

00:01:01.260 --> 00:01:03.960
so that we can use it
in our visualizations.

00:01:03.960 --> 00:01:11.490
We'll call it DayHourCounts and
use the as.data.frame function,

00:01:11.490 --> 00:01:16.140
run on our table, where the
first variable is the Weekday

00:01:16.140 --> 00:01:18.430
and the second
variable is the Hour.

00:01:21.289 --> 00:01:23.300
Let's take a look at the
structure of the data

00:01:23.300 --> 00:01:24.220
frame we just created.

00:01:28.220 --> 00:01:32.360
We can see that we have
168 observations-- one

00:01:32.360 --> 00:01:35.259
for each day of the
week and hour pair,

00:01:35.259 --> 00:01:37.490
and three different variables.

00:01:37.490 --> 00:01:40.930
The first variable, Var1,
gives the day of the week.

00:01:40.930 --> 00:01:45.229
The second variable, Var2,
gives the hour of the day.

00:01:45.229 --> 00:01:48.620
And the third variable,
Freq for frequency,

00:01:48.620 --> 00:01:51.100
gives the total crime count.

00:01:51.100 --> 00:01:54.440
Let's convert the
second variable, Var2,

00:01:54.440 --> 00:01:57.009
to actual numbers
and call it Hour,

00:01:57.009 --> 00:01:58.759
since this is the
hour of the day,

00:01:58.759 --> 00:02:01.430
and it makes sense
that it's numerical.

00:02:01.430 --> 00:02:07.520
So we'll add a new variable to
our data frame called Hour =

00:02:07.520 --> 00:02:09.400
as.numeric(as.character(DayHourCounts$Var2)).

00:02:21.910 --> 00:02:27.920
This is how we convert a factor
variable to a numeric variable.

00:02:27.920 --> 00:02:30.410
Now we're ready to
create our plot.

00:02:30.410 --> 00:02:33.840
We just need to change
the group to Var1,

00:02:33.840 --> 00:02:35.620
which is the day of the week.

00:02:35.620 --> 00:02:38.510
So we'll use the
ggplot function where

00:02:38.510 --> 00:02:43.860
our data frame is DayHourCounts,
and then in our aesthetic,

00:02:43.860 --> 00:02:47.420
we want the x-axis
to be Hour this time,

00:02:47.420 --> 00:02:53.620
the y-axis to be Freq, and
then in the geom_line option,

00:02:53.620 --> 00:02:56.100
like we used in
the previous video,

00:02:56.100 --> 00:03:01.740
we want the aesthetic to
have the group equal to Var1,

00:03:01.740 --> 00:03:03.790
which is the day of the week.

00:03:03.790 --> 00:03:04.970
Go ahead and hit Enter.

00:03:04.970 --> 00:03:09.570
You should see a new plot show
up in the graphics window.

00:03:09.570 --> 00:03:13.860
It has seven lines, one
for each day of the week.

00:03:13.860 --> 00:03:15.670
While this is
interesting, we can't

00:03:15.670 --> 00:03:18.280
tell which line is which
day, so let's change

00:03:18.280 --> 00:03:20.570
the colors of the
lines to correspond

00:03:20.570 --> 00:03:22.410
to the days of the week.

00:03:22.410 --> 00:03:26.320
To do that, just scroll
up in your R console,

00:03:26.320 --> 00:03:32.630
and after group =
Var1, add color = Var1.

00:03:32.630 --> 00:03:34.370
This will make the
colors of the lines

00:03:34.370 --> 00:03:37.050
correspond to the
day of the week.

00:03:37.050 --> 00:03:40.040
After that parenthesis,
go ahead and type comma,

00:03:40.040 --> 00:03:41.820
and then size = 2.

00:03:41.820 --> 00:03:43.410
We'll make our lines
a little thicker.

00:03:46.000 --> 00:03:49.579
Now in our plot, each line
is colored corresponding

00:03:49.579 --> 00:03:51.600
to the day of the week.

00:03:51.600 --> 00:03:54.290
This helps us see that
on Saturday and Sunday,

00:03:54.290 --> 00:03:57.790
for example, the green
and the teal lines,

00:03:57.790 --> 00:04:01.570
there's less motor vehicle
thefts in the morning.

00:04:01.570 --> 00:04:04.630
While we can get some
information from this plot,

00:04:04.630 --> 00:04:06.870
it's still quite
hard to interpret.

00:04:06.870 --> 00:04:09.020
Seven lines is a lot.

00:04:09.020 --> 00:04:14.170
Let's instead visualize the same
information with a heat map.

00:04:14.170 --> 00:04:16.519
To make a heat map,
we'll use our data

00:04:16.519 --> 00:04:19.230
in our data frame DayHourCounts.

00:04:19.230 --> 00:04:23.050
First, though, we need to
fix the order of the days

00:04:23.050 --> 00:04:26.140
so that they'll show up
in chronological order

00:04:26.140 --> 00:04:28.240
instead of in
alphabetical order.

00:04:28.240 --> 00:04:31.620
We'll do the same thing we
did in the previous video.

00:04:31.620 --> 00:04:37.110
So for DayHourCounts$Var1,
which is the day of the week,

00:04:37.110 --> 00:04:41.090
we're going to use the factor
function where the first

00:04:41.090 --> 00:04:46.850
argument is our variable,
DayHourCounts$Var1,

00:04:46.850 --> 00:04:51.210
the second argument
is ordered = TRUE,

00:04:51.210 --> 00:04:54.000
and the third argument is
the order we want the days

00:04:54.000 --> 00:04:55.760
of the week to show up in.

00:04:55.760 --> 00:05:00.660
So we'll set levels,
equals, and then c,

00:05:00.660 --> 00:05:02.800
and then list your
days of the week.

00:05:02.800 --> 00:05:06.440
Let's put the weekdays first
and the weekends at the end.

00:05:06.440 --> 00:05:11.290
So we'll start with Monday,
and then Tuesday, then

00:05:11.290 --> 00:05:22.600
Wednesday, then Thursday,
Friday, Saturday and Sunday.

00:05:26.450 --> 00:05:28.490
Now let's make our heat map.

00:05:28.490 --> 00:05:32.280
We'll use the ggplot
function like we always do,

00:05:32.280 --> 00:05:36.570
and give our data frame
name, DayHourCounts.

00:05:36.570 --> 00:05:39.980
Then in our aesthetic,
we want the x-axis

00:05:39.980 --> 00:05:43.409
to be the hour of the
day, and the y-axis

00:05:43.409 --> 00:05:46.860
to be the day of the
week, which is Var1.

00:05:46.860 --> 00:05:48.680
Then we're going
to add geom_tile.

00:05:51.230 --> 00:05:54.210
This is the function we
use to make a heat map.

00:05:54.210 --> 00:05:57.040
And then in the
aesthetic for our tiles,

00:05:57.040 --> 00:06:00.930
we want the fill to
be equal to Freq.

00:06:00.930 --> 00:06:05.000
This will define the colors of
the rectangles in our heat map

00:06:05.000 --> 00:06:08.850
to correspond to
the total crime.

00:06:08.850 --> 00:06:12.530
You should see a heat map pop
up in your graphics window.

00:06:12.530 --> 00:06:14.500
So how do we read this?

00:06:14.500 --> 00:06:17.440
For each hour and
each day of the week,

00:06:17.440 --> 00:06:20.150
we have a rectangle
in our heat map.

00:06:20.150 --> 00:06:23.450
The color of that rectangle
indicates the frequency,

00:06:23.450 --> 00:06:26.420
or the number of crimes
that occur in that hour

00:06:26.420 --> 00:06:28.070
and on that day.

00:06:28.070 --> 00:06:31.120
Our legend tells us
that lighter colors

00:06:31.120 --> 00:06:33.680
correspond to more crime.

00:06:33.680 --> 00:06:36.250
So we can see that
a lot of crime

00:06:36.250 --> 00:06:41.720
happens around midnight,
particularly on the weekends.

00:06:41.720 --> 00:06:45.090
We can change the
label on the legend,

00:06:45.090 --> 00:06:49.510
and get rid of the y label to
make our plot a little nicer.

00:06:49.510 --> 00:06:52.930
We can do this by just scrolling
up to our previous command

00:06:52.930 --> 00:06:56.659
in our R console and then
adding scale_fill_gradient.

00:07:02.180 --> 00:07:04.960
This defines properties
of the legend,

00:07:04.960 --> 00:07:11.400
and we want name =
"Total MV Thefts",

00:07:11.400 --> 00:07:12.930
for total motor vehicle thefts.

00:07:15.930 --> 00:07:17.820
Then let's add, in the
theme(axis.title.y =

00:07:17.820 --> 00:07:18.530
element_blank()).

00:07:28.200 --> 00:07:29.660
This is what you
can do if you want

00:07:29.660 --> 00:07:33.100
to get rid of one
of the axis labels.

00:07:33.100 --> 00:07:35.090
Go ahead and hit Enter.

00:07:35.090 --> 00:07:37.130
And now on our heat
map, the legend

00:07:37.130 --> 00:07:43.010
is titled "Total MV Thefts"
and the y-axis label is gone.

00:07:43.010 --> 00:07:45.880
We can also change
the color scheme.

00:07:45.880 --> 00:07:49.360
We can do this by scrolling
up in our R console,

00:07:49.360 --> 00:07:52.490
and going to that
scale_fill_gradient function,

00:07:52.490 --> 00:07:55.610
the one that defines
properties of our legend,

00:07:55.610 --> 00:08:00.890
and after name =
"Total MV Thefts",

00:08:00.890 --> 00:08:08.230
low = "white", high = "red".

00:08:08.230 --> 00:08:10.510
We'll make lower
values correspond

00:08:10.510 --> 00:08:13.020
to white colors
and higher values

00:08:13.020 --> 00:08:15.060
correspond to red colors.

00:08:15.060 --> 00:08:18.350
If you hit enter, a
new plot should show up

00:08:18.350 --> 00:08:20.610
with different colors.

00:08:20.610 --> 00:08:23.340
This is a common color
scheme in policing.

00:08:23.340 --> 00:08:27.950
It shows the hot spots, or the
places with more crime, in red.

00:08:27.950 --> 00:08:31.480
So now the most crime is
shown by the red spots

00:08:31.480 --> 00:08:35.280
and the least crime is
shown by the lighter areas.

00:08:35.280 --> 00:08:38.570
It looks like Friday night
is a pretty common time

00:08:38.570 --> 00:08:40.220
for motor vehicle thefts.

00:08:40.220 --> 00:08:43.200
We saw something that we didn't
really see in the heat map

00:08:43.200 --> 00:08:44.750
before.

00:08:44.750 --> 00:08:48.010
It's often useful to change
the color scheme depending

00:08:48.010 --> 00:08:50.610
on whether you want high
values or low values

00:08:50.610 --> 00:08:55.340
to pop out, and the feeling
you want the plot to portray.

00:08:55.340 --> 00:08:59.710
In this video, we've seen how to
create some new types of plots.

00:08:59.710 --> 00:09:02.940
In the next video, we'll
see how to add data

00:09:02.940 --> 00:09:05.470
to geographical maps.