Non-volatile memories are used to maintain system state even when the system is powered down. In flash memories, long-term storage is achieved by storing charge on a well-insulated conductor called a floating gate, where it will remain stable for years. The floating gate is incorporated into a standard MOSFET, placed between the MOSFET's gate and the MOSFET's channel.

If there is no charge stored on the floating gate, the MOSFET can be turned on, i.e., made to conduct, by placing a voltage V_1 on the gate terminal, creating an inversion layer that connects the MOSFET's source and drain terminals. If there is charge stored on the floating gate, a higher voltage V_2 is required to turn on the MOSFET. By setting the gate terminal to a voltage between V_1 and V_2, we can determine whether the floating gate is charged by testing to see if the MOSFET is conducting. In fact, if we can measure the current flowing through the MOSFET, we can determine how much charge is stored on the floating gate, making it possible to store multiple bits of information in one flash cell by varying the amount of charge on its floating gate.

Flash cells can be connected in parallel or series to form circuits resembling CMOS NOR or NAND gates, allowing for a variety of access architectures suitable for either random or sequential access.

Flash memories are very dense, approaching the areal density of DRAMs, particularly when each cell holds multiple bits of information. Read access times for NOR flash memories are similar to those of DRAMs: several tens of nanoseconds. Read times for NAND flash memories are much longer, on the order of 10 microseconds. Write times for all types of flash memories are quite long, since high voltages have to be used to force electrons to cross the insulating barrier surrounding the floating gate.

Flash memories can only be written some number of times before the insulating layer is damaged to the point that the floating gate will no longer reliably store charge. Currently the number of guaranteed writes varies between 100,000 and 1,000,000. To work around this limitation, flash chips contain clever address-mapping algorithms so that writes to the same address are actually mapped to different flash cells on each successive write.

The bottom line is that flash memories are a higher-performance but higher-cost replacement for the hard-disk drive, the long-time technology of choice for non-volatile storage.
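To make the address-remapping idea concrete, here is a toy wear-leveling sketch in Python. It is an illustration only: real flash translation layers are far more elaborate, and the block count and least-worn-first policy here are assumptions, not any vendor's actual algorithm.

    # Toy wear-leveling sketch: every write to a logical block is redirected
    # to the least-worn free physical block, so repeated writes to one
    # address are spread across many cells. (Illustrative assumptions only.)

    class ToyFlashTranslationLayer:
        def __init__(self, n_physical_blocks):
            self.erase_counts = [0] * n_physical_blocks
            self.logical_to_physical = {}
            self.free = set(range(n_physical_blocks))

        def write(self, logical_block, data):
            # Retire the old physical block; it would be erased and reclaimed.
            old = self.logical_to_physical.get(logical_block)
            if old is not None:
                self.free.add(old)
            # Choose the free block with the fewest erases to even out wear.
            target = min(self.free, key=lambda b: self.erase_counts[b])
            self.free.remove(target)
            self.erase_counts[target] += 1
            self.logical_to_physical[logical_block] = target
            # (Actual programming of the data is omitted.)

    ftl = ToyFlashTranslationLayer(n_physical_blocks=8)
    for _ in range(100):
        ftl.write(logical_block=0, data=b"...")  # same address every time
    print(ftl.erase_counts)  # wear is even: each block erased 12 or 13 times

Even though this program writes the same logical address 100 times, no physical block is erased more than 13 times, which is how a device rated for 100,000 writes per cell can absorb many more writes to a single hot address.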
A hard-disk drive (HDD) contains one or more rotating platters coated with a magnetic material. The platters rotate at speeds ranging from 5400 to 15,000 RPM. A read/write head positioned above the surface of a platter can detect or change the orientation of the magnetization of the magnetic material below. The read/write head is mounted on an actuator that allows it to be positioned over different circular tracks.

To read a particular sector of data, the head must be positioned radially over the correct track, then wait for the platter to rotate until it's over the desired sector. The average total time required to correctly position the head is on the order of 10 milliseconds, so hard-disk access times are quite long. However, once the read/write head is in the correct position, data can be transferred at the respectable rate of 100 megabytes/second. If the head has to be repositioned between each access, the effective transfer rate drops roughly 1000-fold, limited by the time it takes to reposition the head: transferring, say, a 512-byte sector takes about 5 microseconds, but paying a 10-millisecond reposition before each such transfer cuts the effective rate to roughly 50 kilobytes/second.

Hard-disk drives provide cost-effective non-volatile storage for terabytes of data, albeit at the cost of slow access times.

This completes our whirlwind tour of memory technologies. If you'd like to learn a bit more, Wikipedia has useful articles on each type of device.

SRAM sizes and access times have kept pace with the improvements in the size and speed of integrated circuits. Interestingly, although capacities and transfer rates for DRAMs and HDDs have improved, their initial access times have not improved nearly as rapidly. Thankfully, over the past decade flash memories have helped to fill the performance gap between processor speeds and HDDs. But the gap between processor cycle times and DRAM access times has continued to widen, increasing the challenge of designing low-latency, high-capacity memory systems.

The capacity of the available memory technologies varies over 10 orders of magnitude, and their latencies vary over 8 orders of magnitude. This creates a considerable challenge in figuring out how to navigate the speed-vs.-size tradeoffs. Each transition in the memory hierarchy presents the same fundamental design choice: we can pick smaller-and-faster or larger-and-slower. This is a bit awkward, actually: can we figure out how to get the best of both worlds?
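Those spans are easier to appreciate with rough numbers attached. The figures in this small Python sketch are representative assumptions chosen for illustration; the lecture cites only the resulting spans, not these specific values.

    import math

    # Representative capacities and access times, assumed for illustration.
    technologies = {
        #           capacity (bytes), access time (seconds)
        "register": (1e2,             1e-10),
        "SRAM":     (1e5,             1e-9),
        "DRAM":     (1e10,            1e-8),
        "flash":    (1e11,            1e-5),
        "HDD":      (1e12,            1e-2),
    }

    capacities = [c for c, _ in technologies.values()]
    latencies = [t for _, t in technologies.values()]

    print(round(math.log10(max(capacities) / min(capacities))))  # 10 orders
    print(round(math.log10(max(latencies) / min(latencies))))    # 8 orders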
We want our system to behave as if it had a large, fast, and cheap main memory. Clearly we can't achieve this goal using any single memory technology.

Here's an idea: can we use a hierarchical system of memories with different tradeoffs to achieve close to the same results as a large, fast, cheap memory? Could we arrange for memory locations we're using often to be stored, say, in SRAM, and have those accesses be low latency? Could the rest of the data be stored in the larger and slower memory components, moving data between the levels when necessary? Let's follow this train of thought and see where it leads us.

There are two approaches we might take. The first is to expose the hierarchy, providing some amount of each type of storage and letting the programmer decide how best to allocate the various memory resources for each particular computation. The programmer would write code that moved data into fast storage when appropriate, then back to the larger and slower memories when low-latency access was no longer required. There would only be a small amount of the fastest memory, so data would be constantly in motion as the focus of the computation changed.

This approach has had notable advocates. Perhaps the most influential was Seymour Cray, the "Steve Jobs" of supercomputers. Cray was the architect of the world's fastest computers in each of three decades, inventing many of the technologies that form the foundation of high-performance computing. His insight into managing the memory hierarchy was to organize data as vectors and move vectors in and out of fast memory under program control. This was actually a good data abstraction for certain types of scientific computing, and his vector machines had the top computing benchmarks for many years.

The second alternative is to hide the hierarchy and simply tell the programmer they have a large, uniform address space to use as they wish. The memory system would, behind the scenes, move data between the various levels of the memory hierarchy, depending on the usage patterns it detected. This would require circuitry to examine each memory access issued by the CPU to determine where in the hierarchy to find the requested location. And then, if a particular region of addresses was frequently accessed - say, when fetching instructions in a loop - the memory system would arrange for those accesses to be mapped to the fastest memory component and automatically move the loop instructions there.
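To see how such automatic movement might work in principle, here is a deliberately simplified Python sketch. The organization (a direct-mapped fast store with 8 entries) and the sizes are assumptions made purely for illustration; the lecture hasn't yet introduced any particular mechanism.

    # Minimal model of an automatically managed two-level memory: a small
    # fast store front-ends a large slow one, and every CPU access is
    # checked against the fast store first. Sizes and policy are assumed.

    SLOW_MEM = {addr: addr * 2 for addr in range(1024)}  # stand-in main memory
    FAST_SLOTS = 8
    fast_mem = [None] * FAST_SLOTS  # each slot holds an (address, value) pair
    hits = misses = 0

    def access(addr):
        """Return the value at addr, consulting the fast memory first."""
        global hits, misses
        slot = addr % FAST_SLOTS  # the one fast slot this address may occupy
        entry = fast_mem[slot]
        if entry is not None and entry[0] == addr:
            hits += 1  # present in fast memory: low-latency access
            return entry[1]
        misses += 1  # absent: fetch from slow memory, then keep a copy
        value = SLOW_MEM[addr]
        fast_mem[slot] = (addr, value)
        return value

    # A loop re-reads the same few addresses, as instruction fetch does:
    for _ in range(100):
        for addr in (0, 1, 2, 3):
            access(addr)
    print(hits, misses)  # 396 hits, 4 misses

After the first pass through the loop, every access is served by the small fast store, which is exactly the behavior the hidden-hierarchy approach hopes to achieve automatically.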
All of this machinery would be transparent to the programmer: the program would simply fetch instructions and access data, and the memory system would handle the rest. Could the memory system automatically arrange for the right data to be in the right place at the right time?

Cray was deeply skeptical of this approach. He famously quipped that "you can't fake what you haven't got." Wouldn't the programmer, with her knowledge of how data was going to be used by a particular program, be able to do a better job by explicitly managing the memory hierarchy?

It turns out that when running general-purpose programs, it is possible to build an automatically managed, low-latency, high-capacity hierarchical memory system that appears as one large, uniform memory. What's the insight that makes this possible? That's the topic of the next section.