In this lecture we return to the memory system that we last discussed in Lecture 14 of Part 2. There we learned about the fundamental tradeoff in current memory technologies: as the memory's capacity increases, so does its access time. It takes some architectural cleverness to build a memory system that has a large capacity and a small average access time.

The cleverness is embodied in the cache, a hardware subsystem that lives between the CPU and main memory. Modern CPUs have several levels of cache, where the modest-capacity first level has an access time close to that of the CPU, and higher levels of cache have slower access times but larger capacities.

Caches give fast access to a small number of memory locations, using associative addressing so that the cache can hold the contents of the memory locations the CPU is accessing most frequently. The current contents of the cache are managed automatically by the hardware. Caches work well because of the principle of locality: if the CPU accesses location X at time T, it's likely to access nearby locations in the not-too-distant future. The cache is organized so that nearby locations can all reside in the cache simultaneously, using a simple indexing scheme to choose which cache location should be checked for a matching address. If the address requested by the CPU resides in the cache, access time is quite fast.

To increase the probability that requested addresses reside in the cache, we introduced the notion of "associativity", which increased the number of cache locations checked on each access and solved the problem of having, say, instructions and data compete for the same cache locations. We also discussed appropriate choices for block size (the number of words in a cache line), replacement policy (how to choose which cache line to reuse on a cache miss), and write policy (deciding when to write changed data back to main memory). We'll see these same choices again in this lecture as we work to expand the memory hierarchy beyond main memory.

We never discussed where the data in main memory comes from and how the process of filling main memory is managed. That's the topic of today's lecture.
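As a quick refresher on why a small, fast cache in front of a large, slow main memory gives a small average access time, here is a minimal sketch of the usual average-memory-access-time (AMAT) calculation. The timing numbers and names below are illustrative assumptions for this sketch, not figures from the lecture.

```python
# Average memory access time (AMAT) = hit_time + miss_rate * miss_penalty.
# All numbers below are illustrative assumptions, not figures from the lecture.

def amat(hit_time, miss_rate, miss_penalty):
    """Average access time for one level of cache in front of a slower level."""
    return hit_time + miss_rate * miss_penalty

cache_hit_time = 1      # cycles to hit in the first-level cache (assumed)
dram_access_time = 100  # cycles to fetch from main memory on a miss (assumed)

for miss_rate in (0.10, 0.05, 0.01):
    print(f"miss rate {miss_rate:4.0%}: AMAT = "
          f"{amat(cache_hit_time, miss_rate, dram_access_time):.1f} cycles")
# A small miss rate keeps the average access time far below the 100-cycle
# DRAM time, while the program still enjoys main memory's capacity.
```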
Flash drives and hard disks provide storage options that have more capacity than main memory, with the added benefit of being non-volatile, i.e., they continue to store data even when turned off. The generic name for these devices is "secondary storage", where data will reside until it's moved to "primary storage", i.e., main memory, for use. So when we first turn on a computer system, all of its data will be found in secondary storage, which we'll think of as the final level of our memory hierarchy.

As we think about the right memory architecture, we'll build on the ideas from our previous discussion of caches and, indeed, think of main memory as another level of cache for the permanent, high-capacity secondary storage. We'll be building what we call a virtual memory system, which, like caches, will automatically move data from secondary storage into main memory as needed. The virtual memory system will also let us control what data can be accessed by the program, serving as a stepping stone to building a system that can securely run many programs on a single CPU. Let's get started!

Here we see the cache and main memory, the two components of our memory system as developed in Lecture 14. And here's our new secondary storage layer. The good news: the capacity of secondary storage is huge! Even the most modest modern computer system will have hundreds of gigabytes of secondary storage, and having a terabyte or two is not uncommon on medium-size desktop computers. Secondary storage for the cloud can grow to many petabytes (a petabyte is 10^15 bytes, or a million gigabytes). The bad news: disk access times are 100,000 times longer than those of DRAM. So the change in access time from DRAM to disk is much, much larger than the change from caches to DRAM.

When looking at DRAM timing, we discovered that the additional access time for retrieving a contiguous block of words was small compared to the access time for the first word, so fetching a block was the right plan assuming we'd eventually access the additional words. For disks, the access time difference between the first word and successive words is even more dramatic. So, not surprisingly, we'll be reading fairly large blocks of data from disk.
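To see how dramatically a single block read amortizes the disk's startup cost, here is a small back-of-the-envelope calculation. The seek, rotation, and transfer-rate numbers are assumptions chosen only to illustrate the shape of the tradeoff; real devices vary widely.

```python
# Back-of-the-envelope: one block read vs. many separate word-sized reads.
# All timing parameters below are illustrative assumptions, not measured values.

startup_ms = 10.0          # assumed seek + rotational delay before data arrives
transfer_bytes_per_s = 1e8 # assumed sustained transfer rate (~100 MB/s)

def read_time_ms(n_bytes):
    """Approximate time to read n contiguous bytes in a single disk access."""
    return startup_ms + (n_bytes / transfer_bytes_per_s) * 1000

block = 4096                               # a 4 KB block
one_block = read_time_ms(block)
word_by_word = 1024 * read_time_ms(4)      # 1024 separate 4-byte accesses

print(f"one 4 KB block read : {one_block:9.2f} ms")
print(f"1024 one-word reads : {word_by_word:9.2f} ms")
# The startup cost dominates, so grabbing a large block costs barely more
# than fetching a single word -- hence the push toward large block sizes.
```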
The consequence of the much, much larger secondary-storage access time is that it will be very time-consuming to access disk if the data we need is not in main memory. So we need to design our virtual memory system to minimize misses when accessing main memory. A miss, and the subsequent disk access, will have a huge impact on the average memory access time, so the miss rate will need to be very, very small compared to, say, the rate of executing instructions.

Given the enormous miss penalties of secondary storage, what does that tell us about how it should be used as part of our memory hierarchy? We will need high associativity, i.e., we need a great deal of flexibility on how data from disk can be located in main memory. In other words, if our working set of memory accesses fits in main memory, our virtual memory system should make that possible, avoiding unnecessary collisions between accesses to one block of data and another. We'll want to use a large block size to take advantage of the low incremental cost of reading successive words from disk. And, given the principle of locality, we'd expect to be accessing other words of the block, thus amortizing the cost of the miss over many future hits. Finally, we'll want to use a write-back strategy, where we'll only update the contents of disk when data that's changed in main memory needs to be replaced by data from other blocks of secondary storage.

There is an upside to misses having such long latencies: we can manage the organization of main memory and the accesses to secondary storage in software. Even if it takes thousands of instructions to deal with the consequences of a miss, executing those instructions is quick compared to the access time of a disk. So our strategy will be to handle hits in hardware and misses in software. This will lead to simple memory management hardware and the possibility of using very clever strategies implemented in software to figure out what to do on misses.
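To make the "fully associative, write-back, hits fast and misses in software" combination concrete, here is a toy simulation sketch. Every name, the tiny sizes, and the FIFO replacement choice are assumptions made purely for illustration; this is not the hardware design developed in the lecture, just a model of the behavior it describes.

```python
# A toy, fully associative "main memory as a cache for disk" simulation.
# All names, sizes, and the FIFO replacement policy are illustrative assumptions.
from collections import OrderedDict

PAGE_SIZE = 4        # words per page (tiny, just for the demo)
NUM_FRAMES = 2       # physical page frames in "main memory"

disk = {p: [0] * PAGE_SIZE for p in range(8)}  # backing store: 8 pages
frames = OrderedDict()                         # page number -> (data, dirty)

def miss_handler(page):
    """Software path: runs only on a miss; may write back a dirty victim."""
    if len(frames) == NUM_FRAMES:
        victim, (data, dirty) = frames.popitem(last=False)  # FIFO victim
        if dirty:
            disk[victim] = data            # write-back: disk updated only now
    frames[page] = (list(disk[page]), False)   # fetch the whole page

def access(addr, value=None):
    """Fast path (the 'hardware' part): a fully associative lookup."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in frames:                 # miss: fall into software
        miss_handler(page)
    data, dirty = frames[page]
    if value is None:
        return data[offset]                # read hit
    data[offset] = value                   # write hit marks the page dirty
    frames[page] = (data, True)

access(5, value=42)    # write to page 1 (addresses 4..7)
print(access(5))       # read hit -> 42
access(9); access(13)  # touch pages 2 and 3, evicting (and writing back) page 1
print(disk[1])         # the change reached disk via write-back: [0, 42, 0, 0]
```

The dictionary lookup stands in for the fast, fixed hardware check, while `miss_handler` stands in for the software that is free to use arbitrarily clever placement and replacement strategies, since its cost is tiny compared to the disk access it triggers.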