1 00:00:00,000 --> 00:00:02,520 The following content is provided under a Creative 2 00:00:02,520 --> 00:00:03,970 Commons license. 3 00:00:03,970 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,690 continue to offer high quality educational resources for free. 5 00:00:10,690 --> 00:00:13,350 To make a donation or view additional materials 6 00:00:13,350 --> 00:00:17,190 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,190 --> 00:00:18,400 at ocw.mit.edu. 8 00:00:30,400 --> 00:00:33,050 PROFESSOR: So this is the last but one lecture of the term. 9 00:00:33,050 --> 00:00:37,360 So what I'm going to do today is in about 45 minutes, 10 00:00:37,360 --> 00:00:40,900 give you a quick history of the internet from-- 11 00:00:40,900 --> 00:00:45,460 we'll start in the late 1950s and then get to today. 12 00:00:45,460 --> 00:00:47,440 And then next time on Monday, I'll 13 00:00:47,440 --> 00:00:50,407 conclude this class by first doing a wrap-up of 6.02, 14 00:00:50,407 --> 00:00:52,490 and then also telling you a little bit about where 15 00:00:52,490 --> 00:00:54,520 I think the future of communications systems 16 00:00:54,520 --> 00:00:57,160 might be going. I'll probably be wrong about it, 17 00:00:57,160 --> 00:00:59,170 but I'll be confident about it. 18 00:01:03,680 --> 00:01:06,250 And today, the idea is to try to connect 19 00:01:06,250 --> 00:01:08,740 some of the topics we've studied in the class so far 20 00:01:08,740 --> 00:01:10,670 to this history. 21 00:01:10,670 --> 00:01:12,920 Of course, we're not going to be able to do all of it. 22 00:01:12,920 --> 00:01:15,790 So the story so far in terms of the history, 23 00:01:15,790 --> 00:01:17,590 you have to assume-- 24 00:01:17,590 --> 00:01:20,680 so we're going to start in 1959, or 1957. 25 00:01:20,680 --> 00:01:25,570 And by this time, the history of communication systems 26 00:01:25,570 --> 00:01:27,670 has had a lot of successes. 
27 00:01:27,670 --> 00:01:30,740 And a common theme is that there's 28 00:01:30,740 --> 00:01:34,480 the technology that comes around, it succeeds, it 29 00:01:34,480 --> 00:01:36,650 tries to take over the world. 30 00:01:36,650 --> 00:01:38,230 And right at the time it looks 31 00:01:38,230 --> 00:01:40,272 like there's nothing else that's going to happen, 32 00:01:40,272 --> 00:01:44,360 some other new technology comes around that over time kills it. 33 00:01:44,360 --> 00:01:47,770 So the first successful network technology 34 00:01:47,770 --> 00:01:49,960 was the electric telegraph. 35 00:01:49,960 --> 00:01:52,745 And the electric telegraph was done by a number 36 00:01:52,745 --> 00:01:53,620 of different people-- 37 00:01:53,620 --> 00:01:56,140 Wheatstone and Cooke built an electric telegraph 38 00:01:56,140 --> 00:01:59,170 in England, Morse and Vail built one in the US. 39 00:01:59,170 --> 00:02:02,920 The Morse code was developed as part of the electric telegraph. 40 00:02:02,920 --> 00:02:07,480 And that did a great job in the 1830s, 41 00:02:07,480 --> 00:02:09,919 1840s, 1850s, and so forth. 42 00:02:09,919 --> 00:02:13,360 And then other technologies came around. 43 00:02:13,360 --> 00:02:16,360 Now, this sort of story keeps repeating. 44 00:02:16,360 --> 00:02:19,450 And so by the 1950s, the dominant player 45 00:02:19,450 --> 00:02:22,580 in the communication area is the telephone network. 46 00:02:22,580 --> 00:02:25,600 So we have the Bell Telephone Company, and in the US, 47 00:02:25,600 --> 00:02:27,010 it dominates. 48 00:02:27,010 --> 00:02:29,590 There's equivalent telephone companies in other countries. 49 00:02:29,590 --> 00:02:33,730 And you have this massive, amazing telephone network. 50 00:02:33,730 --> 00:02:37,900 And increasingly, many people have telephones. 51 00:02:37,900 --> 00:02:42,640 On the wireless side, there isn't a wireless telephone 52 00:02:42,640 --> 00:02:44,080 system in the 1950s. 
53 00:02:44,080 --> 00:02:46,240 But what we have are radio-- 54 00:02:46,240 --> 00:02:48,880 broadcast radio and broadcast television, 55 00:02:48,880 --> 00:02:53,620 and some very powerful companies that own wireless spectrum 56 00:02:53,620 --> 00:02:55,170 to offer television and radio. 57 00:02:58,390 --> 00:03:00,400 So the story starts in the late 1950s. 58 00:03:00,400 --> 00:03:02,440 And in 1957, a big thing happened-- 59 00:03:02,440 --> 00:03:03,400 Sputnik launched. 60 00:03:03,400 --> 00:03:07,210 And that caused the US to assume that they were falling behind 61 00:03:07,210 --> 00:03:08,320 in science and technology. 62 00:03:08,320 --> 00:03:11,200 And that led to the creation of ARPA, the Advanced Research 63 00:03:11,200 --> 00:03:12,760 Projects Agency that still exists. 64 00:03:12,760 --> 00:03:13,910 It's known as DARPA today. 65 00:03:13,910 --> 00:03:16,690 And it is probably the biggest federal funder 66 00:03:16,690 --> 00:03:19,720 of fundamental research and a lot of applied research 67 00:03:19,720 --> 00:03:23,060 and development. 68 00:03:23,060 --> 00:03:26,030 Paul Baran in many ways is one of the fathers, 69 00:03:26,030 --> 00:03:27,953 or the father of packet-switched technologies. 70 00:03:27,953 --> 00:03:29,870 He was working at a think tank called the RAND 71 00:03:29,870 --> 00:03:34,190 Corporation, which was an organization that 72 00:03:34,190 --> 00:03:35,540 really was a think tank. 73 00:03:35,540 --> 00:03:38,990 It allowed people to think about long-term fundamental 74 00:03:38,990 --> 00:03:41,120 directions in terms of technology 75 00:03:41,120 --> 00:03:43,620 and where technology was heading. 76 00:03:43,620 --> 00:03:45,860 And he was looking at the problem 77 00:03:45,860 --> 00:03:48,560 of trying to build what he called 78 00:03:48,560 --> 00:03:50,870 a survivable communication network. 
79 00:03:50,870 --> 00:03:52,700 And the story is that he's trying 80 00:03:52,700 --> 00:03:55,670 to build a network that can continue to work 81 00:03:55,670 --> 00:03:57,032 in the face of a nuclear war. 82 00:03:57,032 --> 00:03:59,240 But that's really not what he was trying to go after. 83 00:03:59,240 --> 00:04:01,340 What he was trying to build was just understanding 84 00:04:01,340 --> 00:04:04,460 how do you design communication networks that 85 00:04:04,460 --> 00:04:07,160 allow you to handle failures? 86 00:04:07,160 --> 00:04:09,200 And a lot of the then telephone network 87 00:04:09,200 --> 00:04:10,970 was a very centralized kind of structure 88 00:04:10,970 --> 00:04:15,020 where you would have a network that had very little redundancy 89 00:04:15,020 --> 00:04:15,920 built into it. 90 00:04:15,920 --> 00:04:19,047 And you'd have lots of these central star-like topologies. 91 00:04:19,047 --> 00:04:21,380 And the problem, of course, when you connect these stars 92 00:04:21,380 --> 00:04:23,660 together, and these star-like pieces connect 93 00:04:23,660 --> 00:04:27,230 to other star-like pieces, is that you had better 94 00:04:27,230 --> 00:04:30,590 make these points here, these nodes or switches 95 00:04:30,590 --> 00:04:34,293 here, extremely, extremely reliable, 96 00:04:34,293 --> 00:04:35,960 and those links that are connecting them 97 00:04:35,960 --> 00:04:38,033 to these other central structures. 98 00:04:38,033 --> 00:04:39,950 Because otherwise, the failure of those things 99 00:04:39,950 --> 00:04:41,240 would kill the system. 100 00:04:41,240 --> 00:04:43,850 And the Bell Telephone Company actually 101 00:04:43,850 --> 00:04:46,140 understood how to build these very expensive, very, 102 00:04:46,140 --> 00:04:50,060 very reliable switches, but they were very, very, extremely 103 00:04:50,060 --> 00:04:51,690 expensive. 
104 00:04:51,690 --> 00:04:54,620 The other problem is, and we'll get back to this later today, 105 00:04:54,620 --> 00:04:56,810 is that a telephone network is a great network. 106 00:04:56,810 --> 00:04:59,000 But it supports exactly one application. 107 00:04:59,000 --> 00:05:01,760 The application is you pick up the phone and you talk. 108 00:05:01,760 --> 00:05:03,543 It's very hard, on the face of it, 109 00:05:03,543 --> 00:05:05,210 to imagine how a telephone network would 110 00:05:05,210 --> 00:05:09,860 do a great job at supporting the web, for example. 111 00:05:09,860 --> 00:05:11,540 No one even thought of the web. 112 00:05:11,540 --> 00:05:13,490 But even something basic like being 113 00:05:13,490 --> 00:05:18,140 able to watch a video stream whose quality might 114 00:05:18,140 --> 00:05:21,163 vary with time-- 115 00:05:21,163 --> 00:05:22,580 these are things that the internet 116 00:05:22,580 --> 00:05:24,650 did differently and did better than the telephone network. 117 00:05:24,650 --> 00:05:26,275 But the telephone network fundamentally 118 00:05:26,275 --> 00:05:29,173 had, in those days, a fault tolerance issue. 119 00:05:29,173 --> 00:05:30,590 And the way they dealt with it was 120 00:05:30,590 --> 00:05:34,280 to build extremely reliable components. 121 00:05:34,280 --> 00:05:36,453 Paul Baran's idea, which other people 122 00:05:36,453 --> 00:05:38,120 had been thinking about and toying with, 123 00:05:38,120 --> 00:05:39,828 was he observed that-- and in those days, 124 00:05:39,828 --> 00:05:42,900 the telephone network was largely analog. 125 00:05:42,900 --> 00:05:45,000 The digital telephony wasn't really there. 126 00:05:45,000 --> 00:05:47,750 And digital computers were just starting to come in. 
127 00:05:47,750 --> 00:05:50,180 And Paul Baran was probably one of the first few people 128 00:05:50,180 --> 00:05:53,510 who realized that computing and digital technologies can 129 00:05:53,510 --> 00:05:56,690 change life in terms of how you build these systems. 130 00:05:56,690 --> 00:06:01,610 Because the digital abstraction allows you to know for sure 131 00:06:01,610 --> 00:06:03,380 whether a component is working or not. 132 00:06:03,380 --> 00:06:05,210 Because if it works, it gives you an answer 133 00:06:05,210 --> 00:06:06,830 that then you can verify. 134 00:06:06,830 --> 00:06:09,860 And if it doesn't work, it just stops working. 135 00:06:09,860 --> 00:06:14,180 It isn't like an analog system where it may or may not 136 00:06:14,180 --> 00:06:16,940 be working, and as the noise increases or some fault occurs, 137 00:06:16,940 --> 00:06:18,830 you're starting to see a lot of noise. 138 00:06:18,830 --> 00:06:20,960 And it's garbled, you're not quite sure. 139 00:06:20,960 --> 00:06:23,460 With a digital system, it either works or it doesn't work. 140 00:06:23,460 --> 00:06:24,950 You can build systems like that. 141 00:06:24,950 --> 00:06:27,170 And he noticed that you could now 142 00:06:27,170 --> 00:06:32,060 start to think of building reliable systems out 143 00:06:32,060 --> 00:06:34,550 of lots of unreliable individual components. 144 00:06:34,550 --> 00:06:36,920 And this is the fundamental guiding theme 145 00:06:36,920 --> 00:06:40,670 for large-scale computing systems from the late 1950s. 146 00:06:40,670 --> 00:06:42,170 I mean, this goes all the way to how 147 00:06:42,170 --> 00:06:44,810 Google, or Amazon, or Facebook, or any of these big data 148 00:06:44,810 --> 00:06:45,590 centers work. 
149 00:06:45,590 --> 00:06:47,150 Any single one of those computers 150 00:06:47,150 --> 00:06:49,640 there is highly unreliable relative to what 151 00:06:49,640 --> 00:06:51,710 you could do if you put in a lot of money 152 00:06:51,710 --> 00:06:53,370 to build a very reliable computer. 153 00:06:53,370 --> 00:06:55,730 But the ensemble is highly reliable. 154 00:06:55,730 --> 00:06:58,100 And to do that requires a lot of cleverness and care. 155 00:06:58,100 --> 00:06:59,780 And the first real example of this 156 00:06:59,780 --> 00:07:02,330 is a digital communication network built out 157 00:07:02,330 --> 00:07:05,500 of this idea of packets. 158 00:07:05,500 --> 00:07:07,070 In a couple of papers that he wrote 159 00:07:07,070 --> 00:07:10,190 in the late 1950s and early 1960s, 160 00:07:10,190 --> 00:07:12,290 he said that with the digital computer, 161 00:07:12,290 --> 00:07:15,590 you could now start to build highly reliable communication 162 00:07:15,590 --> 00:07:21,060 networks out of unreliable components. 163 00:07:21,060 --> 00:07:23,880 And so he articulated this idea that you 164 00:07:23,880 --> 00:07:27,030 can connect these switches or nodes together 165 00:07:27,030 --> 00:07:29,560 in highly redundant structures. 166 00:07:29,560 --> 00:07:31,980 And if you have a stream of data to send, 167 00:07:31,980 --> 00:07:35,400 you don't have to really pick a particular path 168 00:07:35,400 --> 00:07:37,080 through the structure. 169 00:07:37,080 --> 00:07:39,630 You could take that message and break it up 170 00:07:39,630 --> 00:07:42,930 into different pieces and ship them in different directions. 
171 00:07:42,930 --> 00:07:45,940 And even if you chose to ship them all in one direction, 172 00:07:45,940 --> 00:07:48,570 if a failure occurred, these switches 173 00:07:48,570 --> 00:07:51,210 could themselves start moving the data 174 00:07:51,210 --> 00:07:54,180 in different directions, which meant that you no longer think 175 00:07:54,180 --> 00:07:57,660 of a communication as a big stream 176 00:07:57,660 --> 00:07:59,910 that you have to send in one way, 177 00:07:59,910 --> 00:08:02,010 but you can start thinking about splitting it up 178 00:08:02,010 --> 00:08:04,180 into these different pieces. 179 00:08:04,180 --> 00:08:07,890 Now, like with many of these ideas, it's rare to find-- 180 00:08:07,890 --> 00:08:11,278 sometimes it happens, but it's rare to find exactly one person 181 00:08:11,278 --> 00:08:12,570 in the world thinking about it. 182 00:08:12,570 --> 00:08:14,417 No matter how groundbreaking the idea, 183 00:08:14,417 --> 00:08:16,000 there were other people working on it. 184 00:08:16,000 --> 00:08:18,690 And Donald Davies in the UK in the early '60s 185 00:08:18,690 --> 00:08:20,430 was looking at similar ideas. 186 00:08:20,430 --> 00:08:22,530 And he actually coined the term packet. 187 00:08:22,530 --> 00:08:26,160 We use packets now to mean these little messages 188 00:08:26,160 --> 00:08:28,260 that you ship through the network that 189 00:08:28,260 --> 00:08:30,660 are atomic units of delivery. 190 00:08:30,660 --> 00:08:34,840 This term was coined by Donald Davies in the 1960s. 191 00:08:34,840 --> 00:08:37,289 Now, all of this was wonderful and sort 192 00:08:37,289 --> 00:08:38,880 of theoretical abstractions. 193 00:08:38,880 --> 00:08:41,789 But how do you start to come up with some design 194 00:08:41,789 --> 00:08:45,180 principles for building communication networks? 
195 00:08:45,180 --> 00:08:47,820 In particular, how do you deal with the problem 196 00:08:47,820 --> 00:08:52,110 of having these links come together at a switch, 197 00:08:52,110 --> 00:08:54,360 and try to share the links going out 198 00:08:54,360 --> 00:08:57,990 of a switch in a way that allows traffic 199 00:08:57,990 --> 00:09:02,400 from different conversations to multiplex on the same link? 200 00:09:02,400 --> 00:09:05,730 The idea of having a queue in a switch in retrospect 201 00:09:05,730 --> 00:09:06,930 seems completely obvious. 202 00:09:06,930 --> 00:09:09,103 But if you're the first time you're seeing something 203 00:09:09,103 --> 00:09:10,770 like this, and the telephone network had 204 00:09:10,770 --> 00:09:13,620 really no queues, the idea that you would build a queue 205 00:09:13,620 --> 00:09:18,000 and now start to analyze is a pretty groundbreaking result. 206 00:09:18,000 --> 00:09:19,750 Again, there were many people involved. 207 00:09:19,750 --> 00:09:22,440 But probably the leading contributions 208 00:09:22,440 --> 00:09:24,960 came from a person called Len Kleinrock, who 209 00:09:24,960 --> 00:09:26,700 was a PhD student at MIT. 210 00:09:26,700 --> 00:09:31,710 And in his 1961 PhD thesis, "Information Flow 211 00:09:31,710 --> 00:09:34,410 in Large Communication Nets," wrote 212 00:09:34,410 --> 00:09:38,580 about how you could use queuing theory to analyze and to model 213 00:09:38,580 --> 00:09:41,090 communication networks. 214 00:09:41,090 --> 00:09:43,490 Now, at around the same time, again 215 00:09:43,490 --> 00:09:46,390 at MIT, Licklider and Clark wrote 216 00:09:46,390 --> 00:09:48,260 a really interesting paper. 217 00:09:48,260 --> 00:09:49,940 It's actually worth reading now. 218 00:09:49,940 --> 00:09:53,060 I mean, it's 50 years ago, but it's interesting. 
219 00:09:53,060 --> 00:09:55,700 You have to go back and think, this was at a time 220 00:09:55,700 --> 00:09:58,797 when people didn't really have this idea that people could 221 00:09:58,797 --> 00:09:59,880 sit in front of computers. 222 00:09:59,880 --> 00:10:02,700 Computers were used to maybe count votes. 223 00:10:02,700 --> 00:10:04,790 I suspect they get it wrong today more 224 00:10:04,790 --> 00:10:06,350 than they did in those days. 225 00:10:06,350 --> 00:10:08,550 But computers were used to count votes, 226 00:10:08,550 --> 00:10:11,057 they were used to help with the US Census. 227 00:10:11,057 --> 00:10:13,640 But nobody thought about people sitting in front of computers. 228 00:10:13,640 --> 00:10:17,540 And they wrote this wonderful paper called "On-line Man 229 00:10:17,540 --> 00:10:18,890 Computer Communication." 230 00:10:18,890 --> 00:10:22,705 I guess in those days, you know, man meant people. 231 00:10:22,705 --> 00:10:24,080 So anyway, they wrote this paper. 232 00:10:24,080 --> 00:10:26,812 And in fact, Licklider had this vision 233 00:10:26,812 --> 00:10:28,520 of what he called a galactic network that 234 00:10:28,520 --> 00:10:31,700 would span the globe and beyond, which was-- 235 00:10:31,700 --> 00:10:36,243 for the early '60s, it was a pretty remarkable vision. 236 00:10:36,243 --> 00:10:37,910 Now, using these ideas, and particularly 237 00:10:37,910 --> 00:10:39,560 Len Kleinrock's ideas, and this idea 238 00:10:39,560 --> 00:10:41,630 of man-computer interactions-- which 239 00:10:41,630 --> 00:10:44,720 of course, the idea of everybody having their own computer 240 00:10:44,720 --> 00:10:45,928 wasn't this paper's vision. 241 00:10:45,928 --> 00:10:48,470 This paper's vision was there's a lot of computers out there. 242 00:10:48,470 --> 00:10:50,360 And people just had remote terminals, 243 00:10:50,360 --> 00:10:53,120 and you would log in and have these big computers 244 00:10:53,120 --> 00:10:54,470 that you could use. 
245 00:10:54,470 --> 00:10:56,750 But then you would have nice interactions 246 00:10:56,750 --> 00:10:59,110 on your own terminal. 247 00:10:59,110 --> 00:11:00,980 That was what that paper was about. 248 00:11:00,980 --> 00:11:04,430 Larry Roberts was first at MIT, and then moved to ARPA 249 00:11:04,430 --> 00:11:07,760 to run this program, created something called the ARPANET, 250 00:11:07,760 --> 00:11:10,430 and wrote a paper and wrote a program 251 00:11:10,430 --> 00:11:14,330 that is a call for proposals for the ARPANET, which 252 00:11:14,330 --> 00:11:16,490 was a plan for timesharing remote computers. 253 00:11:16,490 --> 00:11:18,650 So the internet-- the ARPANET was 254 00:11:18,650 --> 00:11:19,910 the precursor to the internet. 255 00:11:19,910 --> 00:11:23,630 And it started not because we wanted to build a communication 256 00:11:23,630 --> 00:11:24,980 network to prevent-- 257 00:11:24,980 --> 00:11:27,410 for it to work when there was nuclear war or any 258 00:11:27,410 --> 00:11:28,640 of these major disasters. 259 00:11:28,640 --> 00:11:31,280 It actually had a very concrete goal-- 260 00:11:31,280 --> 00:11:34,580 just allow people-- computers were really, really expensive-- 261 00:11:34,580 --> 00:11:37,400 just allow people, no matter where they were, 262 00:11:37,400 --> 00:11:41,420 to be able to harness the power of expensive computing 263 00:11:41,420 --> 00:11:43,880 far away, and make it look to the extent 264 00:11:43,880 --> 00:11:47,390 possible as if the computers were with you. 265 00:11:47,390 --> 00:11:48,620 That was the vision-- 266 00:11:48,620 --> 00:11:51,380 pretty compelling, but simple. 267 00:11:51,380 --> 00:11:53,450 And they decided, for very good reason, 268 00:11:53,450 --> 00:11:55,040 to pick packet switching. 269 00:11:55,040 --> 00:11:58,075 The reasons primarily had to do with economics. 
270 00:11:58,075 --> 00:11:59,450 This was a network that was being 271 00:11:59,450 --> 00:12:02,480 proposed for an application whose utility was questionable. 272 00:12:02,480 --> 00:12:06,170 And the idea of investing huge amounts of money 273 00:12:06,170 --> 00:12:08,487 was not quite palatable. 274 00:12:08,487 --> 00:12:10,070 And Larry Roberts and others were very 275 00:12:10,070 --> 00:12:12,030 taken by this vision of packet switching. 276 00:12:12,030 --> 00:12:14,030 So they said you know what, the ARPANET is going 277 00:12:14,030 --> 00:12:16,427 to be a packet-switched network. 278 00:12:16,427 --> 00:12:18,260 I'll come back to this later, but of course, 279 00:12:18,260 --> 00:12:19,730 the telephone companies like AT&T 280 00:12:19,730 --> 00:12:21,680 just thought this was a terrible idea, 281 00:12:21,680 --> 00:12:25,820 and were using every opportunity to ridicule the idea. 282 00:12:25,820 --> 00:12:29,370 The ARPANET was created, and a few teams bid on the contract 283 00:12:29,370 --> 00:12:29,870 for it. 284 00:12:29,870 --> 00:12:31,760 And BBN-- Bolt, Beranek, and Newman, 285 00:12:31,760 --> 00:12:34,657 that's near Alewife in Cambridge. 286 00:12:34,657 --> 00:12:36,740 They're still there, they're part of Raytheon now. 287 00:12:36,740 --> 00:12:40,880 And they still continue to do pretty interesting research. 288 00:12:40,880 --> 00:12:44,557 They won the contract to build this network, 289 00:12:44,557 --> 00:12:46,140 build the technology for this network. 290 00:12:46,140 --> 00:12:48,560 And because the processing involved 291 00:12:48,560 --> 00:12:52,100 in the network, the protocols that were involved 292 00:12:52,100 --> 00:12:55,340 were considered complicated, and considered 293 00:12:55,340 --> 00:12:58,400 to be computationally intensive, they 294 00:12:58,400 --> 00:13:01,580 had to build a separate piece of hardware 295 00:13:01,580 --> 00:13:05,120 that they named the IMP, or the Interface Message Processor. 
296 00:13:05,120 --> 00:13:07,190 And BBN won the contract to do that. 297 00:13:07,190 --> 00:13:09,350 The idea of an interface message processor 298 00:13:09,350 --> 00:13:11,540 is that every computer, as well as 299 00:13:11,540 --> 00:13:13,640 every switch, that the switch would have 300 00:13:13,640 --> 00:13:15,710 some hardware to forward stuff. 301 00:13:15,710 --> 00:13:18,170 But you needed something to do the computation 302 00:13:18,170 --> 00:13:21,872 of both the routing tables as well as actually every packet. 303 00:13:21,872 --> 00:13:23,330 Every packet would show up, and you 304 00:13:23,330 --> 00:13:25,880 would have to compute some sort of a checksum on it. 305 00:13:25,880 --> 00:13:30,110 And you'd have to do this computational task of figuring 306 00:13:30,110 --> 00:13:32,180 out how to forward that packet. 307 00:13:32,180 --> 00:13:34,760 And that was just considered too much work 308 00:13:34,760 --> 00:13:36,603 to have on an actual little computer. 309 00:13:36,603 --> 00:13:39,020 So they actually had to build a separate piece of hardware 310 00:13:39,020 --> 00:13:40,390 that you would attach to your computer, 311 00:13:40,390 --> 00:13:41,870 and it would probably be about as big. 312 00:13:41,870 --> 00:13:43,245 And you can see the picture here. 313 00:13:43,245 --> 00:13:45,980 These IMPs were attached to bigger computers or computers 314 00:13:45,980 --> 00:13:46,760 of the same size. 315 00:13:46,760 --> 00:13:48,410 And these did the networking. 316 00:13:48,410 --> 00:13:52,520 I mean, today, all of that stuff, a million times more 317 00:13:52,520 --> 00:13:54,200 is going on this device. 318 00:13:54,200 --> 00:13:57,170 But back in the day, that's how it was. 319 00:13:57,170 --> 00:13:59,060 So they won this interface message processor 320 00:13:59,060 --> 00:14:01,100 contract called the IMP. 
321 00:14:01,100 --> 00:14:03,590 And when you win a big federal contract, 322 00:14:03,590 --> 00:14:06,570 oftentimes, your Congressman or Senator writes to you. 323 00:14:06,570 --> 00:14:09,320 In fact, it's funny because a bunch of us got-- 324 00:14:09,320 --> 00:14:12,200 for a period of time before the Senate election, a bunch of us 325 00:14:12,200 --> 00:14:15,860 were getting emails, letters from Scott Brown congratulating 326 00:14:15,860 --> 00:14:18,412 us on winning some dinky little NSF proposal. 327 00:14:18,412 --> 00:14:20,120 I don't know if other people here got it, 328 00:14:20,120 --> 00:14:21,203 other faculty here got it. 329 00:14:21,203 --> 00:14:24,170 But it's sort of like in those days, if you 330 00:14:24,170 --> 00:14:25,850 won a big contract, you'd get money-- 331 00:14:25,850 --> 00:14:27,558 you'd get a letter from your congressman. 332 00:14:27,558 --> 00:14:30,770 So in fact, Ted Kennedy, who was the senator at the time, 333 00:14:30,770 --> 00:14:33,332 and was for many years, congratulated the team 334 00:14:33,332 --> 00:14:34,040 for winning this. 335 00:14:34,040 --> 00:14:36,170 Except he got it wrong-- he congratulated them 336 00:14:36,170 --> 00:14:39,290 on winning the contract to build the interfaith message 337 00:14:39,290 --> 00:14:40,790 processor. 338 00:14:40,790 --> 00:14:42,290 I assume that if they actually had 339 00:14:42,290 --> 00:14:45,930 built that, it might have been a more useful contribution 340 00:14:45,930 --> 00:14:47,820 to world peace. 341 00:14:47,820 --> 00:14:50,250 But all they managed to get was the contract 342 00:14:50,250 --> 00:14:52,460 to build the interface message processor-- 343 00:14:52,460 --> 00:14:54,190 just details. 344 00:14:54,190 --> 00:14:57,470 Anyway, so this team was a pretty remarkable team. 345 00:14:57,470 --> 00:14:58,960 They built the first-- 346 00:14:58,960 --> 00:15:00,710 they didn't build the first email program. 
347 00:15:00,710 --> 00:15:02,690 That was done over at MIT in the '60s. 348 00:15:02,690 --> 00:15:05,120 But what they did do was the first email program 349 00:15:05,120 --> 00:15:07,280 that crossed different organizations. 350 00:15:07,280 --> 00:15:09,870 And in fact, the @ symbol in your email addresses, 351 00:15:09,870 --> 00:15:12,380 which of course, is sort of the right symbol to use, 352 00:15:12,380 --> 00:15:13,880 if you use the @ symbol. 353 00:15:13,880 --> 00:15:17,058 But there was a person at BBN, Ray Tomlinson, who said, 354 00:15:17,058 --> 00:15:19,100 I'm going to put the @ symbol in email addresses. 355 00:15:19,100 --> 00:15:21,200 And a lot of early stuff happened 356 00:15:21,200 --> 00:15:23,525 that continues to this day. 357 00:15:23,525 --> 00:15:24,650 So they built this network. 358 00:15:24,650 --> 00:15:29,000 And Kleinrock over at UCLA was the principal investigator 359 00:15:29,000 --> 00:15:30,140 on this-- 360 00:15:30,140 --> 00:15:32,780 now building systems out of this piece of hardware 361 00:15:32,780 --> 00:15:37,000 that was built. And this was the picture of the ARPANET in 1969. 362 00:15:37,000 --> 00:15:39,080 This became the internet. 363 00:15:39,080 --> 00:15:42,560 There's a continuous evolution from this four-node picture 364 00:15:42,560 --> 00:15:44,670 to the internet today. 365 00:15:44,670 --> 00:15:47,480 And in 1969, they finally connected initially two, 366 00:15:47,480 --> 00:15:48,860 and then four nodes. 367 00:15:48,860 --> 00:15:51,920 And they had to do the first demonstration. 368 00:15:51,920 --> 00:15:54,920 And to listen to Len Kleinrock tell the story, 369 00:15:54,920 --> 00:15:57,180 this was his story. 370 00:15:57,180 --> 00:16:00,080 He says that his group at UCLA tried to log into a computer 371 00:16:00,080 --> 00:16:02,360 at SRI, which is in Palo Alto. 
372 00:16:02,360 --> 00:16:06,410 And he said, we set up a telephone connection between us 373 00:16:06,410 --> 00:16:07,850 and the guys at SRI. 374 00:16:07,850 --> 00:16:09,800 We typed the L, and we asked on the phone-- 375 00:16:09,800 --> 00:16:11,210 because they had to check whether it was working, 376 00:16:11,210 --> 00:16:12,920 so they had the phone to check. 377 00:16:12,920 --> 00:16:14,720 We asked on the phone, do you see the L? 378 00:16:14,720 --> 00:16:16,400 Yes, we see the L, came the response. 379 00:16:16,400 --> 00:16:18,680 We typed the O, and we asked, do you see the O? 380 00:16:18,680 --> 00:16:24,170 Yes, we see the O. And we typed the G, and the system crashed. 381 00:16:24,170 --> 00:16:26,510 But you know, they got something working. 382 00:16:26,510 --> 00:16:28,908 And of course, there's a nice statement here. 383 00:16:28,908 --> 00:16:30,950 You know, a lot of people worry about performance 384 00:16:30,950 --> 00:16:31,700 optimizations. 385 00:16:31,700 --> 00:16:34,700 But the most important optimization in a system 386 00:16:34,700 --> 00:16:37,940 is going from not working to working. 387 00:16:37,940 --> 00:16:43,910 And the fact that something worked is extremely important. 388 00:16:43,910 --> 00:16:47,870 Very soon after, they connected the East Coast-- 389 00:16:47,870 --> 00:16:51,200 a bunch of computers and organizations on the East Coast 390 00:16:51,200 --> 00:16:53,467 to the West Coast, MIT among them. 391 00:16:53,467 --> 00:16:55,550 So there was a team over at BBN, and a lot of them 392 00:16:55,550 --> 00:16:56,570 were from MIT. 
393 00:16:56,570 --> 00:17:01,010 So MIT, BBN, Harvard, and Lincoln Labs on this side, 394 00:17:01,010 --> 00:17:04,035 and MITRE got connected over at Carnegie-- 395 00:17:04,035 --> 00:17:05,660 today, it's Carnegie Mellon University, 396 00:17:05,660 --> 00:17:08,140 I think it was called Carnegie Tech at the time-- 397 00:17:08,140 --> 00:17:11,990 University of Illinois, and then long lines across the country. 398 00:17:11,990 --> 00:17:16,160 There was a group in Utah and in California. 399 00:17:16,160 --> 00:17:19,220 Now, what were these links? 400 00:17:19,220 --> 00:17:21,740 Anyone want to guess-- these links across the country, 401 00:17:21,740 --> 00:17:26,460 or between Harvard and MIT, or across over there, 402 00:17:26,460 --> 00:17:28,700 do you think they actually went and put in new cables 403 00:17:28,700 --> 00:17:30,230 and laid these wires? 404 00:17:30,230 --> 00:17:32,156 What do you think they were? 405 00:17:32,156 --> 00:17:33,530 They were phone lines. 406 00:17:33,530 --> 00:17:37,310 And so this idea shows up over and over again. 407 00:17:37,310 --> 00:17:40,190 The ARPANET was essentially an overlay 408 00:17:40,190 --> 00:17:42,290 built on top of the telephone network. 409 00:17:42,290 --> 00:17:44,990 And in fact, it was a hostile overlay. 410 00:17:44,990 --> 00:17:47,545 Because the telephone network didn't really like-- 411 00:17:47,545 --> 00:17:48,920 I mean, at the time, they thought 412 00:17:48,920 --> 00:17:50,212 this was just an academic joke. 413 00:17:50,212 --> 00:17:53,570 But over time, it became clear that this underlying network 414 00:17:53,570 --> 00:17:56,880 was being used on top to do something different. 415 00:17:56,880 --> 00:17:58,760 And so there is an overlay network 416 00:17:58,760 --> 00:18:01,460 that's built-- and overlays show up again and again and again. 417 00:18:01,460 --> 00:18:04,415 It's just that they're not as hostile these days. 
418 00:18:04,415 --> 00:18:08,120 Another example of an overlay is BitTorrent, 419 00:18:08,120 --> 00:18:10,080 or any peer-to-peer applications-- 420 00:18:10,080 --> 00:18:12,080 Skype, all of these things are overlays that are 421 00:18:12,080 --> 00:18:14,210 built on top of the internet. 422 00:18:14,210 --> 00:18:18,428 And in fact, a lot of the reason for their existence 423 00:18:18,428 --> 00:18:19,970 is because the internet doesn't quite 424 00:18:19,970 --> 00:18:22,012 do the right thing in terms of the right behavior 425 00:18:22,012 --> 00:18:23,370 for certain applications. 426 00:18:23,370 --> 00:18:27,200 So people say, let me go build an overlay on top of it, 427 00:18:27,200 --> 00:18:30,080 wherein you take a path involving 428 00:18:30,080 --> 00:18:32,690 multiple links on the underlying network 429 00:18:32,690 --> 00:18:36,300 and make it look like one link in the higher level network. 430 00:18:36,300 --> 00:18:38,510 And when you do that, you get an overlay network. 431 00:18:38,510 --> 00:18:40,770 So this single link on the ARPANET 432 00:18:40,770 --> 00:18:42,770 is actually many, many links with many switches, 433 00:18:42,770 --> 00:18:45,110 and who knows how expensive it is underlying in the telephone 434 00:18:45,110 --> 00:18:45,710 network? 435 00:18:45,710 --> 00:18:48,140 But all you have to do is to pay the telephone network 436 00:18:48,140 --> 00:18:50,750 some amount of money and make a call or whatever, 437 00:18:50,750 --> 00:18:52,880 and you get to view it as a single link. 438 00:18:52,880 --> 00:18:57,350 And you can do the same thing on the internet. 439 00:18:57,350 --> 00:19:00,260 Now, the routing protocol they used 440 00:19:00,260 --> 00:19:02,120 was a distance-vector routing protocol. 441 00:19:02,120 --> 00:19:04,740 It wasn't actually even as sophisticated as the one 442 00:19:04,740 --> 00:19:05,960 we studied. 443 00:19:05,960 --> 00:19:08,810 But it was a distance-vector protocol. 
444 00:19:08,810 --> 00:19:12,770 And distance-vector was the first routing protocol 445 00:19:12,770 --> 00:19:15,020 ever used in a packet switched network. 446 00:19:15,020 --> 00:19:17,840 And it continued on the ARPANET for many years. 447 00:19:17,840 --> 00:19:20,120 They continued running this protocol. 448 00:19:20,120 --> 00:19:23,210 OK, moving on, we move from basic packet networks 449 00:19:23,210 --> 00:19:25,070 to this problem of internetworking. 450 00:19:25,070 --> 00:19:27,260 And that went through a series of demos. 451 00:19:27,260 --> 00:19:31,490 So one of them was they had a big conference in 1972. 452 00:19:31,490 --> 00:19:34,610 And they were demonstrating the simple packet switched ARPANET. 453 00:19:34,610 --> 00:19:37,040 And it worked really well except when they demonstrated it 454 00:19:37,040 --> 00:19:40,537 to a team from AT&T, and it didn't work at all. 455 00:19:40,537 --> 00:19:42,870 And in fact, there were news articles that were written. 456 00:19:42,870 --> 00:19:44,520 And some people wrote this was a nice network, 457 00:19:44,520 --> 00:19:45,937 some people wrote it never worked. 458 00:19:45,937 --> 00:19:48,690 And AT&T just thought, ah, bunch of academics, 459 00:19:48,690 --> 00:19:51,900 it's never really going to work. 460 00:19:51,900 --> 00:19:53,790 They wrote a modified email program. 461 00:19:53,790 --> 00:19:58,650 And the US was not the only place where work was going on. 462 00:19:58,650 --> 00:20:02,100 In France, there was a really good team building 463 00:20:02,100 --> 00:20:03,840 a network called CYCLADES. 464 00:20:03,840 --> 00:20:07,320 And Louis Pouzin was the principal investigator 465 00:20:07,320 --> 00:20:09,210 of that system. 
466 00:20:09,210 --> 00:20:11,250 I think that CYCLADES doesn't get enough credit 467 00:20:11,250 --> 00:20:13,650 because often, as it is with these things, the winner 468 00:20:13,650 --> 00:20:14,550 kind of-- 469 00:20:14,550 --> 00:20:17,400 ARPANET became the internet, and so sort of everybody 470 00:20:17,400 --> 00:20:18,600 forgot everything else. 471 00:20:18,600 --> 00:20:21,810 But CYCLADES actually came up with some pretty interesting, 472 00:20:21,810 --> 00:20:23,340 groundbreaking ideas. 473 00:20:23,340 --> 00:20:26,220 The idea of articulating that this network is going 474 00:20:26,220 --> 00:20:29,310 to be a best-effort network with these packets 475 00:20:29,310 --> 00:20:30,870 that they called datagrams, which 476 00:20:30,870 --> 00:20:33,480 is a word that continues to be used to this day, 477 00:20:33,480 --> 00:20:36,180 was in this French network. 478 00:20:36,180 --> 00:20:40,348 They originated the sliding window protocol. 479 00:20:40,348 --> 00:20:41,640 It looks obvious, but it's not. 480 00:20:41,640 --> 00:20:43,440 You can see there's lots of subtleties 481 00:20:43,440 --> 00:20:45,240 in how you build such a protocol and how 482 00:20:45,240 --> 00:20:46,560 you argue that it's correct. 483 00:20:46,560 --> 00:20:48,750 The first sliding window protocol was in CYCLADES. 484 00:20:48,750 --> 00:20:52,470 And TCP, which today is the world standard, 485 00:20:52,470 --> 00:20:54,860 used a very, very similar idea. 486 00:20:54,860 --> 00:20:56,610 And they also use distance-vector routing. 487 00:20:56,610 --> 00:20:58,820 And they also implemented, for the first time, 488 00:20:58,820 --> 00:21:01,550 a way to synchronize time between computers. 489 00:21:01,550 --> 00:21:03,300 And they had a number of interesting ideas 490 00:21:03,300 --> 00:21:06,390 in this network. 491 00:21:06,390 --> 00:21:08,670 The work was not just being done in the wide area. 
492 00:21:08,670 --> 00:21:11,940 In 1973, ethernet was invented at Xerox PARC 493 00:21:11,940 --> 00:21:15,540 by a team that included Bob Metcalfe, who 494 00:21:15,540 --> 00:21:18,070 was another alumnus from MIT. 495 00:21:18,070 --> 00:21:21,480 That was inspired by this Aloha protocol that we studied. 496 00:21:21,480 --> 00:21:24,180 And ethernet was essentially Aloha 497 00:21:24,180 --> 00:21:25,950 with carrier-sense multiple access, 498 00:21:25,950 --> 00:21:28,110 very similar to what we did study. 499 00:21:28,110 --> 00:21:30,240 This idea of contention windows is a new idea. 500 00:21:30,240 --> 00:21:32,460 They actually used the probability method. 501 00:21:32,460 --> 00:21:34,890 Ethernet standard evolved in the late '70s and '80s 502 00:21:34,890 --> 00:21:37,140 to use the contention window that we now know. 503 00:21:37,140 --> 00:21:39,510 And that same contention window idea and carrier-sense 504 00:21:39,510 --> 00:21:41,040 is used in Wi-Fi. 505 00:21:41,040 --> 00:21:43,770 So you can draw this stream of ideas 506 00:21:43,770 --> 00:21:46,620 through that continue to exist to this day. 507 00:21:46,620 --> 00:21:48,720 It's interesting that ethernet today doesn't use 508 00:21:48,720 --> 00:21:50,700 carrier-sense multiple access. 509 00:21:50,700 --> 00:21:53,760 Because ethernet today is no longer a slow speed network, 510 00:21:53,760 --> 00:21:55,570 it's a very fast network. 511 00:21:55,570 --> 00:21:58,770 It's not a shared bus, it's point-to-point links. 512 00:21:58,770 --> 00:22:01,350 But it's called ethernet, and it doesn't 513 00:22:01,350 --> 00:22:03,270 use the same MAC protocol other than when 514 00:22:03,270 --> 00:22:04,980 you have low-speed ethernet. 515 00:22:04,980 --> 00:22:09,480 On the other hand, wireless uses the idea from ethernet. 516 00:22:09,480 --> 00:22:10,870 And in fact, a lot of people call 517 00:22:10,870 --> 00:22:13,890 802.11-- they used to call it wireless ethernet. 
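The contention-window idea mentioned here can be sketched as follows: a sender waits until the channel is idle, counts down a random number of slots drawn from the current window, doubles the window after a collision, and resets it after a success. The class name and the values of `CW_MIN` and `CW_MAX` are assumptions chosen for illustration; they are not taken from the ethernet or 802.11 standards.

```python
import random

CW_MIN = 4      # assumed initial contention window, in slots
CW_MAX = 1024   # assumed cap on the contention window


class ContentionWindow:
    """Tracks the backoff window for one sender on a shared channel."""

    def __init__(self, rng=None):
        self.cw = CW_MIN
        self.rng = rng or random.Random()

    def next_backoff(self):
        """Pick how many idle slots to wait before transmitting."""
        return self.rng.randrange(self.cw)

    def on_collision(self):
        """Double the window (up to the cap) so senders spread out more."""
        self.cw = min(self.cw * 2, CW_MAX)

    def on_success(self):
        """Shrink back to the minimum once a transmission gets through."""
        self.cw = CW_MIN
```

Doubling on collision lets the window adapt to the number of contending senders without anyone having to know that number in advance.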
518 00:22:13,890 --> 00:22:17,880 And the ideas just got moved to a different domain, 519 00:22:17,880 --> 00:22:18,990 but it's the same ideas. 520 00:22:18,990 --> 00:22:25,000 And in fact, a lot of the early chipsets 521 00:22:25,000 --> 00:22:28,360 that ran the MAC protocol on 802.11 networks 522 00:22:28,360 --> 00:22:30,640 essentially were the same as the ethernet protocol. 523 00:22:30,640 --> 00:22:32,038 They use the ethernet MAC. 524 00:22:32,038 --> 00:22:33,580 They had that piece of hardware, they 525 00:22:33,580 --> 00:22:35,288 would buy it and build the box around it. 526 00:22:35,288 --> 00:22:38,390 So this idea of taking older technology, 527 00:22:38,390 --> 00:22:41,650 and applying it to a new context, and then modifying it 528 00:22:41,650 --> 00:22:43,310 is something that works pretty well. 529 00:22:43,310 --> 00:22:45,910 Because it means that you can leverage something 530 00:22:45,910 --> 00:22:48,470 that already exists and start making changes to it. 531 00:22:48,470 --> 00:22:51,220 And over time, it looks completely different. 532 00:22:53,980 --> 00:22:57,010 Now, the US government and DARPA-- 533 00:22:57,010 --> 00:22:58,647 ARPA was funding the ARPANET. 534 00:22:58,647 --> 00:23:00,730 But there were companies and other research groups 535 00:23:00,730 --> 00:23:01,623 in the mix here. 536 00:23:01,623 --> 00:23:03,040 And in those days, it was not very 537 00:23:03,040 --> 00:23:04,525 clear what was going to win. 538 00:23:04,525 --> 00:23:06,400 And everybody was doing research on coming up 539 00:23:06,400 --> 00:23:09,500 with different ways of connecting networks together. 540 00:23:09,500 --> 00:23:13,437 And Xerox had a system called PUP. 541 00:23:13,437 --> 00:23:14,770 I don't know what it stands for. 542 00:23:14,770 --> 00:23:16,450 I think it stands for the PARC-- 543 00:23:16,450 --> 00:23:18,070 Xerox PARC, Palo Alto Research Center, 544 00:23:18,070 --> 00:23:19,750 PARC-something protocol. 
545 00:23:19,750 --> 00:23:21,700 I don't know what the U is. 546 00:23:21,700 --> 00:23:26,260 And in a way, there were many technical ideas 547 00:23:26,260 --> 00:23:28,720 in the Xerox system that actually 548 00:23:28,720 --> 00:23:31,030 were arguably better in technical terms 549 00:23:31,030 --> 00:23:33,340 than the ARPANET and TCP/IP. 550 00:23:33,340 --> 00:23:39,354 But it was proprietary, whereas TCP/IP was completely open. 551 00:23:39,354 --> 00:23:42,550 And open meant that you didn't have to pay anyone, 552 00:23:42,550 --> 00:23:44,770 you didn't have to get someone's permission to do it. 553 00:23:44,770 --> 00:23:46,750 The process by which things were standardized 554 00:23:46,750 --> 00:23:48,940 was far more open and democratic. 555 00:23:48,940 --> 00:23:51,190 And it won not because it was better, 556 00:23:51,190 --> 00:23:56,120 but because it was out there and open and free. 557 00:23:56,120 --> 00:23:58,030 There's a lot to be said for that model. 558 00:23:58,030 --> 00:24:00,700 Because for a network to succeed, 559 00:24:00,700 --> 00:24:02,590 you need to lower the barrier of entry 560 00:24:02,590 --> 00:24:05,870 so everybody can participate and implement it. 561 00:24:05,870 --> 00:24:08,200 And if you make network protocols proprietary, 562 00:24:08,200 --> 00:24:11,560 it usually ends up not benefiting anybody. 563 00:24:11,560 --> 00:24:14,620 So now, I think companies have started to realize that. 564 00:24:14,620 --> 00:24:16,090 So everybody understands that you 565 00:24:16,090 --> 00:24:18,460 want to make standards open, and then 566 00:24:18,460 --> 00:24:21,760 keep secret any particular implementation strategy for how 567 00:24:21,760 --> 00:24:22,920 you implement it. 568 00:24:22,920 --> 00:24:27,520 So you might gain commercial advantage from implementation, 569 00:24:27,520 --> 00:24:29,380 but you gain no commercial advantage 570 00:24:29,380 --> 00:24:32,410 from keeping a protocol closed.
571 00:24:32,410 --> 00:24:33,850 There are exceptions to this rule. 572 00:24:33,850 --> 00:24:35,650 Like, Skype is an exception to this rule. 573 00:24:35,650 --> 00:24:37,300 But who knows? 574 00:24:37,300 --> 00:24:40,420 In 5 or 10 years, I suspect that Skype is not 575 00:24:40,420 --> 00:24:42,077 likely to remain dominant. 576 00:24:42,077 --> 00:24:44,410 There are going to be other things that will come about. 577 00:24:44,410 --> 00:24:47,220 And some of them might be open. 578 00:24:52,672 --> 00:24:57,020 In the mid-1970s, this idea that you now really start 579 00:24:57,020 --> 00:24:59,300 to connect many different kinds of networks together, 580 00:24:59,300 --> 00:25:01,490 networks that are being run in different organizations, 581 00:25:01,490 --> 00:25:01,990 took root. 582 00:25:01,990 --> 00:25:04,160 And this was the internetworking problem. 583 00:25:04,160 --> 00:25:06,230 And this is the problem-- 584 00:25:06,230 --> 00:25:08,522 people were working on these packet switched technologies. 585 00:25:08,522 --> 00:25:11,063 And there were many different kinds of packet switched networks 586 00:25:11,063 --> 00:25:11,690 that showed up. 587 00:25:11,690 --> 00:25:14,210 So there was the Aloha network over in Hawaii. 588 00:25:14,210 --> 00:25:16,580 There were people building packet switched networks out 589 00:25:16,580 --> 00:25:17,990 of ethernet. 590 00:25:17,990 --> 00:25:19,932 At MIT and Cambridge University, there 591 00:25:19,932 --> 00:25:21,890 were people who were very enamored of something 592 00:25:21,890 --> 00:25:23,640 called Token Ring. 593 00:25:23,640 --> 00:25:26,210 I don't know if Victor was at MIT at the time, 594 00:25:26,210 --> 00:25:28,250 or any of my colleagues were, but people 595 00:25:28,250 --> 00:25:30,800 were building these Token Ring-based systems that 596 00:25:30,800 --> 00:25:33,140 were technically pretty superior in some respects 597 00:25:33,140 --> 00:25:34,460 and interesting.
598 00:25:34,460 --> 00:25:36,560 And so there were many different kinds of networks 599 00:25:36,560 --> 00:25:39,680 that people were building and connecting their own campuses 600 00:25:39,680 --> 00:25:40,520 internally. 601 00:25:40,520 --> 00:25:42,980 And you had to communicate between each other. 602 00:25:42,980 --> 00:25:45,630 The trouble was, there was no single protocol to do this. 603 00:25:45,630 --> 00:25:47,570 So ethernet had-- back in the day, 604 00:25:47,570 --> 00:25:49,190 when you bought an ethernet technology 605 00:25:49,190 --> 00:25:51,440 from say, Digital Equipment or Xerox or one 606 00:25:51,440 --> 00:25:54,080 of these companies, you wouldn't just get ethernet. 607 00:25:54,080 --> 00:25:56,270 You'd get the ethernet MAC protocol. 608 00:25:56,270 --> 00:25:58,880 Then you'd get some sort of network communication, 609 00:25:58,880 --> 00:26:02,120 network layer between the different ethernet devices. 610 00:26:02,120 --> 00:26:03,740 And you'd get something called EFTP, 611 00:26:03,740 --> 00:26:06,200 which was an internet file transport protocol. 612 00:26:06,200 --> 00:26:07,740 So you'd get applications around it. 613 00:26:07,740 --> 00:26:09,230 So imagine now, you're buying a network thing. 614 00:26:09,230 --> 00:26:11,330 And you don't get to run your own applications, 615 00:26:11,330 --> 00:26:12,780 you get a stack of everything. 616 00:26:12,780 --> 00:26:14,990 And you get a box, and you only get 617 00:26:14,990 --> 00:26:17,570 to use whatever the vendor gave you. 618 00:26:17,570 --> 00:26:20,870 That was the state of networking at that time. 619 00:26:20,870 --> 00:26:23,308 And people recognized this probably 620 00:26:23,308 --> 00:26:24,350 wasn't a very good thing. 621 00:26:24,350 --> 00:26:28,040 Because what you would like is to have a network where people 622 00:26:28,040 --> 00:26:30,140 can come up and invent their own applications 623 00:26:30,140 --> 00:26:32,295 and run their own applications on it. 
624 00:26:32,295 --> 00:26:33,920 But you now needed a way to communicate 625 00:26:33,920 --> 00:26:36,090 between these different networks. 626 00:26:36,090 --> 00:26:38,450 So how do you do this? 627 00:26:38,450 --> 00:26:42,230 So this was a huge project that a lot 628 00:26:42,230 --> 00:26:44,780 of different organizations were involved in. 629 00:26:44,780 --> 00:26:47,720 But a large part of the credit is given to two people-- 630 00:26:47,720 --> 00:26:50,960 Vint Cerf and Bob Kahn, who were in some 631 00:26:50,960 --> 00:26:56,150 sense the lead people in getting a community of other people 632 00:26:56,150 --> 00:26:57,860 together in building the system. 633 00:26:57,860 --> 00:27:00,110 And they articulated these visions and these ideas. 634 00:27:00,110 --> 00:27:03,420 So Kahn's rules of interconnection are as follows. 635 00:27:03,420 --> 00:27:05,870 He first said that each network is independent 636 00:27:05,870 --> 00:27:07,080 and must not change. 637 00:27:07,080 --> 00:27:09,200 So the idea that you can bring networks together 638 00:27:09,200 --> 00:27:12,800 and communicate, if it required every network to change, 639 00:27:12,800 --> 00:27:15,570 that wasn't a palatable idea. 640 00:27:15,570 --> 00:27:18,020 The second is that he agreed with CYCLADES and said, 641 00:27:18,020 --> 00:27:19,970 best-effort communication is what we need. 642 00:27:19,970 --> 00:27:22,760 Because we cannot assume that every network will guarantee 643 00:27:22,760 --> 00:27:23,690 delivery. 644 00:27:23,690 --> 00:27:26,480 There are some networks that may guarantee delivery in order, 645 00:27:26,480 --> 00:27:29,060 but you can't mandate that. 646 00:27:29,060 --> 00:27:32,030 And what they said was, we will design this network 647 00:27:32,030 --> 00:27:34,340 with these boxes that we'll call gateways. 648 00:27:34,340 --> 00:27:35,900 And these gateways will translate 649 00:27:35,900 --> 00:27:38,000 between different network protocols. 
650 00:27:38,000 --> 00:27:40,550 And in a pretty radical departure 651 00:27:40,550 --> 00:27:42,590 from the Bell Telephone network, they 652 00:27:42,590 --> 00:27:46,670 said that there will be no central global management 653 00:27:46,670 --> 00:27:47,360 control. 654 00:27:47,360 --> 00:27:49,850 There is no central place where the operation 655 00:27:49,850 --> 00:27:51,890 of this worldwide network or countrywide network 656 00:27:51,890 --> 00:27:53,402 is going to be managed. 657 00:27:53,402 --> 00:27:54,860 So it's kind of a simple idea-- you 658 00:27:54,860 --> 00:27:56,540 have your own internal network. 659 00:27:56,540 --> 00:27:59,480 This might be an ethernet, this might be some sort of Aloha 660 00:27:59,480 --> 00:28:01,010 network or what have you. 661 00:28:01,010 --> 00:28:04,130 But you have these gateways here that sit and translate 662 00:28:04,130 --> 00:28:05,870 between these different protocols. 663 00:28:05,870 --> 00:28:10,280 And we know the stuff as-- 664 00:28:10,280 --> 00:28:14,270 we now know that what they did was a pretty good decision, 665 00:28:14,270 --> 00:28:16,940 which is they made it so that these gateways will all 666 00:28:16,940 --> 00:28:19,400 agree on one protocol. 667 00:28:19,400 --> 00:28:21,530 And the protocol they standardized-- 668 00:28:21,530 --> 00:28:24,320 they got it wrong initially, but by the late '70s, 669 00:28:24,320 --> 00:28:27,560 they figured out that that protocol will be called 670 00:28:27,560 --> 00:28:29,330 IP, or the internet protocol. 671 00:28:29,330 --> 00:28:32,360 So a node is on the internet if it implements 672 00:28:32,360 --> 00:28:33,950 the internet protocol, which means 673 00:28:33,950 --> 00:28:38,270 it has an agreed-upon plan for how the addressing of nodes 674 00:28:38,270 --> 00:28:41,630 works, and it has a plan for what happens when you forward 675 00:28:41,630 --> 00:28:42,387 a packet.
676 00:28:42,387 --> 00:28:43,970 You have a look-up and a routing table 677 00:28:43,970 --> 00:28:47,270 that looks up the IP address and then decides on the link. 678 00:28:47,270 --> 00:28:49,430 And that's all you have to agree upon. 679 00:28:49,430 --> 00:28:51,620 So to be on the internet, all you have to do 680 00:28:51,620 --> 00:28:55,040 is-- a network has to support IP addressing, 681 00:28:55,040 --> 00:28:58,160 and it has to agree that it will send packets of at least 682 00:28:58,160 --> 00:29:01,760 20 bytes in size, because that's the length of the IP header. 683 00:29:01,760 --> 00:29:04,130 There's very little else that it has to do, 684 00:29:04,130 --> 00:29:06,080 so much so that people have written standards 685 00:29:06,080 --> 00:29:09,050 on how you can send internet protocol over, you know, 686 00:29:09,050 --> 00:29:09,975 carrier pigeon. 687 00:29:09,975 --> 00:29:12,350 And you can-- and in fact, someone demonstrated something 688 00:29:12,350 --> 00:29:14,065 like this, where they had these things, 689 00:29:14,065 --> 00:29:15,440 and these pigeons were delivering 690 00:29:15,440 --> 00:29:18,020 these scraps of paper, and there was something looking it 691 00:29:18,020 --> 00:29:19,700 up and sending it on. 692 00:29:19,700 --> 00:29:22,090 So it doesn't take much to be on the internet. 693 00:29:26,220 --> 00:29:30,330 So Cerf and Kahn started then designing the network. 694 00:29:30,330 --> 00:29:33,690 And they wrote in their original paper 695 00:29:33,690 --> 00:29:35,640 that you needed to identify the network you're 696 00:29:35,640 --> 00:29:38,160 in, and within the network, a host that you were in. 697 00:29:38,160 --> 00:29:43,710 And they said the choice of network identification 698 00:29:43,710 --> 00:29:46,335 allows for up to 256 distinct networks. 699 00:29:46,335 --> 00:29:48,210 Like, how many networks do you possibly need? 700 00:29:48,210 --> 00:29:50,960 How many organizations can you possibly have? 
701 00:29:50,960 --> 00:29:53,910 And they wrote, you know, famous last words-- this size 702 00:29:53,910 --> 00:29:56,863 seems sufficient for the foreseeable future. 703 00:29:56,863 --> 00:29:58,530 The problem is they were slightly wrong. 704 00:29:58,530 --> 00:30:00,060 The foreseeable future in their case 705 00:30:00,060 --> 00:30:02,670 was probably less than 10 years, and it may not have been more 706 00:30:02,670 --> 00:30:03,900 than five or six years. 707 00:30:03,900 --> 00:30:06,660 But you know, they made a mistake. 708 00:30:06,660 --> 00:30:10,583 But what was interesting was, the next time the community got 709 00:30:10,583 --> 00:30:12,000 to make a change in that decision, 710 00:30:12,000 --> 00:30:13,083 they still made a mistake. 711 00:30:13,083 --> 00:30:16,920 They decided that 32-bit packet IP addresses are enough. 712 00:30:16,920 --> 00:30:19,470 And we've run out. 713 00:30:19,470 --> 00:30:24,150 We literally ran out, right now, of IP addresses. 714 00:30:24,150 --> 00:30:26,730 So they had these gateways that would translate, 715 00:30:26,730 --> 00:30:28,710 and you would run the internetworking protocol, 716 00:30:28,710 --> 00:30:29,760 or IP. 717 00:30:29,760 --> 00:30:33,630 So in the 1970s, this idea of internetworking 718 00:30:33,630 --> 00:30:34,990 was all the rage. 719 00:30:34,990 --> 00:30:39,930 And in 1978, there was a really good decision 720 00:30:39,930 --> 00:30:42,450 made to split TCP from IP. 721 00:30:42,450 --> 00:30:46,260 And a lot of that motivation was from a group of people at MIT. 722 00:30:46,260 --> 00:30:48,990 There's a paper here that you'll study at length in 6.033. 723 00:30:48,990 --> 00:30:50,490 It's one of these papers that you'll 724 00:30:50,490 --> 00:30:53,957 study two or three times, because it'll keep coming back, 725 00:30:53,957 --> 00:30:55,790 because these concepts are pretty important. 
726 00:30:55,790 --> 00:30:57,990 It's called "End-to-End Arguments in System Design" 727 00:30:57,990 --> 00:31:01,200 by Saltzer, Reed, and Clark. 728 00:31:01,200 --> 00:31:03,390 They have many examples, but the gist 729 00:31:03,390 --> 00:31:07,830 of the end-to-end arguments is that if you have a system, 730 00:31:07,830 --> 00:31:11,850 like let's say a network, and you 731 00:31:11,850 --> 00:31:14,310 want to be able to design a network, 732 00:31:14,310 --> 00:31:16,650 and you have to make a decision of what features 733 00:31:16,650 --> 00:31:19,170 do you put into the network? 734 00:31:19,170 --> 00:31:22,770 The end-to-end arguments say that you only 735 00:31:22,770 --> 00:31:24,300 put in features in the network that 736 00:31:24,300 --> 00:31:27,570 are absolutely essential for the working of the system. 737 00:31:27,570 --> 00:31:30,270 Anything else that's not crucial to the working of the system, 738 00:31:30,270 --> 00:31:33,310 you leave to the endpoints. 739 00:31:33,310 --> 00:31:36,150 So if you think about reliability as a goal-- 740 00:31:36,150 --> 00:31:39,390 like, does the network need to put in a mechanism 741 00:31:39,390 --> 00:31:42,510 to guarantee the delivery of packets, the answer is no. 742 00:31:42,510 --> 00:31:46,502 The reason is that that property is required, 743 00:31:46,502 --> 00:31:48,210 for example, if you're delivering a file, 744 00:31:48,210 --> 00:31:51,180 but not if you're delivering a video stream or talking. 745 00:31:51,180 --> 00:31:53,440 Because not every byte needs to get there, 746 00:31:53,440 --> 00:31:56,418 which means you don't put that functionality 747 00:31:56,418 --> 00:31:57,210 inside the network. 748 00:31:57,210 --> 00:31:59,280 You leave the function of achieving reliability 749 00:31:59,280 --> 00:32:02,610 to the endpoints, because not everybody needs it.
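A small sketch of the end-to-end idea described above, under simplifying assumptions: the network only offers best-effort delivery, and each endpoint decides for itself whether it needs reliability. The names `BestEffortNetwork`, `send_stream`, and `send_file` are hypothetical, and the return value of `send` stands in for an acknowledgment reaching the sender.

```python
import random


class BestEffortNetwork:
    """Delivers each packet with probability 1 - loss_rate; no retries inside."""

    def __init__(self, loss_rate, seed=0):
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)

    def send(self, packet):
        """Return the packet if it got through, or None if it was dropped."""
        return packet if self.rng.random() >= self.loss_rate else None


def send_stream(network, packets):
    """Endpoint that tolerates loss (audio, video): send once, keep what arrives."""
    return [p for p in packets if network.send(p) is not None]


def send_file(network, packets, max_tries=1000):
    """Endpoint that needs every byte: retransmit each packet until it arrives."""
    delivered = []
    for p in packets:
        for _ in range(max_tries):
            if network.send(p) is not None:
                delivered.append(p)
                break
        else:
            raise RuntimeError("gave up after max_tries attempts")
    return delivered
```

The network code is identical in both cases; only the file-transfer endpoint pays the delay cost of retransmission, which is the point of leaving reliability out of the network.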
750 00:32:02,610 --> 00:32:04,890 And the only exception to the rule 751 00:32:04,890 --> 00:32:08,550 that the only function you put inside of the network 752 00:32:08,550 --> 00:32:11,220 is functions that are absolutely essential for the system 753 00:32:11,220 --> 00:32:13,920 to work is if the mechanism leads 754 00:32:13,920 --> 00:32:16,120 to significant improvements in performance. 755 00:32:16,120 --> 00:32:19,770 So for example, if you run on a network with a 20% packet loss 756 00:32:19,770 --> 00:32:22,290 rate, it makes sense to have some degree 757 00:32:22,290 --> 00:32:26,130 of reliability and retransmission built 758 00:32:26,130 --> 00:32:27,660 on a network hop. 759 00:32:27,660 --> 00:32:29,280 Like, that's what Wi-Fi would do. 760 00:32:29,280 --> 00:32:30,750 Because if you didn't do it, you'd 761 00:32:30,750 --> 00:32:33,540 have sometimes a 20% or 30% packet loss rate. 762 00:32:33,540 --> 00:32:36,700 And that would make everything not work. 763 00:32:36,700 --> 00:32:39,010 But we don't try to design our network 764 00:32:39,010 --> 00:32:41,500 so that between the Wi-Fi access point and your computer, 765 00:32:41,500 --> 00:32:44,970 we produce perfectly reliable transmission. 766 00:32:44,970 --> 00:32:47,040 If you did that, it would then mean that you 767 00:32:47,040 --> 00:32:49,530 would have really long delays. 768 00:32:49,530 --> 00:32:53,250 And you would be providing that function for applications 769 00:32:53,250 --> 00:32:54,310 that don't need it. 770 00:32:54,310 --> 00:32:56,220 If I want an application that I would 771 00:32:56,220 --> 00:32:57,720 like to just send the bytes through, 772 00:32:57,720 --> 00:33:00,450 if it gets through, great, if it doesn't, then I'll 773 00:33:00,450 --> 00:33:03,300 do something else, that's a bad network design. 774 00:33:03,300 --> 00:33:05,280 Unfortunately, there are real networks today 775 00:33:05,280 --> 00:33:06,910 that don't obey this principle.
776 00:33:06,910 --> 00:33:10,620 Cellular networks are sometimes problematic, 777 00:33:10,620 --> 00:33:13,515 like Verizon or AT&T or something. 778 00:33:13,515 --> 00:33:15,810 You find in real data that there are 779 00:33:15,810 --> 00:33:17,700 long delays in these networks. 780 00:33:17,700 --> 00:33:21,460 Because between the cellular base station and your phone, 781 00:33:21,460 --> 00:33:23,580 they have decided to provide something that 782 00:33:23,580 --> 00:33:27,150 looks like highly reliable TCP. 783 00:33:27,150 --> 00:33:28,590 It's kind of a bad network design, 784 00:33:28,590 --> 00:33:31,630 but that's how they do it, some of them. 785 00:33:31,630 --> 00:33:34,560 And so this is an old principle, but it's sometimes not 786 00:33:34,560 --> 00:33:38,900 followed, and that's not so good. 787 00:33:38,900 --> 00:33:43,370 Now, the reason why packet switching and this TCP/IP split 788 00:33:43,370 --> 00:33:46,550 won in the internet compared to various other proposals that 789 00:33:46,550 --> 00:33:49,400 were floating around at the time is 790 00:33:49,400 --> 00:33:51,830 that this architecture, the internet architecture, 791 00:33:51,830 --> 00:33:56,730 is good enough for everything, but optimal for nothing. 792 00:33:56,730 --> 00:33:59,780 There is really no application for which 793 00:33:59,780 --> 00:34:02,810 the design of the internet network infrastructure 794 00:34:02,810 --> 00:34:04,370 is optimal. 795 00:34:04,370 --> 00:34:07,022 If you wanted to build a network to support voice, 796 00:34:07,022 --> 00:34:08,480 you'd go build a telephone network. 797 00:34:08,480 --> 00:34:10,380 You wouldn't build a network that looks like this. 798 00:34:10,380 --> 00:34:12,830 If you wanted to build a network to distribute television 799 00:34:12,830 --> 00:34:15,380 data to television streams to a bunch of people, 800 00:34:15,380 --> 00:34:16,820 you wouldn't build the internet.
801 00:34:16,820 --> 00:34:18,362 If you wanted to build a network that 802 00:34:18,362 --> 00:34:20,750 wanted to support Facebook and nothing else, 803 00:34:20,750 --> 00:34:22,949 you probably wouldn't build the internet. 804 00:34:22,949 --> 00:34:25,520 But if you want a network that's going to support all those 805 00:34:25,520 --> 00:34:29,480 applications reasonably well, including applications that you 806 00:34:29,480 --> 00:34:33,350 cannot imagine today, this design is a very good idea 807 00:34:33,350 --> 00:34:34,985 because it's very minimalist. 808 00:34:34,985 --> 00:34:36,860 There's almost nothing that the network does. 809 00:34:36,860 --> 00:34:39,060 It leaves everything to the endpoint. 810 00:34:39,060 --> 00:34:41,480 So I would say in fact that the most useful lesson, which 811 00:34:41,480 --> 00:34:42,920 you will apply over and over again-- 812 00:34:42,920 --> 00:34:44,719 I mean, let's say you go work at a company, 813 00:34:44,719 --> 00:34:47,810 or work on some sort of research project. 814 00:34:47,810 --> 00:34:50,150 And at various stages of the project, 815 00:34:50,150 --> 00:34:54,500 there are endless discussions on what you need to do-- 816 00:34:54,500 --> 00:34:57,020 whether it's worth doing something or not doing. 817 00:34:57,020 --> 00:35:00,770 And the most important lesson that you can take away 818 00:35:00,770 --> 00:35:04,280 in system design is that when faced with a choice, 819 00:35:04,280 --> 00:35:07,250 try to make a choice that's the simplest possible choice that 820 00:35:07,250 --> 00:35:08,690 gets the job done. 821 00:35:08,690 --> 00:35:11,960 Because most likely, if you get the application 822 00:35:11,960 --> 00:35:14,600 wrong of whatever it is you're building, 823 00:35:14,600 --> 00:35:16,567 if it's simple and minimalist, you 824 00:35:16,567 --> 00:35:18,650 could probably pivot around and use the same thing 825 00:35:18,650 --> 00:35:22,160 for that other application. 
826 00:35:22,160 --> 00:35:26,150 So there's a famous set of quotations here. 827 00:35:26,150 --> 00:35:28,930 One should always architect systems for flexibility 828 00:35:28,930 --> 00:35:31,120 because you'll naturally never-- almost never know 829 00:35:31,120 --> 00:35:33,567 when your design-- everybody has these use cases in mind. 830 00:35:33,567 --> 00:35:35,650 But let's face it, when you're at the early stages 831 00:35:35,650 --> 00:35:38,410 of a project, nobody actually knows. 832 00:35:38,410 --> 00:35:40,040 And you'll almost always get it wrong. 833 00:35:40,040 --> 00:35:42,490 So it's important to architect them for flexibility, not 834 00:35:42,490 --> 00:35:46,090 for performance, not for lots of functionality, 835 00:35:46,090 --> 00:35:48,340 just for being flexible, and the bare minimum 836 00:35:48,340 --> 00:35:52,870 to get the job done based on what you think is necessary. 837 00:35:52,870 --> 00:35:54,700 And it usually means doing that even if it 838 00:35:54,700 --> 00:35:56,440 means sacrificing performance. 839 00:35:56,440 --> 00:35:58,660 Like I said, the most important improvement 840 00:35:58,660 --> 00:36:01,580 in the system's performance is getting it to work. 841 00:36:01,580 --> 00:36:03,352 Everything else is secondary. 842 00:36:03,352 --> 00:36:05,310 So there's a nice quote here, I don't know if-- 843 00:36:05,310 --> 00:36:08,340 any French speakers or readers? 844 00:36:08,340 --> 00:36:09,921 Yes? 845 00:36:09,921 --> 00:36:11,470 AUDIENCE: [SPEAKING FRENCH] 846 00:36:11,470 --> 00:36:11,800 PROFESSOR: Yeah, I know. 847 00:36:11,800 --> 00:36:12,900 Just tell me in English. 848 00:36:12,900 --> 00:36:16,720 [LAUGHTER] 849 00:36:16,720 --> 00:36:18,230 Good enough, all right. 850 00:36:18,230 --> 00:36:22,474 AUDIENCE: Seems that perfection isn't 851 00:36:22,474 --> 00:36:26,670 attained when there's nothing to add, but there's nothing to-- 852 00:36:26,670 --> 00:36:27,920 PROFESSOR: Yeah, that's great. 
853 00:36:27,920 --> 00:36:30,337 Yeah, perfection is achieved not when there's nothing more 854 00:36:30,337 --> 00:36:32,420 to add, but there's nothing left to take away. 855 00:36:32,420 --> 00:36:35,000 And this is a really, really good lesson. 856 00:36:35,000 --> 00:36:36,500 I mean, you guys, every one of you 857 00:36:36,500 --> 00:36:39,810 is going to go into the real world, 858 00:36:39,810 --> 00:36:42,920 either at a startup company or a big company 859 00:36:42,920 --> 00:36:44,570 where you're defining a new product, 860 00:36:44,570 --> 00:36:47,900 or a research project, you go to graduate school-- 861 00:36:47,900 --> 00:36:51,770 at the beginning, you won't know the right answers. 862 00:36:51,770 --> 00:36:53,180 You have some vague ideas of what 863 00:36:53,180 --> 00:36:55,875 it's useful for when you design anything. 864 00:36:55,875 --> 00:36:57,500 And it's really important to understand 865 00:36:57,500 --> 00:36:59,625 that you should do the bare minimum to get it to work. 866 00:36:59,625 --> 00:37:03,230 And it's a really, really good idea-- 867 00:37:03,230 --> 00:37:04,623 less is more. 868 00:37:04,623 --> 00:37:06,290 I have a very simple way to think of it. 869 00:37:06,290 --> 00:37:08,510 I tell my students this repeatedly-- when 870 00:37:08,510 --> 00:37:10,220 in doubt, just leave it out. 871 00:37:10,220 --> 00:37:12,470 If you're not sure if you need it or not, don't do it. 872 00:37:12,470 --> 00:37:14,180 There's enough stuff to do. 873 00:37:14,180 --> 00:37:16,280 And that's probably the most important lesson 874 00:37:16,280 --> 00:37:19,400 from many of the classes, at least on the system side 875 00:37:19,400 --> 00:37:21,110 that you'll be learning. 876 00:37:21,110 --> 00:37:23,600 Of course, it takes a lot of good taste and insight 877 00:37:23,600 --> 00:37:26,660 and intuition to figure out what's really important. 878 00:37:26,660 --> 00:37:29,170 I can't help you there.
879 00:37:29,170 --> 00:37:33,350 OK, so by the 1980s, the internet started to grow up. 880 00:37:33,350 --> 00:37:37,100 And the way in which you wanted to handle growth 881 00:37:37,100 --> 00:37:40,010 was this simple but brilliant idea 882 00:37:40,010 --> 00:37:41,420 called topological addressing. 883 00:37:41,420 --> 00:37:44,990 So I'm going to explain what that means. 884 00:37:44,990 --> 00:37:47,030 In the very early days of the internet, 885 00:37:47,030 --> 00:37:50,600 and including the simple small networks that we studied, 886 00:37:50,600 --> 00:37:54,350 every network node had a network identifier-- 887 00:37:54,350 --> 00:37:57,420 an IP address or some sort of a name for that node. 888 00:37:57,420 --> 00:37:59,540 So in the way in which we looked at it, 889 00:37:59,540 --> 00:38:02,600 nodes would have names like A, B, C, D, and E. 890 00:38:02,600 --> 00:38:04,400 But in reality, A, B, C, of course, 891 00:38:04,400 --> 00:38:06,350 there are some set of bits that communicated. 892 00:38:06,350 --> 00:38:07,850 And in the old days of the internet, 893 00:38:07,850 --> 00:38:10,430 you would have a two-phase identifier. 894 00:38:10,430 --> 00:38:15,020 You'd have a sort of a network identifier or an organization 895 00:38:15,020 --> 00:38:15,590 identifier. 896 00:38:15,590 --> 00:38:18,020 So MIT would have a set of 8 bits, 897 00:38:18,020 --> 00:38:20,570 and then you would have a set of other bits here 898 00:38:20,570 --> 00:38:24,560 that communicated within MIT what that number meant. 899 00:38:24,560 --> 00:38:26,540 So just abstractly, these numbers 900 00:38:26,540 --> 00:38:28,250 meant nothing in the global internet. 901 00:38:28,250 --> 00:38:31,850 This could just be 110111-something. 902 00:38:31,850 --> 00:38:35,960 And then you would have another sequence of something else. 903 00:38:35,960 --> 00:38:37,880 That was the basic idea. 
904 00:38:37,880 --> 00:38:40,445 Now, in the networks we studied so far, 905 00:38:40,445 --> 00:38:41,570 we wouldn't even have this. 906 00:38:41,570 --> 00:38:45,380 Every network node would just have some name. 907 00:38:45,380 --> 00:38:48,380 So what that meant is you could have a network address that 908 00:38:48,380 --> 00:38:49,502 was some set of bits. 909 00:38:49,502 --> 00:38:50,960 I could have a network address that 910 00:38:50,960 --> 00:38:52,790 was some other set of bits. 911 00:38:52,790 --> 00:38:56,270 And the switches in the network, in order 912 00:38:56,270 --> 00:38:59,340 to forward packets to you or to me, 913 00:38:59,340 --> 00:39:03,260 would have to have entries in the routing tables that 914 00:39:03,260 --> 00:39:06,468 were one-to-one with all of the different nodes 915 00:39:06,468 --> 00:39:08,010 that they wanted to communicate with. 916 00:39:08,010 --> 00:39:09,590 So you would have a routing table 917 00:39:09,590 --> 00:39:11,420 that would be essentially one entry 918 00:39:11,420 --> 00:39:14,120 for every host in the network, which doesn't scale. 919 00:39:14,120 --> 00:39:16,580 It's just too much information. 920 00:39:16,580 --> 00:39:19,460 So topological addressing is the idea 921 00:39:19,460 --> 00:39:23,040 that per-node routing entries don't scale very well. 922 00:39:23,040 --> 00:39:25,760 So what you would like to do is organize the network 923 00:39:25,760 --> 00:39:27,080 hierarchically. 924 00:39:27,080 --> 00:39:31,730 And it's sort of similar to the way in which the postal system 925 00:39:31,730 --> 00:39:32,930 works. 926 00:39:32,930 --> 00:39:36,590 So in the 1980s, they came up with a way 927 00:39:36,590 --> 00:39:39,245 of doing it using three kinds of addresses. 928 00:39:39,245 --> 00:39:41,042 I'll call it class A, B, and C address. 929 00:39:41,042 --> 00:39:42,500 We don't use those anymore, but let 930 00:39:42,500 --> 00:39:47,712 me describe what this kind of area-based addressing means. 
931 00:39:47,712 --> 00:39:49,670 So here's a very simple, abstract view of this. 932 00:39:49,670 --> 00:39:53,870 The internet used to adopt this in some approximate way. 933 00:39:53,870 --> 00:39:56,150 But this is the conceptual idea. 934 00:39:56,150 --> 00:39:57,590 You design the network into areas. 935 00:39:57,590 --> 00:40:00,498 So MIT might be an area, Stanford might be an area, 936 00:40:00,498 --> 00:40:02,040 Berkeley might be an area-- you know, 937 00:40:02,040 --> 00:40:05,330 BBN, all these different people are their own areas, 938 00:40:05,330 --> 00:40:06,500 organizations. 939 00:40:06,500 --> 00:40:08,880 Areas have numbers that everybody knows. 940 00:40:08,880 --> 00:40:14,240 So that's the first part, which might be an area identifier. 941 00:40:14,240 --> 00:40:16,100 And then this is a-- 942 00:40:16,100 --> 00:40:20,840 within the area, you might have a host identifier, or more 943 00:40:20,840 --> 00:40:24,920 generally, an interface identifier. 944 00:40:24,920 --> 00:40:28,280 What I mean by interface is that really on the internet, 945 00:40:28,280 --> 00:40:31,790 my computer doesn't have an IP address. 946 00:40:31,790 --> 00:40:33,980 If I'm connected to the internet by the ethernet, 947 00:40:33,980 --> 00:40:35,990 the ethernet has an IP address. 948 00:40:35,990 --> 00:40:38,180 It gets an IP address by virtue of connecting 949 00:40:38,180 --> 00:40:40,430 to a switch upstream of it. 950 00:40:40,430 --> 00:40:42,440 My Wi-Fi network has an IP address. 951 00:40:42,440 --> 00:40:45,030 If I use the Bluetooth, my Bluetooth has an IP address. 952 00:40:45,030 --> 00:40:47,390 In fact, in general, sometimes, my computer 953 00:40:47,390 --> 00:40:49,130 might have four IP addresses-- one 954 00:40:49,130 --> 00:40:53,060 if I'm connected on ethernet, one on Bluetooth, one on Wi-Fi.
955 00:40:53,060 --> 00:40:55,070 And if I have one of those cellular modems, 956 00:40:55,070 --> 00:40:57,320 if I tether through my phone, every time I 957 00:40:57,320 --> 00:41:00,650 do one of those things, I get an IP address, OK? 958 00:41:00,650 --> 00:41:04,680 So IP addresses on the internet name the network interface. 959 00:41:04,680 --> 00:41:06,500 So the way this area routing idea works 960 00:41:06,500 --> 00:41:08,930 is that within these areas, there's 961 00:41:08,930 --> 00:41:11,420 routing as usual and forwarding as usual. 962 00:41:11,420 --> 00:41:13,190 So all these nodes have-- 963 00:41:13,190 --> 00:41:15,218 you could recursively build sub-areas. 964 00:41:15,218 --> 00:41:16,760 But if you didn't, each of these guys 965 00:41:16,760 --> 00:41:19,700 would have an entry for all of the other nodes here. 966 00:41:19,700 --> 00:41:22,700 And within these areas, you would have border routers. 967 00:41:22,700 --> 00:41:25,460 And these border routers would only 968 00:41:25,460 --> 00:41:28,470 have entries for the other areas. 969 00:41:28,470 --> 00:41:31,640 So if you wanted to send a packet from area 1 970 00:41:31,640 --> 00:41:36,290 to area 4, what you would do is you would send a packet 971 00:41:36,290 --> 00:41:38,060 to one of your border routers. 972 00:41:38,060 --> 00:41:40,040 And that border router would have an entry 973 00:41:40,040 --> 00:41:43,400 in its routing table to get to area 4. 974 00:41:43,400 --> 00:41:47,670 It wouldn't know anything about the details inside area 4. 975 00:41:47,670 --> 00:41:51,710 And so you have a nice hierarchy where inside the network, 976 00:41:51,710 --> 00:41:55,130 you only know how to get inside your network and to the border. 977 00:41:55,130 --> 00:41:57,290 The borders know how to get inside, 978 00:41:57,290 --> 00:42:00,950 and the borders know how to get to other borders. 
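The area idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the area numbers, host names, and the dictionary-based table layout are invented for the example and are not the lecture's notation. It shows that an interior router needs entries only for its own area plus a default route to its border router, while a border router keeps one entry per remote area rather than one per remote host.

```python
# Illustrative sketch only: area numbers, host names, and the table layout
# are assumptions made for this example.

def interior_table(area, hosts, border):
    """Interior router: one entry per local host, plus a default via the border."""
    table = {(area, host): host for host in hosts}
    table["default"] = border
    return table


def border_table(area, hosts, other_areas):
    """Border router: local hosts, plus one entry per remote AREA (not per host)."""
    table = {(area, host): host for host in hosts}
    for other in other_areas:
        table[other] = f"border-of-area-{other}"
    return table


def next_hop(table, my_area, dest):
    """dest is (area identifier, host identifier); return the next hop's name."""
    dest_area, _host = dest
    if dest_area == my_area:
        return table[dest]  # ordinary routing inside the area
    # Outside the area: use a per-area entry if present, else the default route.
    return table.get(dest_area, table.get("default"))


interior = interior_table(1, ["a", "b", "c"], border="border-1")
border = border_table(1, ["a", "b", "c"], other_areas=[2, 3, 4])
```

With these hypothetical tables, a packet from area 1 to host `x` in area 4 first goes to `border-1`, which knows only that area 4 is reached through `border-of-area-4`. The border table has six entries no matter how many hosts areas 2, 3, and 4 contain.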
979 00:42:00,950 --> 00:42:03,590 But the border of one area doesn't 980 00:42:03,590 --> 00:42:06,450 know how to get inside any other network. 981 00:42:06,450 --> 00:42:09,050 So you can see now, you can recursively apply this idea 982 00:42:09,050 --> 00:42:11,510 and start to scale the routing system. 983 00:42:14,340 --> 00:42:17,220 Now, on the internet, what ended up happening was, 984 00:42:17,220 --> 00:42:19,650 well, they had to apply this area hierarchy. 985 00:42:19,650 --> 00:42:21,880 And very soon, organizations started saying, 986 00:42:21,880 --> 00:42:24,510 well, I have a big area, and I have a small area. 987 00:42:24,510 --> 00:42:26,100 So how big do you make this thing? 988 00:42:26,100 --> 00:42:29,700 In the very old internet, these were 8 bits long, 989 00:42:29,700 --> 00:42:31,900 and these were the rest of the address. 990 00:42:31,900 --> 00:42:36,030 If you have 8 bits, you can only have 256 organizations. 991 00:42:36,030 --> 00:42:39,570 And although Kahn and Cerf thought that was plenty enough, 992 00:42:39,570 --> 00:42:41,857 that clearly wasn't the case. 993 00:42:41,857 --> 00:42:44,190 So by this time, people were starting to build equipment 994 00:42:44,190 --> 00:42:46,950 with these 32-bit addresses. 995 00:42:46,950 --> 00:42:49,450 And all this hardware was out there, so what do you do? 996 00:42:49,450 --> 00:42:51,000 So what they said was, all right, 997 00:42:51,000 --> 00:42:54,060 let's have three classes of areas. 998 00:42:54,060 --> 00:42:58,350 For the really big guys, we'll have class A addresses. 999 00:42:58,350 --> 00:42:59,850 Then for the medium guys, we'll have 1000 00:42:59,850 --> 00:43:02,292 class B. And for the little guys, we'll have class C. 1001 00:43:02,292 --> 00:43:03,750 What that meant is that we're going 1002 00:43:03,750 --> 00:43:08,190 to have class A allows an organization to have up to 2 1003 00:43:08,190 --> 00:43:11,070 to the 24 addresses. 
1004 00:43:11,070 --> 00:43:15,670 Because class A is identified by 8 bits. 1005 00:43:15,670 --> 00:43:21,120 So you get 24, which is 32 minus 8. 1006 00:43:21,120 --> 00:43:22,920 So MIT was pretty smart. 1007 00:43:22,920 --> 00:43:24,645 They decided that they would go and-- 1008 00:43:24,645 --> 00:43:26,520 you know, they were up there, they were doing 1009 00:43:26,520 --> 00:43:27,728 a lot of networking research. 1010 00:43:27,728 --> 00:43:29,610 So they said, we're going to go get ourselves 1011 00:43:29,610 --> 00:43:31,890 one of these class A addresses because we're a big university, 1012 00:43:31,890 --> 00:43:33,030 and we've got lots of computers. 1013 00:43:33,030 --> 00:43:34,947 And it was probably the case that at the time, 1014 00:43:34,947 --> 00:43:37,630 MIT probably had more computers than most other places. 1015 00:43:37,630 --> 00:43:40,900 So even to this day, they maintain this address, 1016 00:43:40,900 --> 00:43:45,600 which was 18-dot-star, where the star refers to-- 1017 00:43:45,600 --> 00:43:47,850 well, technically, star dot star dot star. 1018 00:43:47,850 --> 00:43:52,500 So all 2 to the 24 addresses that start with the number 18, 1019 00:43:52,500 --> 00:43:54,570 or in binary terms, whatever the 18 is-- 1020 00:43:54,570 --> 00:43:56,940 000-- I'm going to get this wrong, 1021 00:43:56,940 --> 00:43:58,650 but there's some 8-bit number for 18. 1022 00:44:03,660 --> 00:44:07,410 So anyway, they went and got this done. 1023 00:44:07,410 --> 00:44:11,160 Now, nobody wanted the class Cs because the classes were-- 1024 00:44:11,160 --> 00:44:15,330 you could get 2 to the 8 addresses 1025 00:44:15,330 --> 00:44:19,920 because the class C was defined as a 24 base. 1026 00:44:19,920 --> 00:44:22,025 So you have 24 bits to define the organization. 1027 00:44:22,025 --> 00:44:24,150 So you could have 2 to the 24, or some large number 1028 00:44:24,150 --> 00:44:25,890 of organizations-- not quite 2 to the 24. 
1029 00:44:25,890 --> 00:44:28,740 But you'd have some large number of organizations, 1030 00:44:28,740 --> 00:44:31,210 but then you'd only get 256. 1031 00:44:31,210 --> 00:44:33,713 Now, the organization doling out these numbers-- 1032 00:44:33,713 --> 00:44:35,130 there's a particular organization. 1033 00:44:35,130 --> 00:44:36,990 Actually, it was not even an organization at the time. 1034 00:44:36,990 --> 00:44:37,810 It was like this-- 1035 00:44:37,810 --> 00:44:41,460 Jon Postel at UCLA, one guy was doling this stuff out, 1036 00:44:41,460 --> 00:44:45,690 and then it became an organization. 1037 00:44:45,690 --> 00:44:46,712 Jon Postel is great. 1038 00:44:46,712 --> 00:44:47,670 I mean, he was really-- 1039 00:44:47,670 --> 00:44:51,690 the social aspects of how he managed this was remarkable. 1040 00:44:51,690 --> 00:44:55,320 So he didn't dole these out randomly. 1041 00:44:55,320 --> 00:44:59,370 And nobody wanted this, everybody wanted this. 1042 00:44:59,370 --> 00:45:01,500 I mean, everybody wanted that, but they got this. 1043 00:45:01,500 --> 00:45:02,820 And this was 16 and 16. 1044 00:45:02,820 --> 00:45:04,980 So you'd get 2 to the 16 addresses, 1045 00:45:04,980 --> 00:45:09,600 and you get your 16-bit identifiers for these areas. 1046 00:45:09,600 --> 00:45:12,150 And over time, as the internet grew, 1047 00:45:12,150 --> 00:45:14,028 the obvious thing happened. 1048 00:45:14,028 --> 00:45:16,320 Because the thing is this is like the Goldilocks story, 1049 00:45:16,320 --> 00:45:16,570 right? 1050 00:45:16,570 --> 00:45:18,670 This is like, it's too big, and this is too small. 1051 00:45:18,670 --> 00:45:19,462 This is just right. 1052 00:45:19,462 --> 00:45:22,350 And pretty soon, there were 2 to the 16 organizations 1053 00:45:22,350 --> 00:45:24,450 on the internet, and we ran out. 1054 00:45:24,450 --> 00:45:26,680 Literally, they ran out of class B addresses. 
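A small Python sketch of the three address classes may help here. One detail is not spelled out in the lecture and is added as background: in the historical classful scheme, the class was signaled by the leading bits of the first octet (0 for class A, 10 for class B, 110 for class C), which is why the number of class C networks is somewhat less than 2 to the 24. The function name `classify` is made up for this example.

```python
def classify(address):
    """Return (class, network bits, addresses per network, network number).

    Uses the historical classful convention (background, not from the lecture):
    first octet 0-127 is class A, 128-191 is class B, 192-223 is class C.
    """
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(o < 0 or o > 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")

    first = octets[0]
    if first < 128:
        cls, net_bits = "A", 8
    elif first < 192:
        cls, net_bits = "B", 16
    elif first < 224:
        cls, net_bits = "C", 24
    else:
        raise ValueError("address is outside classes A, B, and C")

    host_bits = 32 - net_bits               # e.g. 32 - 8 = 24 for class A
    value = int.from_bytes(bytes(octets), "big")
    network = value >> host_bits            # keep only the organization part
    return cls, net_bits, 2 ** host_bits, network


mit = classify("18.0.0.1")
eighteen_in_binary = format(18, "08b")
```

Here `mit` comes out as class A with 2 to the 24 addresses and network number 18, and `eighteen_in_binary` is `00010010`, the 8-bit pattern for 18 mentioned above.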
1055 00:45:26,680 --> 00:45:28,770 And by this time, by the early '90s, 1056 00:45:28,770 --> 00:45:31,890 they realized that this rigid decomposition 1057 00:45:31,890 --> 00:45:35,610 into addresses in this form was just not quite right. 1058 00:45:35,610 --> 00:45:37,080 Because what you really wanted was 1059 00:45:37,080 --> 00:45:38,625 a system where you allowed-- 1060 00:45:38,625 --> 00:45:40,470 I mean, this idea of an organization ID 1061 00:45:40,470 --> 00:45:41,800 didn't make sense. 1062 00:45:41,800 --> 00:45:46,113 Like, what if I need 2 to the 12 addresses? 1063 00:45:46,113 --> 00:45:47,530 Today, you would have to give me 2 1064 00:45:47,530 --> 00:45:50,070 to the 16, which is ridiculous. 1065 00:45:50,070 --> 00:45:51,613 So if I needed 2 to the 12, well, 1066 00:45:51,613 --> 00:45:52,780 how do you actually do that? 1067 00:45:52,780 --> 00:45:55,890 Well, I could get four 2 to the 8's, but if those 2 to the 8's 1068 00:45:55,890 --> 00:45:59,280 were not contiguous, then those routers 1069 00:45:59,280 --> 00:46:01,620 in the middle of the internet would not 1070 00:46:01,620 --> 00:46:07,260 be able to treat them as one and have the same prefix 1071 00:46:07,260 --> 00:46:08,610 to define the entire network. 1072 00:46:08,610 --> 00:46:11,310 They would have to have four different entries, defeating 1073 00:46:11,310 --> 00:46:14,700 the purpose of, in fact, using this kind 1074 00:46:14,700 --> 00:46:15,940 of area-based routing. 1075 00:46:15,940 --> 00:46:18,850 So the whole thing was kind of messed up. 1076 00:46:18,850 --> 00:46:21,000 So they actually got wise to this problem. 1077 00:46:21,000 --> 00:46:23,940 And they came up with a more sensible, sane way 1078 00:46:23,940 --> 00:46:25,920 to deal with this problem. 1079 00:46:25,920 --> 00:46:30,760 I'll talk about that probably on Monday. 
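The contiguity problem can be checked with Python's standard `ipaddress` module. This is a sketch under assumptions: the address blocks are arbitrary private-range examples, and the `/24` and `/20` prefix-length notation is modern shorthand that the lecture has not introduced at this point. A block of 2 to the 12 addresses is the size of sixteen blocks of 2 to the 8, so the example uses sixteen contiguous blocks for the aggregated case.

```python
import ipaddress


def routing_entries(blocks):
    """Return the fewest prefixes that cover exactly the given blocks."""
    networks = [ipaddress.ip_network(block) for block in blocks]
    return list(ipaddress.collapse_addresses(networks))


# Sixteen contiguous, aligned blocks of 2**8 addresses (hypothetical values).
contiguous = [f"10.1.{i}.0/24" for i in range(16)]

# Blocks of the same size scattered around the address space.
scattered = ["10.1.0.0/24", "10.7.3.0/24", "10.9.200.0/24", "10.200.5.0/24"]

one_entry = routing_entries(contiguous)
many_entries = routing_entries(scattered)
```

The contiguous blocks collapse to a single prefix covering 4096 addresses, so a router in the middle of the network needs one entry. The scattered blocks cannot be merged, so each one costs its own routing entry.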
1080 00:46:30,760 --> 00:46:32,400 Now, in the meantime, in the 1980s, 1081 00:46:32,400 --> 00:46:34,230 this growth was happening pretty rapidly. 1082 00:46:34,230 --> 00:46:37,380 And they started getting organized. 1083 00:46:37,380 --> 00:46:40,530 Vint Cerf, who was by then at ARPA, 1084 00:46:40,530 --> 00:46:43,920 appointed Dave Clark, who was a senior research scientist 1085 00:46:43,920 --> 00:46:47,490 and professor at MIT to a position of internet's 1086 00:46:47,490 --> 00:46:48,450 chief architect. 1087 00:46:48,450 --> 00:46:52,560 And he was instrumental in writing and bringing together 1088 00:46:52,560 --> 00:46:54,630 a lot of people in organizing how people do 1089 00:46:54,630 --> 00:46:56,180 this kind of standardization. 1090 00:46:56,180 --> 00:46:58,680 So there was an organization called the Internet Engineering 1091 00:46:58,680 --> 00:47:00,370 Task Force, or IETF. 1092 00:47:00,370 --> 00:47:02,370 That's the organization that determines and sets 1093 00:47:02,370 --> 00:47:06,240 the standards for the protocols that run on the internet. 1094 00:47:06,240 --> 00:47:08,280 In 1982, a really important thing happened. 1095 00:47:08,280 --> 00:47:13,890 This idea of this community with the internet architecture 1096 00:47:13,890 --> 00:47:19,110 community got a real boost when the US Department of Defense 1097 00:47:19,110 --> 00:47:22,140 looked at various ways and competing ways of designing 1098 00:47:22,140 --> 00:47:25,590 networks, and kind of remarkably for a Department of Defense 1099 00:47:25,590 --> 00:47:28,050 decided to pick the open standard rather than some 1100 00:47:28,050 --> 00:47:31,080 proprietary standard, rather than some closed standard that 1101 00:47:31,080 --> 00:47:34,733 is potentially more secure, though it really isn't. 1102 00:47:34,733 --> 00:47:36,150 Remarkably, they said, we're going 1103 00:47:36,150 --> 00:47:38,970 to standardize our entire systems on TCP/IP. 
1104 00:47:38,970 --> 00:47:40,320 And the Defense Department-- 1105 00:47:40,320 --> 00:47:42,630 I don't know if it's still the case, it probably is. 1106 00:47:42,630 --> 00:47:44,880 But in those days, it was a huge consumer 1107 00:47:44,880 --> 00:47:46,465 of information technology. 1108 00:47:46,465 --> 00:47:48,840 It still is, it's just that other people consume it, too. 1109 00:47:48,840 --> 00:47:51,210 But in those days, it was probably the dominant consumer 1110 00:47:51,210 --> 00:47:53,220 of information technology. 1111 00:47:53,220 --> 00:47:57,000 And they standardized on it, and they awarded a contract 1112 00:47:57,000 --> 00:47:59,700 to the Berkeley computer systems group 1113 00:47:59,700 --> 00:48:03,600 to build TCP/IP, which by then had become standard. 1114 00:48:03,600 --> 00:48:05,100 And there were many implementations, 1115 00:48:05,100 --> 00:48:09,900 but they said, take Unix and go build the TCP/IP stack. 1116 00:48:09,900 --> 00:48:11,850 And Berkeley did a lot of interesting things 1117 00:48:11,850 --> 00:48:14,760 with it including creating the Sockets Layer. 1118 00:48:14,760 --> 00:48:16,680 They came out with what today is known 1119 00:48:16,680 --> 00:48:21,960 as open source implementations of the TCP stack. 1120 00:48:21,960 --> 00:48:24,750 In 1983, MIT created Project Athena, 1121 00:48:24,750 --> 00:48:28,800 which was the world's first campus-wide-- 1122 00:48:28,800 --> 00:48:30,660 campus-area network system. 1123 00:48:30,660 --> 00:48:34,650 And they did a lot of work on things like filesystems, 1124 00:48:34,650 --> 00:48:38,640 distributed file systems, and the Kerberos authentication 1125 00:48:38,640 --> 00:48:42,540 scheme, and a lot of important ideas from this network. 1126 00:48:42,540 --> 00:48:44,790 And they also ran the TCP/IP stack. 1127 00:48:44,790 --> 00:48:47,370 They didn't run anything proprietary. 1128 00:48:47,370 --> 00:48:50,580 In 1984, the domain name system was introduced. 
1129 00:48:50,580 --> 00:48:52,140 For those who don't know it, when 1130 00:48:52,140 --> 00:48:55,890 you go type in www.mit.edu, something 1131 00:48:55,890 --> 00:48:57,270 converts it to an IP address. 1132 00:48:57,270 --> 00:49:00,630 And then your network stack communicates and sends 1133 00:49:00,630 --> 00:49:06,370 packets over TCP or UDP or these protocols to that address. 1134 00:49:06,370 --> 00:49:07,420 How do you convert this? 1135 00:49:07,420 --> 00:49:11,750 Well, again, originally, this was maintained in a file. 1136 00:49:11,750 --> 00:49:14,650 This file was called host.txt. 1137 00:49:14,650 --> 00:49:17,110 And believe it or not, the way this file would work 1138 00:49:17,110 --> 00:49:20,223 was that every night, I think in the middle of the night, 1139 00:49:20,223 --> 00:49:22,390 every computer on the internet would go and download 1140 00:49:22,390 --> 00:49:25,210 this file from one computer that was located somewhere 1141 00:49:25,210 --> 00:49:26,260 on the west coast. 1142 00:49:26,260 --> 00:49:28,730 Like, literally, you would get a host.txt file. 1143 00:49:28,730 --> 00:49:29,470 And of course, you could hack it. 1144 00:49:29,470 --> 00:49:30,280 You could do whatever you want. 1145 00:49:30,280 --> 00:49:32,447 And the assumption, of course, was that no one was-- 1146 00:49:32,447 --> 00:49:34,275 I was told that in the early days, 1147 00:49:34,275 --> 00:49:36,400 every computer had a root password which was empty, 1148 00:49:36,400 --> 00:49:38,942 or these computers at MIT had a root password that was empty. 1149 00:49:38,942 --> 00:49:40,980 Everybody could log in to any computer. 1150 00:49:40,980 --> 00:49:42,730 And everybody was completely trusted, 1151 00:49:42,730 --> 00:49:46,010 which I think is sort of not true anymore. 1152 00:49:46,010 --> 00:49:49,210 But you would download this host.txt file every day. 
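The file-based lookup described here amounts to a table scan. The Python sketch below assumes a simplified two-column format (address, then name) and invented host names with addresses from a documentation range; the real file's layout is not given in the lecture, so none of this should be read as its actual format.

```python
# Hypothetical, simplified host table: the format, names, and addresses
# are assumptions for illustration only.
HOST_TABLE = """\
# address      name
192.0.2.10     host-a.example
192.0.2.11     host-b.example
"""


def parse_host_table(text):
    """Build a name -> address mapping, skipping blank lines and comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, name = line.split()
        mapping[name.lower()] = address
    return mapping


def resolve(name, mapping):
    """Look a name up in the local copy of the table."""
    return mapping[name.lower()]


table = parse_host_table(HOST_TABLE)
address_of_a = resolve("HOST-A.example", table)
```

Every machine holding its own full copy of such a table is what stops scaling as the network grows. A present-day program would instead ask a resolver, for example through Python's `socket.getaddrinfo`, which queries the domain name system rather than a downloaded file.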
1153 00:49:49,210 --> 00:49:53,185 And it started-- as the internet grew really fast-- 1154 00:49:53,185 --> 00:49:55,960 You know, the internet has been growing by 80% to 90% 1155 00:49:55,960 --> 00:49:59,830 a year, not just in the past few years, not just in the '90s. 1156 00:49:59,830 --> 00:50:03,800 You know, it's been growing at 80% to 90% a year since 1980. 1157 00:50:03,800 --> 00:50:06,980 So it's just on this amazing tear. 1158 00:50:06,980 --> 00:50:12,300 And so anyway, this idea of downloading a file every night 1159 00:50:12,300 --> 00:50:13,510 is just not a good idea. 1160 00:50:13,510 --> 00:50:17,370 So they created the domain name system. 1161 00:50:17,370 --> 00:50:19,050 They had to create-- 1162 00:50:19,050 --> 00:50:21,270 the NSF, which was the National Science Foundation, 1163 00:50:21,270 --> 00:50:22,020 got into the act. 1164 00:50:22,020 --> 00:50:24,780 And they became the first internet backbone. 1165 00:50:24,780 --> 00:50:27,930 And the backbone-- the idea is that this is a backbone that 1166 00:50:27,930 --> 00:50:30,900 connects all of these different networks together-- 1167 00:50:30,900 --> 00:50:33,280 in particular, all of the universities together. 1168 00:50:33,280 --> 00:50:35,610 So they also picked TCP/IP as a standard. 1169 00:50:35,610 --> 00:50:37,110 And again, the important lesson here 1170 00:50:37,110 --> 00:50:40,040 is they picked it as a standard because it was open, 1171 00:50:40,040 --> 00:50:43,110 because it was very clear that these implementations were 1172 00:50:43,110 --> 00:50:44,970 available, they were free, everybody 1173 00:50:44,970 --> 00:50:47,220 could contribute to it, and everybody could beat on it 1174 00:50:47,220 --> 00:50:49,920 and improve it, and there was no proprietary technology 1175 00:50:49,920 --> 00:50:51,210 that was held. 1176 00:50:51,210 --> 00:50:53,670 So what I will do here is I'm going to stop.
1177 00:50:53,670 --> 00:50:55,990 I'll pick it up at this point on Monday, 1178 00:50:55,990 --> 00:50:58,920 talk about congestion control, how to hijack routes, 1179 00:50:58,920 --> 00:51:01,140 and how to send spam without being detected, 1180 00:51:01,140 --> 00:51:05,310 and then talk about the future of networking 1181 00:51:05,310 --> 00:51:07,760 and communications.