6.033 | Spring 2018 | Undergraduate

Computer System Engineering

Week 6: Networking Part II

Lecture 10: Networking: Routing (BGP)

Lecture 10 Outline

  1. Introduction
  2. (Three Ways We Deal With) Scale
  3. Policy Routing
  4. Typical BGP Relationships
  5. BGP Relationships => BGP Export Policies
  6. BGP Relationships => BGP Import Policies
  7. BGP in Light of Distributed Routing
  8. Problems With BGP
  9. Recurring Themes

Lecture Slides

Reading

  • Balakrishnan, Hari. “An Introduction to Wide-Area Internet Routing.” 6.033 case study, MIT, January 2009. 

Recitation 10: [Cancelled due to inclement weather]

Hands-on Assignment 5: Internet Routes and Measuring Round Trip Times

(Not available to OCW users.)

Lecture 11: Transport Layer

Lecture 11 Outline

  1. Introduction
  2. TCP
  3. Reliable Transport via Sliding-Window Protocol
  4. Main Motivation
  5. Congestion Control
  6. AIMD
  7. Finite Offered Load
  8. Additional Mechanisms
  9. Reflection

Lecture Slides

Reading

Recitation 11: Resilient Overlay Networks (RON)

Tutorial 6: Assembling the Design Project Preliminary Report

The Design Project Preliminary Report (PDF) was assigned in Week 3. The preliminary report for the design project will be evaluated by your Recitation Instructor and your Communication Instructor. Your Communication Instructor will evaluate it according to the grading rubric and assign a letter grade. Your Recitation Instructor will evaluate the preliminary report to make sure your design is on the right track; you should incorporate their feedback into the presentation and report. See the Design Project section for more details.

  1. Introduction 
    • Today: Routing, some addressing.
    • Enormous growth of Internet => routing protocols redesigned to scale, and also to enforce policy.
    • Problem: DV and LS don’t scale to the Internet.
      • DV = Low overhead, but convergence time is proportional to longest path. Good for small networks.
      • LS = Fast convergence, but high overhead because of flooding. Good for MIT-sized networks, but not the Internet.
  2. (Three Ways We Deal With) Scale 
    • Path-vector routing
      • Like DV, but include the full path in the routing advertisements.  Overhead increases (advs are larger), but convergence time decreases (avoid counting to infinity).
      • Overhead is still lower than LS’s.
    • Routing hierarchy
      • Internet is divided into Autonomous Systems (ASes). ASes are universities, ISPs, government branches, etc. Each AS has a unique ID (its AS number). There are tens of thousands of them (but not billions).
      • Use one routing protocol to route across ASes, and a different protocol to route within ASes.
        • Implies that there are devices on the edge of each AS that can “translate” between or “speak” both protocols.
      • BGP is the path-vector protocol used across ASes.
    • Topological addressing
      • Despite being between ASes, BGP still routes to IP addresses (e.g., to 18.0.0.1, not to AS3).
      • Addresses are given to ASes in contiguous blocks, so that they can be specified succinctly via a particular notation (“CIDR” notation).
      • Keeps advertisements small(er than they would be otherwise).
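To make the CIDR point concrete, here is a minimal sketch using Python's standard `ipaddress` module. The `10.1.x.0/24` blocks are made up for illustration, and treating `18.0.0.1` as part of an `18.0.0.0/8` block is an assumption. Only the address itself comes from the notes.

```python
import ipaddress

# Four contiguous /24 blocks (illustrative addresses, not real allocations).
blocks = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]

# Because they are contiguous and aligned, one CIDR prefix describes all four,
# so a single advertisement can replace four.
merged = list(ipaddress.collapse_addresses(blocks))
print(merged)

# Assumed example: a single /8 entry covers 2**24 individual addresses.
block = ipaddress.ip_network("18.0.0.0/8")
print(block.num_addresses)
print(ipaddress.ip_address("18.0.0.1") in block)
```

The advertisement still names IP addresses rather than an AS number, but a single prefix stands in for millions of destinations.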
  3. Policy Routing 
    • ASes also want to implement policy; they want “policy routing”.
    • Policy routing: Switches make routing decisions based on some set of policies set by a human.  Routing protocol must disseminate enough information to enable those policies.
    • What policies are typical in BGP? ASes don’t want to send traffic on a path unless they have financial incentive to do so.
    • Mechanism of enforcement: Selective advertisements. AS1 won’t tell AS2 about a path unless it will make money by letting AS2 use the path.
      • => Each AS will have a different view of the network, and that view will (almost certainly) *not* contain every physical link.
  4. Typical BGP Relationships (which will eventually lead us to typical BGP policies) 
    • Customer/provider
      • Customers pay for access (transit), which the provider provides.
    • Peers
      • Peers provide mutual access to a subset of each other’s routing tables, namely, the subset that contains their transit customers.
      • Why peer? Can save money and improve performance. Sometimes, it may be the only way to connect your customers to some part of the Internet.
      • Why *not* peer? You’d rather have customers.
  5. BGP Relationships => BGP Export Policies 
    • First decision: Which routes do I advertise to which neighbors? These are an AS’s “export policies.”
      • High-level: “Tell everyone about yourself (your internal IPs) and your customers; tell your customer about everyone.”
      • More specifically:
        • Providers export customer’s routes to everyone.
        • A customer exports its provider’s routes to *its* customers.
          • These two should make sense: Since the customer is paying for Internet, the provider should give them a route to as many destinations as possible. Similarly, the provider should allow *other* parts of the network to reach its customers.
        • AS exports *only* customer routes to peers.
          • Why not full table? AS doesn’t want to provide transit for its peers; they’re not paying it for transit.
    • Z will tell X about C; C is a customer of Z, and X and Z are peers.
    • X will tell Z, Y, and T about C1, C2, and C3.
    • Y will tell X about D.
    • X will *not* tell Y about C; it makes no money to provide transit from Y to C.
    • X doesn’t tell Y about T; it would lose money to provide transit from Y to T.
    • In example, Y appears disconnected from part of the network. BGP doesn’t prevent this. In practice, it never happens.
      • Almost every AS is a customer of someone else (i.e., Y would buy transit from someone).
      • Typically: Small ASes buy Internet from Tier-3 ISPs, which buy Internet from Tier-2 ISPs, which buy Internet from Tier-1 ISPs. Tier-1s are huge; there are only a handful (10-15).
    • Additionally, all Tier-1 ISPs peer with one another. So each Tier-1 ISP can provide global connectivity.
      • This is an example where we need peering in order to reach part of the Internet.
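The export rules above can be sketched as a single decision function. This is a minimal illustration, not how any real BGP implementation is configured. The `Rel` enum and `should_export` are hypothetical names, and the example calls assume (as the bullets suggest) that Y is X's peer and T is X's provider.

```python
from enum import Enum

class Rel(Enum):
    SELF = "self"          # the AS's own internal prefixes
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"

def should_export(learned_from: Rel, neighbor: Rel) -> bool:
    """Decide whether to advertise a route to a neighbor.

    learned_from: where the route came from (SELF for our own prefixes).
    neighbor: our relationship to the neighbor we might advertise to.
    """
    # Tell everyone about yourself and your customers.
    if learned_from in (Rel.SELF, Rel.CUSTOMER):
        return True
    # Routes learned from peers or providers go only to customers;
    # exporting them to anyone else would mean providing unpaid transit.
    return neighbor is Rel.CUSTOMER

# X tells its peers about its own customers C1, C2, C3.
print(should_export(Rel.CUSTOMER, Rel.PEER))
# X does not tell peer Y about C, which X learned from peer Z.
print(should_export(Rel.PEER, Rel.PEER))
# X does not tell peer Y about routes via its (assumed) provider T.
print(should_export(Rel.PROVIDER, Rel.PEER))
```

Each AS applies this per neighbor, which is why the same AS sends different advertisements to different neighbors.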
  6. BGP Relationships => BGP Import Policies 
    • If an AS hears about a route to X from multiple neighbors, how does it decide? These are its “import policies.”
    • First: Make money. Prefer routes via customers (which you make money on) to routes via peers (which you neither make nor lose money on) to routes via providers (which you lose money on).
    • In the case of a tie (which happens often): There are a whole host of other attributes that BGP provides. A common one is AS-hop-count.
    • Each AS sets its own policies.
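A rough sketch of this import policy: rank candidate routes by the relationship they were learned from, then break ties on AS-hop-count. The `Route` class, the `PREFERENCE` table, and the AS numbers are all hypothetical. Real BGP compares several more attributes, and each AS is free to choose differently.

```python
from dataclasses import dataclass

# Lower rank = more preferred: customer, then peer, then provider.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}

@dataclass(frozen=True)
class Route:
    learned_from: str   # "customer", "peer", or "provider"
    as_path: tuple      # ASes the advertisement has traversed

def best_route(candidates):
    """Pick by relationship first, then by shortest AS path."""
    return min(candidates,
               key=lambda r: (PREFERENCE[r.learned_from], len(r.as_path)))

routes = [
    Route("provider", (7,)),         # shortest path, but costs money
    Route("peer", (5, 9)),
    Route("customer", (4, 8, 9)),
    Route("customer", (6, 9)),       # customer route with the fewest AS hops
]
chosen = best_route(routes)
print(chosen)
```

Note that the one-hop provider route loses to a longer customer route: money comes before path length.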
  7. BGP in Light of Distributed Routing 
    • HELLO protocol: BGP sends KEEPALIVE messages to neighbors.
    • Advertisements: Sent to neighbors. Look different depending on which neighbor.
      • BGP runs on top of TCP, a reliable-transport protocol. Doesn’t have to do periodic advertisements to handle failure. Instead, push advs when routes change.
    • Integration: Via policies described above.
    • Failures: Routes can be explicitly withdrawn in BGP when they fail. Routing loops avoided because BGP is path-vector.
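Why does carrying the full path avoid routing loops? A minimal sketch, with hypothetical function names and made-up AS numbers: each AS prepends itself before passing an advertisement on, and refuses any advertisement whose path already contains it.

```python
def accept(my_as, as_path):
    """Reject any advertisement whose path already contains this AS."""
    return my_as not in as_path

def extend(my_as, as_path):
    """Prepend this AS before passing the advertisement to neighbors."""
    return (my_as,) + tuple(as_path)

# AS 3 originates a prefix; the advertisement travels to AS 2, then AS 1.
path = (3,)
path = extend(2, path)
path = extend(1, path)
print(path)

# If the advertisement finds its way back to AS 3, AS 3 sees itself in the
# path and drops it, so the loop never forms.
print(accept(3, path))
# An AS not yet on the path may accept it.
print(accept(4, path))
```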
  8. Problems With BGP 
    • Does it scale? Well, it works on the Internet. But…
      • BGP routing tables are getting big (exceeding the amount of memory dedicated to the table in some switches).
      • We see route instability due to misconfigurations or conflicting AS policies. “Route-flap damping” (ignore advs about frequently-changing routes) helps with this, but increases convergence time.
      • ASes can multi-home: Buy Internet from more than one ISP, usually for back-up or load-balancing. More multi-homed networks => bigger routing tables. The load-balancing itself is also tricky.
      • iBGP. An AS actually has multiple BGP routers on its edge, and a protocol called iBGP keeps them all in sync. iBGP requires an AS’s BGP routers to be connected in a complete graph, and so it doesn’t scale particularly well.
      • Basically: Internet has grown enough that scalability of BGP is becoming a concern.
    • Is it secure?
      • Goodness no. ASes can advertise a prefix that they don’t actually own.
      • Similar problem (and solution) as in DNS. We’ll talk more about it after spring break.
    • Is it simple?
      • The protocol itself: Arguably yes.
      • BGP in practice: No. Again, mo’ money, mo’ problems. Also, human operator error due to the complexity of setting the policies.
  1. Introduction
    • Last week: How to route scalably in the face of policy and economy.
    • This week: How to transport scalably in the face of diverse application demands.
  2. TCP
    • Goals: Provide reliable transport, prevent congestion.
    • Broader questions: How do we do this scalably, and how do we share the network efficiently and fairly?
    • Today: TCP Congestion Control
      • In particular, a version of TCP known as “New Reno.”
    • Next lecture: An alternative approach to “resource management” on the Internet.
  3. Reliable Transport via Sliding-Window Protocol
    • Goal: Receiving application gets a complete, in-order bytestream from the sender. One copy of every packet, in order.
    • Why do we need it? Network is unreliable. Packets get dropped, can arrive out-of-order.
    • Basics:
      • Every data packet gets a sequence number (1, 2, 3, …).
      • Sender has W outstanding packets at any given time. W = window size.
      • When the receiver gets a packet, it sends an ACK back. ACKs are cumulative: An ACK for X indicates “I have received all packets up to and including X.”
      • If sender doesn’t receive an ACK indicating that packet X has been received, after some amount of time it will “timeout” and retransmit X.
        • Maybe X was lost, its ACK was lost, or its ACK is delayed.
        • The timeout is proportional to (but a bit larger than) the RTT of the path between sender and receiver.
      • At receiver: Keep buffer to avoid delivering out-of-order packets, keep track of last-packet-delivered to avoid delivering duplicates.
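The receiver side of the protocol can be sketched in a few lines. This is an illustrative model, not TCP's actual implementation. The `Receiver` class and its field names are hypothetical, and packets are identified by whole sequence numbers starting at 1, as in the notes.

```python
class Receiver:
    def __init__(self):
        self.next_expected = 1   # lowest sequence number not yet delivered
        self.buffer = {}         # out-of-order packets, keyed by sequence number
        self.delivered = []      # what the application has seen, in order

    def receive(self, seq, data):
        """Handle one packet and return the cumulative ACK to send back."""
        if seq >= self.next_expected:
            self.buffer[seq] = data      # hold until it can go out in order
        # Anything below next_expected is a duplicate and is ignored.
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        # "I have received all packets up to and including this number."
        return self.next_expected - 1

r = Receiver()
# Packet 3 is delayed and arrives last; packet 2 is then duplicated.
arrivals = [1, 2, 4, 5, 6, 3, 2]
acks = [r.receive(seq, f"pkt{seq}") for seq in arrivals]
print(acks)
print(r.delivered)
```

While packet 3 is missing, every later arrival produces the same ACK (2). Once 3 arrives, the buffered packets are released together and the ACK jumps to 6.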
  4. Main Motivation
    • What’s the “right” value for W?

    • In particular, what if there are multiple senders?

      • Ex:
      Diagram of packet transmission with 2 Mb/s window.
      • What should happen? Debatable. Reasonable alternative:
      Diagram of packet transmission with 1 Mb/s window.
      • How do S1 and S2 figure this out? What happens if S3 arrives? Or if S1 starts sending less? Etc.
  5. Congestion Control: Controlling the Source Rate to Achieve High Performance
    • Goals: Efficiency and fairness.
      • Minimize drops, minimize delay, maximize utilization.
      • Share bandwidth fairly among all connections that are using it.
    • FOR NOW: Assume all senders have infinite offered load. Fairness = splitting bandwidth equally amongst them.
    • But no sender knows how many other senders there are, and that number can change over time.
    • We’ll use window-based congestion control. Switches are dumb (can only drop packets); senders are smart.
  6. AIMD
    • Need a signal for congestion in the network, so senders can react to it.
    • Our signal: Packet drops
    • Every RTT:
      • If there is no loss, W = W+1
      • If there is loss, W = W/2
    • This is “Additive Increase Multiplicative Decrease” (AIMD)
    • Senders constantly readjust => adapt to a changing number of senders, or changing offered loads
    • Window size exhibits sawtooth behavior (see slides)
    • Why AIMD?
      • It’s “safe”: Senders are conservative about increasing, but scale back dramatically in the face of congestion
      • Efficient and fair
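A toy simulation shows why AIMD tends toward a fair split. It makes strong simplifying assumptions that are not in the notes: both senders have the same RTT, every sender sees a loss whenever the combined window exceeds a made-up capacity of 50 packets, and the window never drops below 1.

```python
def aimd_step(w, loss):
    """One RTT of AIMD: add 1 if there was no loss, halve on loss."""
    return max(1.0, w / 2) if loss else w + 1

def simulate(windows, capacity, rounds):
    """Run `rounds` RTTs for senders sharing one bottleneck."""
    windows = list(windows)
    for _ in range(rounds):
        # Assumption: a loss occurs, and every sender notices it, whenever
        # the senders together exceed the bottleneck capacity.
        loss = sum(windows) > capacity
        windows = [aimd_step(w, loss) for w in windows]
    return windows

# Two senders start far apart: windows of 1 and 40 packets.
final = simulate([1.0, 40.0], capacity=50, rounds=200)
print(final)
```

Additive increase leaves the gap between the two windows unchanged, while each multiplicative decrease halves it, so the windows drift toward equal shares. Under real conditions (different RTTs, unsynchronized losses) the outcome is less tidy.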
  7. Finite Offered Load
    • Remove the assumption that everyone has infinite offered load
    • Suppose S1 and S2 have offered load of 1 Mb/s, S3 has offered load of 0.5 Mb/s, and they all share a bottleneck with capacity 2 Mb/s
    • What happens?
      • In theory: S3 stops increasing once it’s sending 0.5 Mb/s. S1 and S2 continue increasing until they reach 0.75 Mb/s
    • Is this fair?
      • In some sense. It achieves a type of fairness known as “max-min fairness”. But there are other definitions (e.g., “proportional fairness”)
    • What happens in practice?
      • We might get max-min fairness, or one of the senders might experience a much longer RTT and so not increase its window at the same rate.
    • So: TCP’s congestion control utilizes the network reasonably well, but it’s hard to measure fairness, or claim that fairness is achieved under skewed workloads, varying RTTs, etc.
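The max-min allocation in the example can be computed directly. The function below is a minimal sketch of the usual "water-filling" computation, not something TCP itself runs. The name `max_min_allocation` is hypothetical.

```python
def max_min_allocation(demands, capacity):
    """Max-min fair shares of `capacity` for senders with finite demands."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    # Consider senders from smallest demand to largest.
    unsatisfied = sorted(range(len(demands)), key=lambda i: demands[i])
    while unsatisfied:
        share = remaining / len(unsatisfied)   # equal split of what is left
        smallest = unsatisfied[0]
        if demands[smallest] <= share:
            # This sender wants less than its share: give it what it asks
            # for, and let the others split the leftover.
            alloc[smallest] = demands[smallest]
            remaining -= demands[smallest]
            unsatisfied.pop(0)
        else:
            # Everyone left wants more than the equal share: split evenly.
            for i in unsatisfied:
                alloc[i] = share
            break
    return alloc

# S1 and S2 offer 1 Mb/s, S3 offers 0.5 Mb/s, bottleneck is 2 Mb/s.
shares = max_min_allocation([1.0, 1.0, 0.5], capacity=2.0)
print(shares)  # [0.75, 0.75, 0.5]
```

S3 gets everything it asks for, and S1 and S2 split the remaining 1.5 Mb/s equally, matching the "in theory" outcome above.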
  8. Additional Mechanisms
    • Slow Start

      • At the beginning of the connection, exponentially increase the window (double it every RTT until you see loss).
      • Decreases the time it takes for the initial window to “ramp up.”
      • (See slide for diagram)
    • Fast Retransmit/Fast Recovery

      • When a sender receives an ACK with sequence number X, and then three duplicates of that ACK, it immediately retransmits packet X+1 (remember: ACKs are cumulative).

      Ex:

            Send     1 2 3 4 5 6
            Receive  1 2   2 2 2
      

            Sender receives 4 ACKs total with sequence number “2”; infers that packet 3 is lost, immediately retransmits.

      • On fast-retransmit, window decrease is as before: W = W/2.
      • By contrast, when a packet is lost due to timeout, TCP behaves differently: W = 1, then do slow-start until the last good window and then start additive increase.
      • (See slide for diagram)
      • Reasoning: If there is a retransmission due to timeout, then there is significant loss in the network and senders should back *way* off.
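The two mechanisms above can be sketched together. The function names are hypothetical and window sizes are counted in packets. One labeled assumption: the slow-start threshold after a loss is set to half the window at the time of the loss, which is how standard TCP interprets the point at which to switch back to additive increase.

```python
def fast_retransmit_target(acks, threshold=3):
    """Return the packet to retransmit, or None if no trigger fires.

    Fires when an ACK value is followed by `threshold` duplicates of itself.
    """
    last, dups = None, 0
    for ack in acks:
        if ack == last:
            dups += 1
            if dups == threshold:
                return ack + 1      # ACKs are cumulative, so ack+1 is missing
        else:
            last, dups = ack, 0
    return None

def next_window(w, ssthresh, event):
    """Return (window, ssthresh) after one RTT.

    event is "ok", "fast_retransmit", or "timeout".
    """
    if event == "timeout":
        # Significant loss: back way off, then slow-start up to ssthresh.
        return 1.0, max(w / 2, 1.0)   # assumption: threshold = half the window
    if event == "fast_retransmit":
        half = max(w / 2, 1.0)
        return half, half             # W = W/2, continue additive increase
    if w < ssthresh:
        return min(2 * w, ssthresh), ssthresh   # slow start: double per RTT
    return w + 1, ssthresh                      # additive increase

print(fast_retransmit_target([1, 2, 2, 2, 2]))   # the example above -> 3

w, ssthresh = next_window(16.0, 16.0, "timeout")
trace = [w]
for _ in range(4):
    w, ssthresh = next_window(w, ssthresh, "ok")
    trace.append(w)
print(trace)   # [1.0, 2.0, 4.0, 8.0, 9.0]
```

For a brand-new connection, passing `float("inf")` as `ssthresh` reproduces "double every RTT until you see loss."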
  9. Reflection
    • TCP has been a massive success: it requires no changes to the Internet’s infrastructure, is something endpoints can opt in to, and allows the network to be shared, in a distributed manner, among tons of different users, all with different (and changing) types of traffic.
    • BUT: TCP doesn’t react to congestion until it’s already happening. Is there something better we could do?

For this recitation, you’ll be reading most of “Resilient Overlay Networks (PDF)” by David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris. This paper explains how to build an overlay network on top of the existing Internet that has better properties or other features. Many Internet applications, such as peer-to-peer applications, are built as overlay networks.

To guide you as you read:

  • Read Section 1 closely. It introduces the main goals of RON and summarizes the main results.
  • Skim Section 2. It gives support for the context and motivation of RON.
  • Read Section 3 closely. Make sure you understand each of RON’s design goals.
  • Read Section 4, but don’t get too stuck on 4.2.2. It’s important that you understand that RON uses measurement to evaluate and select paths, less important that you closely scrutinize its equations.
  • Skip Section 5.
  • Skim Section 6. The main results of the paper are summarized at the end of the intro. You should understand how the authors evaluated RON to determine those results.
  • Read Sections 7 and 8. Section 7, in particular, addresses some criticisms of RON.

As you read, think about:

  • Why is RON able to overcome failures that BGP can’t?
  • Why does RON collect different application metrics?
  • How far does RON scale?
  • Routing is normally done at the network layer, but RON (and BGP) operate at the application layer. What are the benefits and drawbacks of this change?

Questions for Recitation

Before you come to this recitation, write up (on paper) a brief answer to the following (really—we don’t need more than a couple sentences for each question).  

Your answers to these questions should be in your own words, not direct quotations from the paper.

  • What is the goal of RON?
  • How was it designed to meet this goal?
  • Why do we need RON? (Or why do the authors believe that we need RON?)

As always, there are multiple correct answers for each of these questions.
