6.033 | Spring 2018 | Undergraduate

Computer System Engineering

Week 9: Distributed Systems Part II

Lecture 16: Atomicity via Logging

Lecture 16 Outline

  1. Introduction
  2. Motivating Example
  3. Basic Idea
  4. How to Use a Log for Transactions
  5. Performance of Log
  6. Cell Storage
  7. Performance of Log + Cell Storage
  8. Improving Performance
  9. What about Un-undo-able Actions?
  10. Summary

Lecture Slides


  • Book section 9.3

Recitation 16: Log-Structured File System (LFS)

Hands-on Assignment 6: Write Ahead Log (WAL) System

(Not available to OCW users.)

Lecture 17: Fault Tolerance: Isolation

Lecture 17 Outline

  1. Introduction
  2. Serializability
  3. Conflict Serializability
  4. Conflict Graphs
  5. Interlude
  6. Two-phase Locking (2PL)
  7. Performance Improvement: Reader-Writer locks
  8. Another Possible Performance Improvement: Giving up on Conflict Serializability
  9. Summary

Lecture Slides


  • Book sections 9.4 before 9.4.1 and 9.5

Recitation 17: Databases

Tutorial 9: [No Tutorial this Week]

Read “Concurrency Control and Recovery” by Michael J. Franklin. Skip section 3.2.

This paper is easiest to digest in chunks. You don’t have to—in fact, probably shouldn’t—read it all at once (or even read it in order).

  • Section 1 introduces some basic terms and goals for database systems.
  • Section 2 gives a good review of the basics of locking and logging that will be discussed in lectures this week.
  • Section 3.1 (remember, skip 3.2) discusses some solutions to the problem of concurrency control (how to keep a database consistent even with interleaved operations from multiple users).
  • Section 4 discusses some of the trade-offs of the transaction model discussed in the paper.

You should come to understand concepts such as serializability, (no-)force and (no-)steal, write-ahead logging, two-phase locking, degrees of isolation, etc.

As you read, think about the following:

  • What failure models are we dealing with in this paper?
  • Under what circumstances would you want transaction executions to respect the ACID properties? Are there systems that don’t need to have all four properties?

Questions for Recitation

Before you come to this recitation, write up (on paper) a brief answer to the following (really—we don’t need more than a couple sentences for each question).  

Your answers to these questions should be in your own words, not direct quotations from the paper.

  • What is an example from the paper that illustrates the trade-off between implementing ACID transaction properties and maintaining good performance?
  • How does that policy or technique trade off performance?
  • Why would you use this policy or technique? (In what context, under what circumstances, etc.)

As always, there are multiple correct answers for each of these questions.

  1. Introduction
    • Currently: Building reliable systems out of unreliable components. We’re working on implementing transactions which provide:
      • Atomicity
      • Isolation
    • So far: Have a poorly-performing version of atomicity via shadow copies.
    • Today: Logging, which will give us reasonable performance for atomicity. Logging also works when we have multiple concurrent transactions, even though for today we’re not thinking about concurrency.
  2. Motivating Example
    • begin // T1
      A = 100  
      B = 50  
      commit // At commit: A=100; B=50
      begin // T2
      A = A - 20  
      B = B + 20  
      commit // At commit: A=80; B=70
      begin // T3
      A = A + 30  
    • Problem: A = 110, but T3 didn’t commit. We need to revert.

  3. Basic Idea
    • Keep a log of all changes and whether a transaction commits or aborts.
      • Every transaction gets a unique ID.
      • UPDATE records include old and new values of a variable.
      • COMMIT records specify that transaction committed.
      • ABORT records specify that transaction aborted.
        • Not always needed.
    • (See Lecture 16 slides (PDF) for the log for this example.)
    • Nice: Updates are small appends.
  4. How to Use a Log for Transactions
    • On begin: Allocate new transaction ID (TID).
    • On write: Append entry to log.
    • On read: Scan log to find last committed value.
    • On commit: Write commit record.
      • This is the commit point.
      • Atomic because we can assume it’s a single-sector write.
      • Another way to do it would be to put checksums on each record and ignore partially-written records.
    • On abort: Nothing (could write an ABORT record but not strictly needed).
    • On recover: Nothing.
    • (see Lecture 16 slides (PDF) for code.)
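The begin/write/read/commit rules above can be sketched as a small in-memory model. This is only an illustration, not the code from the slides: the class name `LogStore`, the tuple record layout, and the choice to let a transaction see its own uncommitted writes are all assumptions made for the example.

```python
class LogStore:
    """Log-only store (hypothetical sketch): writes append, reads scan the log."""

    def __init__(self):
        self.log = []        # records, oldest first; stands in for the on-disk log
        self.next_tid = 1

    def begin(self):
        # Allocate a new transaction ID (TID).
        tid = self.next_tid
        self.next_tid += 1
        return tid

    def write(self, tid, var, new_value):
        # UPDATE records carry both the old and the new value.
        old_value = self.read(tid, var)
        self.log.append(("UPDATE", tid, var, old_value, new_value))

    def read(self, tid, var):
        # Scan backwards for the newest value from a committed transaction.
        # Assumption: a transaction also sees its own uncommitted writes.
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        for rec in reversed(self.log):
            if rec[0] == "UPDATE" and rec[2] == var:
                if rec[1] in committed or rec[1] == tid:
                    return rec[4]
        return None          # never written by a committed transaction

    def commit(self, tid):
        # Appending this record is the commit point.
        self.log.append(("COMMIT", tid))


# Replay the motivating example: T1 and T2 commit, T3 does not.
store = LogStore()
t1 = store.begin()
store.write(t1, "A", 100)
store.write(t1, "B", 50)
store.commit(t1)
t2 = store.begin()
store.write(t2, "A", store.read(t2, "A") - 20)
store.write(t2, "B", store.read(t2, "B") + 20)
store.commit(t2)
t3 = store.begin()
store.write(t3, "A", store.read(t3, "A") + 30)   # T3 never commits

t4 = store.begin()
print(store.read(t4, "A"), store.read(t4, "B"))  # 80 70
```

Note that no recovery step is needed here: T3's update is simply ignored by later reads because no COMMIT record for T3 exists.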
  5. Performance of Log
    • Writes: Good. Sequential = fast.
    • Reads: Terrible. Must scan entire log.
    • Recovery: Instantaneous.
  6. Cell Storage
    • Improve read performance with cell storage.
      • (For us) stored on disk, i.e., non-volatile storage.
      • Updates go to log and cell storage.
      • Read from cell storage.
    • “Log” = write to log. “Install” = write to cell storage.
    • How to recover:
      • Scan the log backwards, determine what actions aborted, and undo them.
      • (see Lecture 16 slides (PDF) for code.)
      • What if we crash during recovery? No worries; recover() is idempotent. Can do it repeatedly.
    • How to write:
      • Log before install, not the other way; otherwise, can’t recover from a crash in between the two writes.
      • This is write-ahead logging.
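A minimal sketch of log + cell storage with backward-scan undo recovery, under stated assumptions: the class name `WalStore` and the record layout are made up for illustration, both the log and the cells are plain in-memory structures standing in for non-volatile storage, and transactions run one at a time. A value of `None` stands for "never written".

```python
class WalStore:
    """Log plus cell storage (hypothetical sketch); the log is written first."""

    def __init__(self):
        self.log = []      # stands in for the non-volatile log
        self.cells = {}    # stands in for non-volatile cell storage
        self.next_tid = 1

    def begin(self):
        tid = self.next_tid
        self.next_tid += 1
        return tid

    def write(self, tid, var, new_value):
        # Log before install: if we crash between the two steps,
        # the log still tells recovery what happened.
        self.log.append(("UPDATE", tid, var, self.cells.get(var), new_value))
        self.cells[var] = new_value          # install

    def read(self, var):
        return self.cells.get(var)           # fast: no log scan

    def commit(self, tid):
        self.log.append(("COMMIT", tid))

    def recover(self):
        # Scan the log backwards; restore the old value of every update
        # whose transaction has no COMMIT record.
        committed = set()
        for rec in reversed(self.log):
            if rec[0] == "COMMIT":
                committed.add(rec[1])
            elif rec[0] == "UPDATE" and rec[1] not in committed:
                self.cells[rec[2]] = rec[3]


store = WalStore()
t1 = store.begin()
store.write(t1, "A", 100)
store.write(t1, "B", 50)
store.commit(t1)
t2 = store.begin()
store.write(t2, "A", store.read("A") - 20)
store.write(t2, "B", store.read("B") + 20)
store.commit(t2)
t3 = store.begin()
store.write(t3, "A", store.read("A") + 30)   # installed, then "crash"

store.recover()
store.recover()        # idempotent: running it again changes nothing
print(store.cells)     # {'A': 80, 'B': 70}
```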
  7. Performance of Log + Cell Storage
    • Writes: Okay, but now we write to disk twice instead of once.
    • Reads: Fast.
    • Recovery: Bad. Have to scan the entire log.
  8. Improving Performance
    • Improve writes: Use a (volatile) cache.
      • Reads go to cache first, writes go to cache and are eventually flushed to cell storage.
      • Problem: After crash, there may be updates that didn’t make it to cell storage (were in cache but not flushed).
        • Also could be updates in cell storage that need to be undone, but we had that problem before.
      • Solution: We need a redo phase in addition to an undo phase in our recovery.
    • Improving recovery:
      • Problem: Recovery takes longer and longer as the log grows.
      • Solution: Truncate the log.
      • How?
        • Assuming no pending actions:
          • Flush all cached updates to cell storage.
          • Write a CHECKPOINT record.
          • Truncate the log prior to the CHECKPOINT record.
            • Usually amounts to deleting a file.
        • With pending actions: delete only the records that precede both the CHECKPOINT record and the earliest record of any undecided transaction.
    • ABORT records
      • Can help recovery skip undoing transactions that have already aborted. Not necessary for correctness (we can always just pretend we crashed), but they can help.
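The undo-then-redo recovery and the log truncation rule described above might look roughly like the following. This is a sketch under assumptions, not the slides' code: the function names `recover` and `truncate`, the tuple record layout, and the one-element `("CHECKPOINT",)` record are all hypothetical, and `truncate` assumes cached updates were flushed to cell storage before the checkpoint was written.

```python
def recover(log, cells):
    """Repair cell storage after a crash: undo losers, then redo winners."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Undo phase: scan backwards, restoring old values written by
    # transactions that never committed.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            cells[rec[2]] = rec[3]
    # Redo phase: scan forwards, reapplying committed updates that may
    # have been sitting in the volatile cache when we crashed.
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            cells[rec[2]] = rec[4]
    return cells


def truncate(log):
    """Drop records that precede both the last CHECKPOINT and the earliest
    record of any undecided transaction. Requires a CHECKPOINT record."""
    decided = {rec[1] for rec in log if rec[0] in ("COMMIT", "ABORT")}
    checkpoint = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    undecided = [i for i, rec in enumerate(log)
                 if rec[0] == "UPDATE" and rec[1] not in decided]
    return log[min([checkpoint] + undecided):]


log = [
    ("UPDATE", 1, "A", 0, 100), ("UPDATE", 1, "B", 0, 50), ("COMMIT", 1),
    ("UPDATE", 2, "A", 100, 80), ("UPDATE", 2, "B", 50, 70), ("COMMIT", 2),
    ("UPDATE", 3, "A", 80, 110),
]
# At the crash, T2's update to B was still in the cache (lost), while
# T3's uncommitted update to A had already reached cell storage.
cells = {"A": 110, "B": 50}
print(recover(log, cells))                  # {'A': 80, 'B': 70}
print(truncate(log + [("CHECKPOINT",)]))    # keeps T3's record and the checkpoint
```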
  9. What about Un-undo-able Actions?
    • What if our transaction fires a missile and then aborts?
    • Typically: Wait for the software that controls the action to commit, then take the action, and have a special way to detect whether the action has happened or will happen.
  10. Summary
    • Logging is a general technique for achieving atomicity.
      • Writes are fast, reads can be fast with cell storage.
      • Need to log before installing (write-ahead), and need a recovery process.
    • Tomorrow is recitation: Logging for file systems.
    • Now: We’re good with atomicity.
      • In fact, logging will work fine with concurrent transactions; the problem will be figuring out which steps we can actually run in parallel.
    • Wednesday: Isolation.
    • Next week: Distributed transactions.
  1. Introduction

    • Last time: Atomicity via logging. We’re good with atomicity now.
    • Today: Isolation.
    • The problem: We have multiple transactions (T1, T2, …, TN), all of which must be atomic, and all of which can have multiple steps. We want to schedule the steps of these transactions so that it appears as if they ran sequentially.
    • Naive solution: Run transactions sequentially with a single global lock.
      • Very poor performance.
    • Better solution: Fine-grained locking. But we all agreed that this was error prone back in the OS section. What to do?
  2. Serializability

    • What does it mean for transactions to “appear” as if they were run in sequence?
      • That the final written state is the same?
      • That the final written state + intermediate reads are the same?
      • Something else?
    • It depends! There are different types of “serializability”. The right one depends on what your application is doing.
    • Final-state serializability: A schedule is final-state serializable if its final written state is equivalent to that of some serial schedule.
  3. Conflict Serializability

    • Two operations conflict if:
      1. They both operate on the same object.
      2. At least one of them is a write.
    • Definition should make sense: Concurrent reads are generally fine, but problems arise as soon as writes get involved.
    • A schedule is conflict serializable if the order of all of its conflicts is the same as the order of the conflicts in some sequential schedule.
      • By “order of conflicts” we mean the ordering of the steps in each individual conflict.
    • See Lecture 17 slides (PDF) for examples. A schedule can be final-state serializable but not conflict serializable.
  4. Conflict Graphs

    • Nodes are transactions.

    • Edges are directed.

    • There is an edge from T_i to T_j iff:

      • T_i and T_j have a conflict between them.
      • The first step in the conflict occurs in T_i.
    • Example 1:

      T1                  T2
                          write (x, 20)
      read (x)
                          write (y, 30)
      read (y)
      write (y, y+10)

      There are three conflicts:
        • T2: write (x, 20); T1: read (x)
        • T2: write (y, 30); T1: read (y)
        • T2: write (y, 30); T1: write (y, y+10)

      In each conflict, the first step is in T2. Conflict graph is: T2 -> T1.

    • Example 2:

      T1                  T2
      read (x)
                          write (x, 20)
                          write (y, 30)
      read (y)
      write (y, y+10)

      Now our three conflicts are:
        • T1: read (x); T2: write (x, 20)
        • T2: write (y, 30); T1: read (y)
        • T2: write (y, 30); T1: write (y, y+10)

      Our conflict graph here is T1 <-> T2.

      (Note: this schedule was final-state serializable but not conflict serializable.)

    • Example 3:

      T1:           T2:           T3:           T4:
      read (x)
                    write (x)
                                  read (y)
                                                read (y)
      write (y)
                    write (y)
                                  write (z)

      The conflicts here are:
        • T1: read (x); T2: write (x)
        • T3: read (y); T1: write (y)
        • T3: read (y); T2: write (y)
        • T4: read (y); T1: write (y)
        • T4: read (y); T2: write (y)
        • T1: write (y); T2: write (y)

      The conflict graph is:
      [Figure: conflict graph between T1, T2, T3, and T4. From the conflicts above, its edges are T1 -> T2, T3 -> T1, T3 -> T2, T4 -> T1, and T4 -> T2.]

    • Acyclic conflict graph <=> conflict-serializable.

      • Makes sense: conflict graph for any serial schedule is acyclic.
      • But we won’t formally prove this.
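The edge rule and the acyclicity test above can be checked mechanically. The sketch below is an illustration only: the function names and the encoding of a schedule as a time-ordered list of `(transaction, "r" or "w", object)` tuples are assumptions made for this example.

```python
def conflict_graph(schedule):
    """Return the set of edges (Ti, Tj) such that a step of Ti conflicts
    with a later step of Tj."""
    edges = set()
    for i, (ti, op_i, obj_i) in enumerate(schedule):
        for tj, op_j, obj_j in schedule[i + 1:]:
            # Conflict: different transactions, same object, at least one write.
            if ti != tj and obj_i == obj_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def is_acyclic(edges):
    """Repeatedly strip nodes with no incoming edges; a cycle leaves a residue."""
    nodes = {n for edge in edges for n in edge}
    edges = set(edges)
    while nodes:
        sources = {n for n in nodes if not any(dst == n for _, dst in edges)}
        if not sources:
            return False          # every remaining node has an incoming edge
        nodes -= sources
        edges = {(s, d) for s, d in edges if s not in sources}
    return True


example1 = [("T2", "w", "x"), ("T1", "r", "x"), ("T2", "w", "y"),
            ("T1", "r", "y"), ("T1", "w", "y")]
example2 = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "w", "y"),
            ("T1", "r", "y"), ("T1", "w", "y")]

print(conflict_graph(example1), is_acyclic(conflict_graph(example1)))
# {('T2', 'T1')} True
print(is_acyclic(conflict_graph(example2)))
# False
```

The pairwise comparison is quadratic in the length of the schedule, which is fine for examples of this size.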
  5. Interlude

    • We’re going to explore conflict serializability in more depth because it’s useful in practice.
    • As of right now, we have no methodical way to create conflict serializable schedules.
      • We can check if a schedule is conflict serializable, but if we want a conflict serializable schedule the best we can do right now is keep generating random schedules and testing until we find one with an acyclic conflict graph.
    • We’ll get to this, don’t worry.
  6. Two-Phase Locking (2PL)

    • So how do we generate conflict-serializable schedules?
    • Via two-phase locking:
      • Each shared variable has a lock (fine-grained locking).
      • Before any operation on the variable, the transaction must acquire the corresponding lock.
      • After a transaction releases a lock, it may not acquire any other locks.
    • (Proof that 2PL => conflict serializability is coming.)
    • Note: Fine-grained locking but in a systematic way.
    • Why “two-phase”? We see two phases:
      • Acquire phase, where transactions acquire locks.
      • Release phase, where transactions release locks.
    • Immediate problem: 2PL can result in deadlock:

      T1     T2
      acquire (x_lock)     acquire (y_lock)
      read (x)     read (y)
      acquire (y_lock)     acquire (x_lock)
      read (y)     read (x)
      release (y_lock)     release (x_lock)
      release (x_lock)     release (y_lock)
      • One solution to deadlock: Global ordering on locks. Not very modular.
      • Better solution: Take advantage of atomicity and abort one of the transactions.
        • Seems like we’re punting, but is actually very elegant; given atomicity, aborting is a-okay.
        • Detecting deadlock is possible:
          • Use “wait-dependency” graphs, which capture the locks each transaction has and the ones it wants. Cycle in wait-dependency graph => deadlock.
          • Or just abort a transaction after X seconds (X “reasonably” large). Not as elegant, but simpler.
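As a rough illustration of the two ideas above, the sketch below enforces the 2PL rule (no acquire after a release) and finds a deadlock by following wait-for dependencies. Everything here is hypothetical: the names `TwoPhaseTxn` and `find_deadlock_victim`, and the simplifying assumption that locks are exclusive, so each lock has one holder and each blocked transaction waits on exactly one lock.

```python
class TwoPhaseTxn:
    """Tracks one transaction's locks and rejects any 2PL violation."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.releasing = False     # True once the release phase has begun

    def acquire(self, lock):
        if self.releasing:
            raise RuntimeError("2PL violation: acquire after release")
        self.held.add(lock)

    def release(self, lock):
        self.releasing = True
        self.held.discard(lock)


def find_deadlock_victim(holders, waiting):
    """holders: lock -> transaction holding it.
    waiting: transaction -> lock it is blocked on.
    Returns a transaction on a wait-for cycle, or None if there is none."""
    for start in waiting:
        seen = []
        txn = start
        # Follow "txn waits for the holder of the lock it wants".
        while txn in waiting and txn not in seen:
            seen.append(txn)
            txn = holders.get(waiting[txn])
        if txn in seen:
            return txn             # we came back around: deadlock
    return None


# The deadlock from the schedule above: each transaction holds one lock
# and waits for the other.
holders = {"x_lock": "T1", "y_lock": "T2"}
waiting = {"T1": "y_lock", "T2": "x_lock"}
print(find_deadlock_victim(holders, waiting))   # T1
```

Aborting the returned transaction releases its locks and lets the other one proceed; which transaction to pick as the victim is a policy choice that this sketch does not address.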
  7. Performance Improvement: Reader-Writer Locks

      • Reader-writer locks.
      • Rules:
        • Can acquire a reader lock at the same time as other readers.
        • Can acquire a writer lock only if there are no other readers or writers.
      • What about fairness? If readers keep acquiring the lock, and a writer is waiting?
        • Typically: If writer is waiting, new readers wait too.
      • Reader-writer locks improve performance.
        • As described, they allow for concurrent reads.
        • We can also release read locks prior to commit.
          • Why? Once a transaction T has acquired all of its locks (reached its “lock point”), any conflicting transaction will run later.
          • If T reaches its lock point and will no longer access X, releasing read locks on X will be fine.
          • Hold write locks until commit, though, in case the transaction aborts.
      • Can also improve performance by relaxing our requirements for serializability/isolation.
        • Read-committed or snapshot isolation (see hands-on).
        • See also PNUTS in recitation next week.
      • Important: There are a ton of tradeoffs between performance and isolation semantics.
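The reader-writer rules and the fairness policy above can be modeled with a small non-blocking state machine. This is a sketch under assumptions: a real lock would block the caller rather than return `False`, and the class name, method names, and per-transaction bookkeeping are invented for this example.

```python
class ReaderWriterLock:
    """Non-blocking model of a reader-writer lock (hypothetical sketch).
    try_* methods return True if the lock was granted, False if the
    caller would have to wait."""

    def __init__(self):
        self.readers = set()          # transactions holding the lock for reading
        self.writer = None            # transaction holding it for writing, if any
        self.waiting_writers = set()  # writers that asked and were refused

    def try_acquire_read(self, txn):
        # Readers share the lock, but queue behind any waiting writer
        # so that a steady stream of readers cannot starve it.
        if self.writer is not None or self.waiting_writers:
            return False
        self.readers.add(txn)
        return True

    def try_acquire_write(self, txn):
        # A writer needs the lock to itself: no readers, no other writer.
        if self.writer is None and not self.readers:
            self.writer = txn
            self.waiting_writers.discard(txn)
            return True
        self.waiting_writers.add(txn)
        return False

    def release(self, txn):
        self.readers.discard(txn)
        if self.writer == txn:
            self.writer = None


lock = ReaderWriterLock()
print(lock.try_acquire_read("T1"))    # True
print(lock.try_acquire_read("T2"))    # True: concurrent readers are fine
print(lock.try_acquire_write("T3"))   # False: readers are active
print(lock.try_acquire_read("T4"))    # False: a writer is waiting
lock.release("T1")
lock.release("T2")
print(lock.try_acquire_write("T3"))   # True
```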
  8. Another Possible Performance Improvement: Giving up on Conflict Serializability

      • Sometimes conflict serializability can seem like too strict a requirement.
      • Example:

        T1 T2 T3
        read (x)    
          write (x)  
        write (x)    
            write (x)

        Conflict graph:
        [Figure: conflict graph between T1, T2, and T3. From the schedule above, its edges are T1 -> T2, T2 -> T1, T1 -> T3, and T2 -> T3.]

        • Not acyclic => not conflict serializable.
        • But compare it to running T1, then T2, then T3 (serially).
          • Final-state is fine.
          • Intermediate reads are fine.
        • So what’s wrong? Why shouldn’t we allow this schedule?
        • This schedule is view serializable, but not conflict serializable.
          • Informally: A schedule is view serializable if the final written state as well as intermediate reads are the same as in some serial schedule.
          • Formally, for those interested (this will NOT be on any exam):
            • Two schedules S and S’ are view equivalent if:
              • If T_i in S reads an initial value for X, so does T_i in S'.
              • If T_i in S reads the value written by T_j in S for some X, so does T_i in S'.
              • If T_i in S does the final write to X, so does T_i in S'.
            • A schedule is view serializable if it is view equivalent to some serial schedule.
        • Why focus on conflict serializability when it seems too strict? Why not focus on view serializability?
          • View serializability is hard to test for (likely NP-hard). Conflict serializability is not, since checking whether a graph is acyclic is fast.
          • We have an easy way to generate conflict serializable schedules (coming shortly).
            • Conflict serializable schedules are also view serializable, so technically this means we have an easy way to generate view serializable schedules. But we don’t have an easy way to generate view serializable schedules that allows for ones like the example above.
          • Schedules that are view serializable but not conflict serializable involve blind writes: Writes that are ultimately not read. These are not common in practice.
        • Basically: Conflict serializability has practical benefits.
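To make the formal definition of view equivalence concrete, here is a brute-force check that tries every serial order. It is a sketch under assumptions: the function names and the `(transaction, "r" or "w", object)` schedule encoding are made up, reads are matched by their position within each transaction, and trying all permutations is exponential in the number of transactions, so it is only suitable for small examples like the ones in these notes.

```python
from itertools import permutations


def view(schedule):
    """Summarize what a schedule lets each transaction observe:
    which transaction each read reads from (None = initial value),
    and which transaction does the final write to each object."""
    last_writer = {}
    reads = set()
    read_count = {}
    for txn, op, obj in schedule:
        if op == "r":
            n = read_count.get(txn, 0)       # n-th read made by this txn
            read_count[txn] = n + 1
            reads.add((txn, n, obj, last_writer.get(obj)))
        else:
            last_writer[obj] = txn
    return reads, last_writer


def is_view_serializable(schedule):
    txns = sorted({txn for txn, _, _ in schedule})
    target = view(schedule)
    for order in permutations(txns):
        # Serial schedule: run each transaction's steps back to back.
        serial = [step for t in order for step in schedule if step[0] == t]
        if view(serial) == target:
            return True
    return False


# The three-transaction example above: not conflict serializable,
# but view equivalent to running T1, then T2, then T3.
example = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x"), ("T3", "w", "x")]
print(is_view_serializable(example))   # True
```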
  9. Summary

        • Now: Have atomicity and isolation working on a single machine (concurrent transactions, good performance, etc.).
        • Next week: Distributed transactions.

        Proof that 2PL produces a conflict-serializable schedule:

        1. Suppose not. Suppose the conflict graph produced by an execution of 2PL has a cycle, which, without loss of generality, is T1 -> T2 -> … -> Tk -> T1.

        2. We’ll show that a locking protocol that produces such a schedule must violate 2PL.

        3. Let the shared variable—the one that causes the conflict—between T_i and T_{i+1} be represented by x_i.

          T1 and T2 conflict on x1
          T2 and T3 conflict on x2
          …
          Tk and T1 conflict on x_k

        4. This means that:

          T1 acquires x1.lock
          T2 acquires x1.lock and x2.lock
          T3 acquires x2.lock and x3.lock
          …
          Tk acquires x_k.lock and x_{k-1}.lock
          T1 acquires x_k.lock

        5. Time flows down in the above step. Since the edges go from T_i to T_{i+1}, T_i must have accessed x_i before T_{i+1}.

        6. For T2 to have acquired its locks (in particular, x1.lock), T1 must have previously released x1.lock. Thus:

          T1 acquires x1.lock
          T1 releases x1.lock
          T2 acquires x1.lock and x2.lock
          T3 acquires x2.lock and x3.lock
          …
          Tk acquires x_k.lock and x_{k-1}.lock
          T1 acquires x_k.lock

        7. Focusing just on the steps that involve T1:
          T1 acquires x1.lock
          T1 releases x1.lock
          T1 acquires x_k.lock

        8. T1 violates 2PL; it acquires a lock after releasing a lock.

        9. Therefore, cyclic conflict graph => 2PL was violated.
          Alternatively, 2PL => acyclic conflict graph.

Read “Log-Structured File Systems (PDF)” by R. & A. Arpaci-Dusseau

The Log-Structured File System departs dramatically from the UNIX File System and proposes, instead, a file system in which all of the data is stored in an append-only log, that is, a flat file that can be modified only by having data added to the end of it. In Chapter 9, we also hear about logs, specifically how they help achieve reliability. For today’s reading, the purpose of the log is to achieve good performance.

The primary goal of a log-structured file system is to minimize seeks by treating the disk as an infinite append-only log. For example, the file system software simply appends new files to the end of the log.

As you read, think about the following:

  • The “infinite log” is actually on a finite disk; how does that constraint affect the designers’ goals?
  • Are there certain file access patterns by applications that might make it hard for a log-structured file system to avoid seeks?

Questions for Recitation

Before you come to this recitation, write up (on paper) a brief answer to the following (really—we don’t need more than a couple sentences for each question). 

Your answers to these questions should be in your own words, not direct quotations from the paper.

  • What is one technique that the log-structured file system uses to achieve higher performance? (There is more than one technique.)
  • How does the log-structured file system implement this technique?
  • Why does this technique, along with minimizing seeks, lead to good performance?

As always, there are multiple correct answers for each of these questions.
