Concurrency: strict 2PL, deadlock, and MVCC
Last week told you which interleavings are wrong. This week is the machinery that forbids them while still running transactions at the same time: locks that block, deadlocks that form when two transactions wait on each other, and the multi-version trick that lets readers never wait at all.
By the end you can
- State why strict and rigorous 2PL hold locks to commit, and what each one buys beyond plain 2PL.
- Read a lock table and a waits-for graph, and tell deadlock detection apart from wait-die and wound-wait prevention.
- Explain how MVCC lets a reader and a writer touch the same row at the same time without either blocking.
- Walk a PostgreSQL version chain by its xmin and xmax stamps and say which version a snapshot sees.
- Answer the "which is false" traps about 2PL, wait-die, MVCC, and update-in-place.
We are on the second journey through the engine, the write path. Last week's statement is back:
BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 42;
COMMIT;
Week 12 handed this lesson a target and a vocabulary:
the schedule must be conflict serializable (its precedence graph must stay acyclic), and the result
must be at least recoverable. That week named the anomalies (dirty read, lost update, write skew) and
the theory (the precedence graph, 2PL, the lock point). It did not build the mechanism. This week is
the mechanism. The row for id = 42 is located exactly as on the read path (optimizer to
access method to buffer pool to storage, weeks 4 through 11), but now concurrency control governs the
write. Under strict two-phase locking the transaction takes an exclusive lock on that row and holds it
to commit. Under MVCC the UPDATE does not overwrite anything: it writes a new tuple version, stamps the
old version's delete marker, and links the chain, so a reader on an older snapshot still sees the old
balance and never blocks.
What this layer hands down to week 14: a page in the buffer pool that is about to be modified, plus a guarantee about who is allowed to have touched it. Recovery then logs the change before the page can reach disk. The link between the two is direct. The buffer pool's freedom to evict an uncommitted page (steal) is exactly why the next lesson must log undo information, and the abort path that 2PL deadlock resolution triggers is the same undo path ARIES formalizes. Keep that thread in mind.
There are two ways to stop two people editing the same document and corrupting it. The first is a physical key: whoever holds it edits, everyone else waits at the door. That is locking. It is correct but the line at the door is the whole cost, and two people can each be holding a key the other needs, so nobody moves. The second way is to give every reader a photocopy frozen at the moment they walked in. The writer scribbles on a fresh sheet and files it; readers keep reading their copy, oblivious. That is MVCC. Nobody waits to read. The cost moved somewhere else: you now have a pile of old copies to shred later.
Holding locks to commit: strict and rigorous 2PL
Plain two-phase locking is enough to guarantee conflict serializability: acquire locks in a growing phase, release in a shrinking phase, and never acquire after you release. Week 12 proved that ordering transactions by their lock points gives a valid topological order of the precedence graph, so the graph is acyclic. But plain 2PL has a hole. The shrinking phase can begin before commit, so a transaction can release a write lock, then a second transaction reads that just-unlocked value, and then the first transaction aborts. The second transaction read a value that never officially existed. That is a dirty read, and if the second transaction already committed, the schedule is not even recoverable.
The constraint is that another transaction must never read or overwrite your write until you have decided whether it survives. The naive fix is to release the write lock as soon as you finish writing. That breaks, because "finished writing" is not "committed"; you might still abort. The real fix is blunt: hold every exclusive lock until commit or abort. That is strict 2PL. Now no other transaction can see your write until it is final, so dirty reads and the cascading aborts they cause are impossible. The schedule it produces is strict in the recoverability sense, which is why the undo rule "restore the value that existed before my write" is always correct.
| Variant | What it holds to commit | What it guarantees |
|---|---|---|
| Plain 2PL | nothing past the lock point | conflict serializability only |
| Strict 2PL | all exclusive (write) locks | also recoverable and cascadeless |
| Rigorous 2PL | all locks, shared and exclusive | also serialization order equals commit order |
The difference between strict and rigorous is just the shared locks. Strict 2PL releases read locks in the shrinking phase but pins write locks to commit. Rigorous 2PL (also called strong strict 2PL) holds reads too, which collapses the shrinking phase to a single instant at commit and makes the commit order the serialization order. That last property is what makes rigorous 2PL the easy thing to reason about, so most textbooks quietly mean rigorous when they say "the locking implementation" [CMU 15-445 notes].
"2PL guarantees serializability, so it also avoids cascading aborts." False. Plain 2PL gives conflict serializability but can release write locks before commit, which permits dirty reads and therefore cascading aborts. You need strict 2PL (hold write locks to commit) for cascadelessness. Serializability and cascadelessness are different axes, and plain 2PL only buys the first.
The lock manager: who is granted, who waits
The lock manager is an in-memory hash table keyed by the locked object identifier (a table OID, a page, a tuple id, or a hashed key). Each entry holds a granted group (the set of transaction-and-mode pairs currently holding the lock) and a wait queue of pending requests in arrival order. A request is granted if its mode is compatible with every mode in the granted group and with everything ahead of it in the queue; otherwise the requester blocks. The entries themselves are guarded by latches, not locks. A latch is a short mutex over shared memory; a lock is a logical, transaction-duration claim that the system tracks for deadlock. Confusing the two is a standard exam point: the lock table is protected by latches because it is mutable shared state, but the rows it tracks are protected by locks.
Multi-granularity locking and intention locks
Locking every tuple is fine until a transaction touches a whole table, at which point it pays for millions of tiny lock entries. Locking the whole table is fine until a second transaction wants one row, at which point concurrency collapses. Multi-granularity locking lets each transaction lock at the level that fits, over a hierarchy of database, table, page, tuple. The problem it creates: if T1 holds an X lock on one row and T2 wants an S lock on the whole table, how does the lock manager notice the conflict without scanning every row? Intention locks solve this. Before locking a descendant, a transaction announces intent at every ancestor.
- IS (intention shared): I will take S locks somewhere below this node.
- IX (intention exclusive): I will take X (or finer IX) locks below this node.
- SIX (shared and intention exclusive): I am reading this whole subtree (S) and will modify some descendants (IX).
Now two transactions can both hold IX on the same table (both intend to modify, just different rows) with no conflict, while a transaction wanting S on the whole table conflicts with anyone holding IX, because IX announces "someone is writing below here." The load-bearing fact is the compatibility matrix: IS conflicts only with X; IX conflicts with S, SIX, and X. Locks are acquired root to leaf and released leaf to root.
Deadlock: detect it, or prevent it
The moment you hold locks to commit, you can deadlock. T1 holds A and waits for B; T2 holds B and waits for A. Neither can proceed and neither will release. Lock-based systems break out of this in one of two philosophies, and the difference is a favorite exam axis.
Detection is optimistic. Let transactions wait freely, and periodically build a waits-for graph: a node per active transaction, a directed edge from a blocked transaction to the one holding the lock it needs. A deadlock is exactly a cycle in this graph. A depth-first search finds cycles in O(V+E); on a cycle, the system picks a victim and aborts it, releasing its locks so the rest proceed. Victim selection weighs how long each transaction has run, how much work it would waste, and how many cycles aborting it breaks. PostgreSQL detects, it does not prevent [PG explicit locking docs].
Prevention is pessimistic: never let a cycle form. Stamp each transaction with a start timestamp where older means smaller and higher priority. When transaction T requests a lock held by H, decide by age:
| Scheme | T older than holder H | T younger than holder H | Why no cycle |
|---|---|---|---|
| Wait-die (non-preemptive) | T waits | T dies (aborts, restarts later) | waits only point old to young |
| Wound-wait (preemptive) | T wounds H (H aborts), T takes lock | T waits | waits only point young to old |
"In wait-die the older transaction aborts." False. In wait-die the older transaction waits and the younger one dies. In wound-wait the older transaction wounds (aborts) the younger. In both schemes the older transaction never aborts in its branch, which is the whole point: a transaction that keeps dying eventually becomes the oldest in the system and is finally allowed to wait, so it makes progress. Also false: "wait-die and wound-wait are detection schemes." They are prevention. No graph is built and no cycle ever forms.
The cheapest scheme builds no graph and tracks no timestamps: if a lock wait exceeds a timeout, assume deadlock and abort the waiter. It needs no bookkeeping, but it produces false positives (it kills transactions that were merely slow), and the timeout is a guess.
PostgreSQL chooses detection. The docs are blunt: "PostgreSQL automatically detects deadlock situations and resolves them by aborting one of the transactions involved," and a transaction "seeking either a table-level or row-level lock will wait indefinitely for conflicting locks to be released" so long as no deadlock is detected [PG docs]. It also will not tell you in advance which transaction becomes the victim. Prevention schemes like wait-die show up more in distributed systems, where building a global waits-for graph across nodes is expensive.
MVCC: let readers read the past
Locking serializes correctly, but its cost is contention. A reader holds S, blocks a writer's X; a writer holds X, blocks a reader's S. For a workload that is mostly reads with the occasional write, that blocking is pure overhead. Multi-version concurrency control attacks it directly. The defining property, in PostgreSQL's own words: "locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading" [PG MVCC intro]. CMU calls MVCC "the most widely used scheme," now in almost every DBMS built in the last decade.
The constraint that forces blocking under locking is that there is one copy of the row, so a reader and a writer fight over it. Remove the constraint: keep more than one copy. Every write creates a new physical version of the row instead of overwriting the old one; the versions of one logical row form a chain. Each version carries the id of the transaction that created it and a marker for when it stopped being current. A transaction reads against a snapshot, a logical point in time, and sees the latest version that was committed as of that snapshot. The old version is still sitting there, so a reader on an older snapshot reads it without waiting for the writer to finish. Reads became lock-free. The price is that you now store old versions and must reclaim them later.
The four design decisions
The Wu et al. survey decomposes every MVCC system into four choices [Wu et al., PVLDB 2017]. The point of naming them is that "MVCC" is not one design but a family, and PostgreSQL is one specific corner of it.
| Decision | The options | PostgreSQL |
|---|---|---|
| Protocol (orders writers) | timestamp ordering, optimistic, or 2PL | 2PL-style row locks, plus SSI at Serializable |
| Version storage | append-only, time-travel, delta | append-only, versions inline in the heap |
| Garbage collection | tuple-level or transaction-level | VACUUM, gated by the oldest live snapshot |
| Index pointers | logical or physical | physical (index points at a heap tuple) |
"MVCC means no locks at all." False. MVCC removes read-write blocking, but writers still have to be serialized against each other. Something orders conflicting writes (row locks, timestamp ordering, or optimistic validation), and in PostgreSQL that is a full lock manager that can still deadlock on write-write conflicts. MVCC is orthogonal to the protocol that orders writers, not a replacement for it. The first decision in the table is exactly that protocol.
PostgreSQL: append-only versions on the heap
PostgreSQL stores versions inline in the heap. Every heap tuple has a header whose visibility fields
are t_xmin (the insert XID that created this version), t_xmax (the XID that
deleted or superseded it, zero while still live), and t_ctid (a tuple id pointing forward to
the next version, used to walk the chain)
[PG page layout, Table 66.4].
The decisive fact: an UPDATE is not an in-place edit. It writes a brand-new tuple version and sets the
old version's t_xmax to the updating transaction's XID. A DELETE just sets t_xmax.
The old version stays visible to any snapshot that began before the change committed.
# the UPDATE from the top, as PostgreSQL actually does it
# row 42 starts as one version:
v0: (xmin=100, xmax=0) balance=500 # inserted by txn 100, still live
# txn 205 runs UPDATE ... balance = balance - 100:
v0: (xmin=100, xmax=205) balance=500 # old version, now marked deleted by 205
v1: (xmin=205, xmax=0) balance=400 # new version, created by 205, ctid links v0 -> v1
# a reader on a snapshot taken before 205 committed still sees v0 (500),
# never blocks, and never sees the in-flight 400.
The visibility rule, simplified: a version is visible to a snapshot when its xmin
committed and is in the snapshot's past, and either its xmax is unset, or that
xmax did not commit, or it is in the snapshot's future. A snapshot is just the set of
in-progress XIDs at its start plus the horizon, checked against the commit log. The same
xmax field also encodes row locks: SELECT FOR UPDATE records a locker there
without creating a new version, which is how MVCC and explicit locking share one mechanism.
The new cost: dead versions and VACUUM
You did not get the lock-free reads for nothing. Every UPDATE leaves a dead version behind, every DELETE leaves a tombstone, and they pile up. PostgreSQL must reclaim them, which is what VACUUM does. The routine-vacuuming docs state the cause plainly: "In PostgreSQL, an UPDATE or DELETE of a row does not immediately remove the old version of the row. This approach is necessary to gain the benefits of multiversion concurrency control" [PG routine vacuuming]. A version can be reclaimed only when no live transaction's snapshot could still need it, which is why a single long-running reader holding an old snapshot stalls garbage collection across the whole table. That is the heart of the MVCC trade: a long reader does not block a writer, it just holds back cleanup, turning a contention problem into a space problem.
"Under MVCC, a long-enough read can block a writer." False, and inverting it is the whole point. Under MVCC reading never blocks writing and writing never blocks reading. A long reader instead keeps an old snapshot alive, which prevents VACUUM from reclaiming versions that reader might still see. That is bloat, a space cost, not a blocking cost. Also false: "an UPDATE modifies the row in place." It writes a new version and stamps the old one's xmax, which is exactly why heavy-update tables bloat and need VACUUM.
Locking pays for correctness in waiting and deadlocks; MVCC pays for it in storage and garbage collection. The reader-writer block is gone, but writers still serialize against each other, and the old versions still have to be swept up.
SQLite: the opposite corner
SQLite reaches the same correctness by the opposite route. In its default rollback-journal mode it uses five whole-file lock states (UNLOCKED, SHARED, RESERVED, PENDING, EXCLUSIVE) and the model is many readers or one writer, never both. WAL mode loosens this: the writer appends to a separate log and readers read a consistent end mark, so "WAL provides more concurrency as readers do not block writers and a writer does not block readers" [SQLite WAL]. That end mark is a snapshot, which makes WAL mode a lightweight MVCC. But the single-writer rule never lifts: "since there is only one WAL file, there can only be one writer at a time." So SQLite-WAL gives readers-do-not-block-writers, yet never two concurrent writers, the opposite end of the spectrum from PostgreSQL's full MVCC with many writers serialized by row locks.
"SQLite in WAL mode supports multiple concurrent writers." False. WAL lets concurrent readers run alongside one writer; it never allows two simultaneous writers. The single-writer rule is exactly how SQLite stays serializable without a version visibility machine as elaborate as PostgreSQL's.
| Property | Strict 2PL (locking) | MVCC (PostgreSQL) |
|---|---|---|
| Reader vs writer on one row | they block each other | neither blocks; reader sees old version |
| Writer vs writer | serialized by X locks | still serialized (row locks / SSI) |
| Failure mode under contention | waiting, then deadlock | version bloat, slower GC |
| Extra storage | none beyond the lock table | old versions until VACUUM |
| Deadlock handling | detection or prevention needed | still possible on writes; PG detects |
Deeper: snapshot isolation is not serializable, and how SSI fixes it
Plain MVCC at the Repeatable Read level gives snapshot isolation: every transaction reads a frozen snapshot, and write-write conflicts on the same item are caught by First-Committer-Wins. That blocks dirty reads, non-repeatable reads, lost updates, and even simple phantoms. It does not block write skew, because write skew is two transactions writing different items after reading an overlapping set, so there is no write-write conflict to catch even though the precedence graph has a cycle (week 12). PostgreSQL's Serializable level adds Serializable Snapshot Isolation: it runs snapshot isolation but additionally tracks read-write antidependencies and aborts a transaction when it detects a dangerous structure (a transaction with both an incoming and an outgoing rw-edge), which is a necessary condition for a serialization cycle. The SIRead locks it uses to record reads never block and never deadlock; they only leave a trace so a later conflicting write can be caught [PG transaction isolation]. This is how MVCC climbs from snapshot isolation up to true serializability without going back to blocking reads.
Active recall
Check yourself
Which statement about two-phase locking is false?
The false claim is the third. Plain 2PL can release a write lock during the shrinking phase before commit, letting another transaction read uncommitted data, which is a dirty read and a cascading-abort risk. Serializability and cascadelessness are separate guarantees, and you need strict 2PL (option two) for the second.
Under the wait-die prevention scheme, which is correct?
Option two is correct: in wait-die the older waits, the younger dies. Option one inverts the rule. Option three describes detection, not prevention; wait-die builds no graph and forms no cycle. Option four describes wound-wait's young-waits branch, not wait-die, where a younger requester dies.
Which statement about MVCC is false?
The fourth is false. MVCC removes read-write blocking, not all locks. Writers are still ordered by a protocol (PostgreSQL uses row locks plus SSI), and write-write conflicts can still deadlock, which PostgreSQL resolves by detection. The other three are exactly the MVCC trade: lock-free reads paid for in stored versions and deferred cleanup.
A PostgreSQL UPDATE of one row does what to the heap?
The first is correct: append-only MVCC means UPDATE is delete-plus-insert. The old version's xmax is set to the updater, a new version is written with that XID as its xmin, and ctid links them. Nothing is overwritten in place (that rules out the second and fourth), and the old version is not freed immediately (the third); it waits for VACUUM because older snapshots may still need it.
Which statement about deadlock handling is false?
The third is false on both counts. PostgreSQL detects deadlocks and aborts a victim after a cycle forms; it does not prevent them, and it does not let you predict the victim. The other three are accurate: detection is the waits-for-graph cycle search, wait-die and wound-wait are timestamp-based prevention, and timeouts trade bookkeeping for false positives.
Which statement about intention locks in multi-granularity locking is false?
The second is false. IS announces only intent to read below, so it is compatible with IX (intent to write below); two transactions can hold IS and IX on the same table at once. IS conflicts only with a full X on the node. The others are correct: shared IX, the root-to-leaf acquisition rule, and the meaning of SIX.
It states the defining property in one sentence: reading never blocks writing and writing never blocks reading. Read it alongside the routine-vacuuming page to see the cost side, and the explicit-locking page for how PostgreSQL still uses a lock manager and detects deadlocks under MVCC.
Want me to run the UPDATE from the top under strict 2PL and then under MVCC side by side, or to walk a three-transaction deadlock and have you pick the victim? Ask me to draw the version chain for a DELETE, or to show exactly why write skew slips past plain snapshot isolation. I am your teacher for this course, not just a document.