B+tree operations cheat sheet
Order m. Search, insert with splits, delete with merge or redistribute, bulk loading, and the rules that decide each. Everything is keyed to the half-full invariant.
Invariants (order m)
| Invariant | Statement |
|---|---|
| Data in leaves | All record entries live in leaves. Internal nodes hold only separator keys and child pointers; they route, never store payload. This is the line between a B+tree and a plain B-tree. |
| Linked leaves | Leaves form a (doubly) linked list in key order, so a range scan walks leaf to leaf without going back up. PostgreSQL keeps a doubly linked leaf chain. |
| Height balanced | Every root-to-leaf path has the same length. Balance is structural, not a periodic rebuild. Tree grows or shrinks only at the root. |
| Half full (non-root) | Every non-root node holds between ceil(m/2) - 1 and m - 1 keys (internal: between ceil(m/2) and m children). |
| Root exemption | The root may hold as few as 1 key (2 children), or 0 keys when the tree is a single leaf. The half-full rule does not apply to the root. |
| Separator order | Between adjacent separators, a search key v follows Ki < v <= Ki+1 (the exact nbtree inequality). |
Forcing at least 50 percent occupancy on non-root nodes bounds the height at about log_(m/2) N, so the logarithmic I/O cost holds in the worst case, not just on average. Without it the tree could degenerate.
Search
- Start at the root. Binary-search the separators to pick the child whose range contains v, using Ki < v <= Ki+1.
- Descend one page read per level until a leaf, then binary-search the leaf.
- Range scan: descend once to the lower bound, then follow leaf links rightward to the upper bound.
Cost: one page read per level, O(log_m N) I/Os. Upper levels usually stay cached in the buffer pool, so marginal cost is closer to one or two leaf reads.
Insert (split rule)
- Descend to the target leaf as in search.
- If the leaf has room, insert in sorted position. Done, no structural change.
- If the leaf is full, split: allocate a new leaf, move the upper half of entries into it, link it into the leaf chain.
- Leaf split = copy up. The first key of the new right leaf is copied up as the parent separator; the key still exists in the leaf, because all data stays in leaves.
- Internal split = push up. If inserting the separator overflows an internal node, split it and move (push) the middle key up to the grandparent; internal nodes only route, so no copy is kept.
- Splits propagate upward only while overflow continues. If the root splits, allocate a new root over the two halves. This is the only way height grows, and it grows from the top, keeping all leaves at equal depth.
Delete (underflow)
- Descend to the leaf and remove the entry.
- If the node still has >= ceil(m/2) - 1 keys, done.
- If it underflows (< ceil(m/2) - 1 keys), restore the invariant using an adjacent sibling.
- Redistribute (borrow / rotate): if a sibling has more than the minimum, move one entry across and update the parent separator. Tree shape unchanged.
- Merge (coalesce): if no sibling can spare an entry, combine the node with a sibling and remove the now-unused separator from the parent.
- A merge removes a parent key, which can underflow the parent, so merge or redistribute propagates upward like splits. If the root ends with one child, that child becomes the new root and the tree shrinks in height.
| Condition after delete | Action | Effect on tree |
|---|---|---|
| Node has >= ceil(m/2) - 1 keys | Nothing | No change |
| Underflow, adjacent sibling has > minimum keys | Redistribute / borrow one entry, update parent separator | Shape unchanged, no propagation |
| Underflow, no sibling above minimum | Merge with sibling, drop separator from parent | May cascade upward; root may collapse and shrink height |
Leaf split copies up, internal split pushes up; do not swap them. The root is exempt from half-full, so "every node is at least half full" is false. Height changes only on a root split (taller) or a root collapse after merge (shorter), never on an ordinary insert.
Leaf split, before and after
Bulk loading (bottom up)
One-at-a-time insertion of N keys costs N descents and many random splits. If keys are sorted (or can be sorted first), build the tree bottom up instead.
- Sort the entries.
- Pack them into leaves left to right up to a target fill (the fillfactor, leaving slack for later inserts).
- Link the leaves into the chain.
- Build the parent level from the first key of each leaf; repeat upward until one root remains.
Result: sequential I/O, controlled page density, and a denser, shallower tree than repeated insertion. PostgreSQL B-tree fillfactor defaults to 90 (range 10 to 100).
Complexity
| Operation | I/O cost | Note |
|---|---|---|
| Point search (=) | O(log_m N) | One read per level; upper levels usually cached |
| Range scan (k matches) | O(log_m N + k/leaf) | One descent, then sequential leaf links |
| Insert | O(log_m N) | Dominated by descent; splits rare amortized (halves stay half full) |
| Delete | O(log_m N) | Dominated by descent; merge or redistribute may cascade up |
| Bulk load N sorted keys | O(N) sequential | Plus the sort if not already ordered |
| Height | about log_m N | Fanout 100: 3 levels reach ~1M leaf entries, 4 levels ~100M |
The log base is the fanout m, not 2. Wide nodes make a single page read a high-fanout decision, so the tree is 3 to 4 levels deep for hundreds of millions of rows. "O(log2 N) lookups" is the trap.
B-tree vs B+tree vs hash
| Property | Plain B-tree | B+tree | Hash index |
|---|---|---|---|
| Where data / pointers live | Internal nodes and leaves | Leaves only; internals route | Buckets keyed by hash |
| Point lookup (=) | O(log_m N) I/O | O(log_m N) I/O | expected O(1) I/O |
| Range / inequality (<, >, BETWEEN) | possible but no linked leaves | yes, ordered linked leaves | no |
| ORDER BY without a sort | in-order traversal, not leaf-linked | yes, walk the leaf chain | no |
| Prefix of composite key | yes | yes (leftmost prefix) | no, needs the full key |
| Worst case under skew / growth | bounded by height | bounded by height | long overflow chains |
PostgreSQL nbtree is a B+tree on the Lehman-Yao design (right-links and high keys for lock-light search), with lazy two-stage deletion rather than eager textbook merge on every underflow. SQLite stores tables and indexes as b-trees with data only in leaves. PostgreSQL hash indexes are single column, equality only, no uniqueness, and crash-safe since v10 [PG hash docs].