The within-page layout that turns a fixed-size block of bytes into addressable,
variable-length rows: a header, a line-pointer array growing toward higher offsets, tuple data growing toward lower offsets from the page end, and free space closing in between.
Figure 1. The two ends close on a shrinking free space. The slot array starts right after the header and grows toward higher offsets as
pd_lower rises; tuple data starts at the page end and grows toward lower offsets as pd_upper falls. The page is full when the gap between them can no longer hold the next tuple plus its slot. Each slot
indirects to a tuple offset (dashed arrow), so a tuple can move during compaction while its record id stays fixed.
Keep this one thing
A record id is (page id, slot number), never a raw byte offset. The slot indirects to the in-page offset, so the
engine can compact and move tuples while the id an index holds never changes.
Slotted page header (generic)
The minimum a slotted page header must record to drive insert, lookup, and the full-page check.
Field | Purpose
slot count | number of slots in the line-pointer array, indexed 0 to count-1 (PostgreSQL does not store it; it derives it from pd_lower).
free-space start | offset where the slot array ends and free space begins; rises as slots are added (pd_lower in PostgreSQL).
free-space end | offset where free space ends and tuple data begins; falls as tuples are added from the page end inward (pd_upper in PostgreSQL).
page lsn | WAL position of the last change; the page must not be flushed until WAL up to this lsn is durable.
flags / checksum | page state bits and an optional integrity checksum.
special space ptr | start of an access-method-private trailer; empty for an ordinary heap page.
PostgreSQL PageHeaderData (24 B)
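Header at offset 0 of every page (8 KB by default). The line-pointer array starts immediately after it, at offset 24.
Field | Size | Meaning
pd_lsn | 8 B | WAL lsn of the last change to the page (enforces the write-ahead rule).
pd_checksum | 2 B | page checksum, used when data checksums are enabled.
pd_flags | 2 B | page state flag bits.
pd_lower | 2 B | offset to start of free space (end of the ItemId array). Increases as line pointers are added.
pd_upper | 2 B | offset to end of free space (start of tuple data). Decreases as tuples are added.
pd_special | 2 B | offset to start of special space (equal to the page size on heap pages).
pd_pagesize_version | 2 B | page size and layout version packed together (size & 0xFF00, version & 0x00FF).
pd_prune_xid | 4 B | oldest unpruned xmax on the page, or 0 if none.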
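A minimal Python sketch of reading the 24 B header from a raw page image. It assumes a little-endian build (PostgreSQL writes pages in the server's native byte order) and reads pd_lsn as the two 32-bit halves of PageXLogRecPtr; PAGE_HEADER_FMT, PageHeader, and parse_page_header are hypothetical names, not a PostgreSQL API.

```python
import struct
from dataclasses import dataclass

# Assumption: little-endian page image. pd_lsn is two uint32 halves (xlogid, xrecoff),
# followed by six uint16 fields and the uint32 pd_prune_xid.
PAGE_HEADER_FMT = "<IIHHHHHHI"
PAGE_HEADER_SIZE = struct.calcsize(PAGE_HEADER_FMT)  # 24 B, matching SizeOfPageHeaderData


@dataclass
class PageHeader:
    lsn: int
    checksum: int
    flags: int
    lower: int
    upper: int
    special: int
    page_size: int
    layout_version: int
    prune_xid: int

    @property
    def free_space(self) -> int:
        # Bytes in the gap between the line-pointer array and the tuple data.
        return self.upper - self.lower


def parse_page_header(page: bytes) -> PageHeader:
    (xlogid, xrecoff, checksum, flags, lower, upper,
     special, size_version, prune_xid) = struct.unpack_from(PAGE_HEADER_FMT, page, 0)
    return PageHeader(
        lsn=(xlogid << 32) | xrecoff,           # rebuild the 64-bit WAL position
        checksum=checksum,
        flags=flags,
        lower=lower,
        upper=upper,
        special=special,
        page_size=size_version & 0xFF00,        # page sizes are multiples of 256
        layout_version=size_version & 0x00FF,   # low byte holds the layout version
        prune_xid=prune_xid,
    )
```

On an empty 8 KB heap page, pd_lower is 24 (no line pointers yet) and pd_upper is 8192, so free_space reports 8168 B.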
ItemId / line pointer (4 B)
ItemIdData bitfields, storage/itemid.h. One per slot.
Field | Bits | Meaning
lp_off | 15 | offset to the tuple from the page start. 2^15 caps a page at 32 KB.
lp_flags | 2 | line-pointer state (below).
lp_len | 15 | byte length of the tuple.
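lp_flags states.
Value | State | Note
0 | LP_UNUSED | free slot, lp_len=0.
1 | LP_NORMAL | in use, lp_len>0.
2 | LP_REDIRECT | HOT redirect: lp_off holds the number of another line pointer on the same page; lp_len=0.
3 | LP_DEAD | dead; pruning may already have freed its storage. VACUUM returns it to LP_UNUSED once index entries pointing at it are removed.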
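Decoding a line pointer in Python, as a sketch. C bitfield packing is compiler- and endianness-dependent; this assumes the common little-endian layout where lp_off occupies the low 15 bits, lp_flags the next 2, and lp_len the top 15. LpFlags, decode_item_id, encode_item_id, and read_line_pointers are hypothetical helpers.

```python
import struct
from enum import IntEnum


class LpFlags(IntEnum):
    LP_UNUSED = 0    # free slot, lp_len = 0
    LP_NORMAL = 1    # in use, lp_len > 0
    LP_REDIRECT = 2  # lp_off holds the target line-pointer number, lp_len = 0
    LP_DEAD = 3      # dead; storage may already be freed


# Assumption: bits 0-14 lp_off, bits 15-16 lp_flags, bits 17-31 lp_len.
def decode_item_id(word: int) -> tuple[int, LpFlags, int]:
    lp_off = word & 0x7FFF
    lp_flags = LpFlags((word >> 15) & 0x3)
    lp_len = (word >> 17) & 0x7FFF
    return lp_off, lp_flags, lp_len


def encode_item_id(lp_off: int, lp_flags: LpFlags, lp_len: int) -> int:
    # 15-bit fields: offsets and lengths must stay below 32768.
    if not (0 <= lp_off < 1 << 15 and 0 <= lp_len < 1 << 15):
        raise ValueError("lp_off and lp_len must fit in 15 bits")
    return lp_off | (int(lp_flags) << 15) | (lp_len << 17)


def read_line_pointers(page: bytes, pd_lower: int,
                       header_size: int = 24) -> list[tuple[int, LpFlags, int]]:
    # The array runs from the end of the header up to pd_lower, 4 B per slot.
    count = (pd_lower - header_size) // 4
    return [decode_item_id(struct.unpack_from("<I", page, header_size + 4 * i)[0])
            for i in range(count)]
```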
PostgreSQL HeapTupleHeaderData
Per-row header, access/htup_details.h. The fixed part is 23 B on most machines, optionally followed by the null bitmap; t_hoff rounds the total up to a MAXALIGN boundary, so a tuple without NULLs has t_hoff = 24 on a typical 64-bit build.
Field | Size | Meaning
t_xmin | 4 B | inserting transaction id.
t_xmax | 4 B | deleting (or locking) transaction id; 0 if the row was never deleted or locked. The tombstone marker.
t_cid / t_xvac | 4 B | command id or vacuum xid (overlaid).
t_ctid | 6 B | tid of this or a newer version; the update-chain pointer.
t_infomask2 | 2 B | attribute count plus flag bits.
t_infomask | 2 B | flag bits, including HEAP_HASNULL.
t_hoff | 1 B | offset to user data; a multiple of MAXALIGN (typically 8).
null bitmap | variable | present only if HEAP_HASNULL; 1 bit per column.
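The t_hoff arithmetic can be written out as a small Python function. It assumes a 64-bit build where MAXALIGN is 8 and, as the table describes, a 23 B fixed header followed by a null bitmap of ceil(natts / 8) bytes only when some column is NULL; the function names are hypothetical.

```python
SIZEOF_HEAP_TUPLE_HEADER = 23  # fixed part of HeapTupleHeaderData on most machines
MAXALIGN = 8                   # assumption: typical 64-bit build


def maxalign(n: int, align: int = MAXALIGN) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def null_bitmap_len(natts: int) -> int:
    """One bit per column, rounded up to whole bytes."""
    return (natts + 7) // 8


def compute_t_hoff(natts: int, has_nulls: bool) -> int:
    """Offset from the tuple start to the first user-data byte."""
    size = SIZEOF_HEAP_TUPLE_HEADER
    if has_nulls:
        size += null_bitmap_len(natts)  # bitmap exists only when HEAP_HASNULL is set
    return maxalign(size)
```

Under these assumptions compute_t_hoff(8, True) is 24, because 23 + 1 still fits below the 24 B boundary, while compute_t_hoff(9, True) is 32: the second bitmap byte pushes the header past 24 and padding rounds it up to the next multiple of 8.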
Variable-length records and the NULL bitmap
Why these two mechanisms exist
Fixed-length attributes sit at a computable offset. A varchar/text/bytea breaks that: you cannot find
column N+1 without knowing the length of column N. So variable-length data carries a length prefix (or an in-tuple offset array). NULL gets a
bitmap, not a sentinel, because any sentinel byte could be a legal value.
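To make the length-prefix and bitmap ideas concrete, here is a toy row codec in Python. The format is hypothetical and much simpler than PostgreSQL's (no alignment, no varlena headers, no TOAST): a null bitmap with bit i set when column i is present, then for each present column a 2-byte little-endian length followed by the bytes. An empty value and a NULL stay distinguishable, which no in-band sentinel could guarantee.

```python
import struct
from typing import Optional


def encode_row(values: list[Optional[bytes]]) -> bytes:
    bitmap = bytearray((len(values) + 7) // 8)
    body = bytearray()
    for i, v in enumerate(values):
        if v is None:
            continue  # NULL: bit stays 0 and no data bytes are written
        bitmap[i // 8] |= 1 << (i % 8)
        body += struct.pack("<H", len(v)) + v  # length prefix, then the bytes
    return bytes(bitmap) + bytes(body)


def decode_row(buf: bytes, ncols: int) -> list[Optional[bytes]]:
    pos = (ncols + 7) // 8  # data starts right after the bitmap
    out: list[Optional[bytes]] = []
    for i in range(ncols):
        if not (buf[i // 8] >> (i % 8)) & 1:
            out.append(None)
            continue
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        out.append(buf[pos:pos + length])
        pos += length  # the stored length is the only way to reach column i+1
    return out
```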
Concern | Encoding
Fixed-length column | raw bytes at an offset computable from the schema, as long as no NULL or variable-length column precedes it; padded to the type's alignment.
Variable-length column | length prefix, then bytes; the reader advances by the stored length to reach the next column.
NULL bitmap | after the fixed header, only if HEAP_HASNULL. One bit per column: 1 = present, 0 = null. A null column stores zero data bytes.
Alignment | each column is padded to its type's alignment, and user data starts at t_hoff (a MAXALIGN boundary). Column order therefore changes row size through pad gaps; placing types with wider alignment before narrower ones minimizes padding.
Oversized value (TOAST) | a tuple cannot span pages. When a row exceeds about 2 KB (TOAST_TUPLE_THRESHOLD), wide variable-length values are compressed and/or moved out of line; each out-of-line value leaves an 18 B pointer in the tuple. The limit is 1 GB per field.
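The column-order effect of alignment is easy to check numerically. This Python sketch computes only the size of the user-data area from (name, size, alignment) triples; the values used for bool (1 B, align 1) and int8 (8 B, align 8) are the usual PostgreSQL ones on a 64-bit build, assumed here, and the tuple header and trailing MAXALIGN padding are ignored. row_data_size is a hypothetical helper.

```python
def row_data_size(columns: list[tuple[str, int, int]]) -> int:
    """Bytes of user data for fixed-length columns laid out in the given order."""
    offset = 0
    for _name, size, align in columns:
        offset = (offset + align - 1) // align * align  # pad up to this column's alignment
        offset += size
    return offset


# Same four columns, two orders (assumed bool = (1, 1), int8 = (8, 8)).
interleaved = [("a", 1, 1), ("b", 8, 8), ("c", 1, 1), ("d", 8, 8)]
wide_first = [("b", 8, 8), ("d", 8, 8), ("a", 1, 1), ("c", 1, 1)]
```

Interleaving bool and int8 costs 32 B because each int8 is preceded by 7 B of padding, while putting the int8 columns first needs only 18 B.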
Exam traps
The slot array and tuple data grow in opposite directions. A NULL is not a zero or sentinel in the
column bytes; it is a 0 bit in the header bitmap and occupies no data area. A tid is not an arithmetic disk address; it is
(page, slot), and the tuple the slot points to can move during compaction while the tid stays valid.
Quick rules: insert, delete, update within a page
What each operation does to the slot array, the data area, and the free-space pointers.
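Op | Steps and invariant
Insert | Check fit: free space = pd_upper - pd_lower must hold tuple_len (MAXALIGNed in PostgreSQL) plus a 4 B slot, unless a free LP_UNUSED slot can be reused; otherwise find another page via the free space map (FSM) or extend the relation. Write the tuple at pd_upper - tuple_len (tuple data grows toward lower offsets, toward the array) and lower pd_upper to that offset. Append a slot at pd_lower, set lp_off/lp_len, and raise pd_lower by 4 B. O(1) when a new slot is appended.
Delete | Do not physically erase under MVCC. Stamp t_xmax with the deleting xid (the tombstone). Snapshots that do not yet see the delete as committed still see the row. Once no snapshot can see it (the xmin horizon has passed the deleting xid), pruning or VACUUM frees its storage and marks the line pointer LP_DEAD; VACUUM later removes the index entries and returns the slot to LP_UNUSED. Cheap and does not block readers, at the cost of bloat until cleanup.
Update | Insert the new version, stamp the old tuple's t_xmax, and set its t_ctid to the new version's tid (update chain). If the new version fits on the same page and no indexed column changed, it is a HOT update: no new index entries are made, and index lookups reach the new version by following the chain from the root line pointer. When pruning later removes the dead root tuple, its line pointer becomes an LP_REDIRECT to the next chain member. Otherwise the new version may land on another page and every index gets a new entry.
Compact | Slide live tuples toward the page end to close the gaps left by freed tuples, rewriting each slot's lp_off and raising pd_upper. Record ids stay stable because indexes hold the slot number, not the offset. Done lazily (in PostgreSQL, during page pruning or VACUUM), not on every delete.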
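The insert, delete, and compact rules can be exercised on a toy in-memory page. This Python sketch is hypothetical and deliberately simplified: no MVCC (delete frees the slot at once, which a real engine may only do after index entries pointing at it are gone), no MAXALIGN padding, header bytes not modeled, and a linear scan for reusable slots. It does keep the two invariants from the table: free space is upper - lower, and compaction rewrites offsets without changing slot numbers.

```python
from dataclasses import dataclass

SLOT_SIZE = 4  # bytes per line pointer, matching PostgreSQL's 4 B ItemIdData


@dataclass
class Slot:
    off: int = 0       # byte offset of the tuple within the page
    length: int = 0    # tuple length in bytes
    used: bool = False


class SlottedPage:
    """Hypothetical toy slotted page: no MVCC, no MAXALIGN, header bytes not modeled."""

    def __init__(self, size: int = 8192, header_size: int = 24):
        self.data = bytearray(size)
        self.header_size = header_size
        self.slots: list[Slot] = []
        self.upper = size  # start of tuple data; moves toward lower offsets

    @property
    def lower(self) -> int:
        # End of the slot array; moves toward higher offsets as slots are appended.
        return self.header_size + SLOT_SIZE * len(self.slots)

    def free_space(self) -> int:
        return self.upper - self.lower

    def insert(self, tup: bytes) -> int:
        """Store tup and return its slot number (the in-page half of a record id)."""
        reuse = next((i for i, s in enumerate(self.slots) if not s.used), None)
        need = len(tup) + (0 if reuse is not None else SLOT_SIZE)
        if need > self.free_space():
            raise ValueError("page full: caller must try another page")
        self.upper -= len(tup)
        self.data[self.upper:self.upper + len(tup)] = tup
        slot = Slot(self.upper, len(tup), True)
        if reuse is None:
            self.slots.append(slot)  # raises lower by SLOT_SIZE
            return len(self.slots) - 1
        self.slots[reuse] = slot
        return reuse

    def get(self, slot_no: int) -> bytes:
        s = self.slots[slot_no]
        if not s.used:
            raise KeyError(slot_no)
        return bytes(self.data[s.off:s.off + s.length])

    def delete(self, slot_no: int) -> None:
        # Frees the slot at once; the tuple bytes stay behind as a gap until compact().
        self.slots[slot_no] = Slot()

    def compact(self) -> None:
        """Re-pack live tuples against the page end; slot numbers never change."""
        end = len(self.data)
        # Move tuples nearest the page end first so no copy overwrites an unmoved tuple.
        live = sorted((s for s in self.slots if s.used), key=lambda s: s.off, reverse=True)
        for s in live:
            tup = bytes(self.data[s.off:s.off + s.length])
            end -= s.length
            self.data[end:end + s.length] = tup
            s.off = end  # only the offset changes, never the slot's position
        self.upper = end
```

For example, after inserting three tuples, deleting the second, and calling compact(), get(0) and get(2) still return the original bytes, and the next insert reuses slot 1.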
In real systems
PostgreSQL uses 8 KB pages by default (the block size is a compile-time setting) and uses the pd_lower / pd_upper pair as its free-space pointers
[PG: Database Page Layout]. SQLite uses the same slotted idea inside each
B-tree page (a cell pointer array growing toward higher offsets after the page header, cell content growing toward lower offsets from the page end) and reclaims intra-page gaps with chained freeblocks rather than a VACUUM process
[SQLite file format].
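For comparison, a sketch of reading a SQLite b-tree page header, following the layout in the SQLite file format document: a 1-byte page type, the 2-byte offset of the first freeblock, a 2-byte cell count, the 2-byte start of the cell content area, a 1-byte count of fragmented free bytes, and on interior pages a 4-byte right-child pointer; all integers are big-endian. The function name and the returned dict are hypothetical.

```python
import struct

# Page-type byte values defined by the SQLite file format.
PAGE_TYPES = {2: "interior index", 5: "interior table", 10: "leaf index", 13: "leaf table"}


def parse_sqlite_btree_header(page: bytes, header_offset: int = 0) -> dict:
    """Decode a b-tree page header; header_offset is 100 on page 1 (after the file header)."""
    kind, first_freeblock, ncells, content_start, fragmented = struct.unpack_from(
        ">BHHHB", page, header_offset)  # big-endian, no padding
    if kind not in PAGE_TYPES:
        raise ValueError(f"not a b-tree page: type byte {kind}")
    # Leaf headers are 8 B; interior headers add a 4 B right-child page number.
    header_len = 8 if kind in (10, 13) else 12
    if content_start == 0:
        content_start = 65536  # a stored 0 means 65536 (only possible on 64 KiB pages)
    ptr_start = header_offset + header_len
    # The cell pointer array follows the header: one 2 B offset per cell.
    cell_ptrs = [struct.unpack_from(">H", page, ptr_start + 2 * i)[0] for i in range(ncells)]
    return {
        "type": PAGE_TYPES[kind],
        "first_freeblock": first_freeblock,  # 0 if the page has no freeblocks
        "cell_pointers": cell_ptrs,
        "content_start": content_start,
        "fragmented_bytes": fragmented,
        # Unallocated gap between the pointer array and the cell content area.
        "unallocated": content_start - (ptr_start + 2 * ncells),
    }
```

The "unallocated" value plays the same role as pd_upper - pd_lower on a PostgreSQL page; freeblocks and fragmented bytes account for the gaps inside the content area.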