A self-taught course in database internals

A Field Guide to the Database Engine

Sixteen weeks, one machine. This is the map. Every box below is a layer a query passes through on its way from text to bytes and back. Click any layer to open the lesson that takes it apart, or read the parts in order. The point of this page is the thing that is hardest to get from sixteen separate lectures: how it all fits together.

Week 1 · Architecture: the whole pipeline, end to end

SQL query text · SELECT name FROM t WHERE id = 42
    ↓ tokens
Parser and analyzer · Week 8 · lex, parse, bind, logical plan
    ↓ logical plan
Optimizer · Week 11 · cost, selectivity, join order
    ↓ physical plan
Executor · Volcano operators · Weeks 9 to 10 · scan, filter, join, sort, aggregate
    ↓ key lookups
Access methods · B+tree and hash · Weeks 6 to 7 · find the right pages fast
    ↓ page requests
Buffer pool · Weeks 4 to 5 · cache pages, decide what to evict
    ↓ read / write page
Storage · slotted pages on disk · Weeks 2 to 3 · heap files, records, free space

Left of the pipeline:
Transactions and concurrency · Weeks 12 to 13 · ACID, isolation, 2PL, deadlock, MVCC · decides which interleavings are legal

Right of the pipeline:
Logging and recovery (WAL) · Week 14 · ARIES · every change is logged before it hits the page so a crash can always be undone

Or a different machine entirely · Weeks 15 to 16 · LSM trees, columnar, DuckDB, distributed, CAP · same problems, different trade-offs

Legend: query processing · indexes · buffer pool · storage · transactions · recovery · modern and distributed

The one story

A SELECT enters as text at the top and walks down: the parser turns it into a tree, the optimizer picks a plan, the executor pulls tuples one at a time through its operators, the access methods find the right pages, the buffer pool serves them from memory or fetches them from disk, and the storage layer hands back raw bytes. An UPDATE adds two more characters to the cast: the transaction manager on the left decides whether your change is allowed to interleave with everyone else's, and the log on the right records what you did and gets that record to disk before the changed page is written back, so a crash can never lose a committed write. Hold this picture. Every lesson is a zoom into one box.
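
To make "pulls tuples one at a time" concrete, here is a minimal sketch of a pull-based (Volcano-style) pipeline for the example query. It is an illustration under loud assumptions, not the course's implementation: the class names SeqScan, Filter, and Project, the next() method, and the three-row table are all made up for the example, and the optimizer, access methods, and buffer pool are skipped entirely (the scan just reads a Python list).

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]  # hypothetical row representation: column name -> value


class SeqScan:
    """Leaf operator: returns the rows of an in-memory table one at a time."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None  # None signals "no more tuples"
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter:
    """Pulls from its child until a row satisfies the predicate."""

    def __init__(self, child, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def next(self) -> Optional[Row]:
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


class Project:
    """Keeps only the requested columns of each row it pulls."""

    def __init__(self, child, columns: List[str]) -> None:
        self.child = child
        self.columns = columns

    def next(self) -> Optional[Row]:
        row = self.child.next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}


def run(plan) -> List[Row]:
    """The top of the plan drives everything: each call pulls one tuple up."""
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    return out


# SELECT name FROM t WHERE id = 42, over a made-up table t
t = [
    {"id": 7, "name": "ada"},
    {"id": 42, "name": "grace"},
    {"id": 99, "name": "edgar"},
]
plan = Project(Filter(SeqScan(t), lambda r: r["id"] == 42), ["name"])
result = run(plan)
print(result)
```

Nothing runs until the top operator is asked for a row, and each request ripples down to the scan and back. That is why the executor box sits above the access methods in the map: in a real engine the scan at the bottom would ask an index or heap file for pages instead of walking a list.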

I

The whole machine

II

Storage: where the bytes live

III

The buffer pool: memory is the real disk

IV

Indexes: finding the needle

V

Query processing: from text to tuples

VI

Transactions and recovery: never lose a write

VII

Modern and distributed

·

Reference shelf and exam prep

Cheat sheets are built to print. The exam bank is multiple-choice-only practice across every topic; the viva bank is the anticipated-question set for the four papers.

This is a course, not a pile of files

Each lesson ends with a prompt to ask your teacher. That teacher is the agent that built this. Ask it to go deeper on any box in the map above, redraw a diagram, or quiz you harder before the exam.


Built with the teach and humanizer skills. The research that grounds every lesson, the course spine, and the build methodology are in research/.