@sargonpiraev

Columnar database

#database

Columnar databases are easier to understand if we start with row databases first.

1) How row databases work

Row databases like Postgres store each record as one full unit.

Example table (logical SQL view)

order_id | date       | customer | city  | status | revenue
1        | 2026-01-01 | Ann      | Tokyo | paid   | 120
2        | 2026-01-01 | Bob      | Paris | paid   | 80
3        | 2026-01-02 | Ann      | Tokyo | refund | -120

How row storage keeps it physically

physical row block
(1, 2026-01-01, Ann, Tokyo, paid, 120)
(2, 2026-01-01, Bob, Paris, paid, 80)
(3, 2026-01-02, Ann, Tokyo, refund, -120)

This is a simplified physical picture. In real systems, data is stored in disk pages/blocks with headers and metadata, but row DBs still keep row values together in those pages.

This is great for OLTP: read/update one customer order quickly.
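The row layout above can be sketched in a few lines of Python. This is a toy model, not real Postgres internals: each row is packed into one fixed-width binary tuple, mimicking how all fields of a record sit next to each other inside a page.

```python
import struct

# Toy row format (NOT Postgres's real tuple format): int order_id,
# four 10-byte text fields, int revenue - all fields of one row together.
ROW_FMT = "<i10s10s10s10si"

def pack_row(order_id, date, customer, city, status, revenue):
    return struct.pack(ROW_FMT, order_id, date.encode(), customer.encode(),
                       city.encode(), status.encode(), revenue)

# A "page" is just the rows concatenated back to back.
page = b"".join([
    pack_row(1, "2026-01-01", "Ann", "Tokyo", "paid", 120),
    pack_row(2, "2026-01-01", "Bob", "Paris", "paid", 80),
    pack_row(3, "2026-01-02", "Ann", "Tokyo", "refund", -120),
])

# Fetching one full order is cheap: one offset calculation, one read.
row_size = struct.calcsize(ROW_FMT)
order_2 = struct.unpack(ROW_FMT, page[row_size:2 * row_size])
print(order_2[0], order_2[5])  # 2 80
```

Notice the OLTP-friendly property: reading or updating one order touches one small contiguous slice of the page.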

2) Where row storage struggles in analytics

Analytical query:

SELECT SUM(revenue) FROM orders WHERE date >= '2026-01-01'

This query needs only date and revenue.

But in row storage, rows are packed together, so the engine reads row pages that contain all fields (customer, city, status, etc.), not just the 2 needed ones.

So for analytics over many rows, a row database often incurs:

  • more disk I/O (extra bytes),
  • more CPU parsing/decoding (extra fields),
  • more memory/cache pressure (data you do not need).

Simple size intuition:

  • 10M rows
  • average full row = 200B
  • needed fields (date + revenue) = 16B
  • row-style scan volume: 10M * 200B = ~2GB
  • ideal 2-column scan volume: 10M * 16B = ~160MB
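The arithmetic above, written out (same illustrative numbers, not measurements from any real system):

```python
rows = 10_000_000
full_row_bytes = 200   # assumed average width of a whole row
needed_bytes = 16      # date (8B) + revenue (8B)

row_scan = rows * full_row_bytes      # bytes a row-style scan must touch
column_scan = rows * needed_bytes     # bytes an ideal 2-column scan touches

print(row_scan / 1e9)            # 2.0  -> ~2 GB
print(column_scan / 1e6)         # 160.0 -> ~160 MB
print(row_scan / column_scan)    # 12.5x more data read
```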

3) How columnar databases solve this

Columnar systems (ClickHouse, BigQuery, DuckDB, and file formats like Parquet) store values by columns.

How the same data looks in column storage

column   | values
order_id | 1, 2, 3
date     | 2026-01-01, 2026-01-01, 2026-01-02
customer | Ann, Bob, Ann
city     | Tokyo, Paris, Tokyo
status   | paid, paid, refund
revenue  | 120, 80, -120

Now the query above reads mostly 2 columns (date, revenue) instead of full rows.

That gives:

  • much less data read,
  • better compression (similar values in one column),
  • faster vectorized processing.
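The two-column read can be sketched with plain Python lists, a toy model of column storage (real engines use compressed, typed column files):

```python
# Each column is its own array - the columnar layout from the table above.
columns = {
    "order_id": [1, 2, 3],
    "date":     ["2026-01-01", "2026-01-01", "2026-01-02"],
    "customer": ["Ann", "Bob", "Ann"],
    "city":     ["Tokyo", "Paris", "Tokyo"],
    "status":   ["paid", "paid", "refund"],
    "revenue":  [120, 80, -120],
}

# SELECT SUM(revenue) FROM orders WHERE date >= '2026-01-01'
# touches only 2 of the 6 columns; the other 4 stay unread.
dates = columns["date"]
revenue = columns["revenue"]
total = sum(r for d, r in zip(dates, revenue) if d >= "2026-01-01")
print(total)  # 80
```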

4) How WHERE works in columnar (simple flow)

Example:

SELECT SUM(revenue) FROM orders WHERE date >= now() - interval '6 months'

Typical execution:

  1. table is split into parts/blocks,
  2. each block keeps lightweight stats (for example min/max for date),
  3. engine skips blocks where max(date) is older than 6 months,
  4. for remaining blocks it reads date and builds a filter mask,
  5. it reads revenue for matching rows and computes SUM.

So WHERE in columnar often saves time twice:

  • by skipping whole blocks,
  • by reading only needed columns inside remaining blocks.
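The five steps above can be sketched as follows. This is a toy model of data skipping, not any specific engine's internals; block contents and the cutoff date are made up for illustration:

```python
# Each block carries min/max stats for date; blocks that cannot match
# the filter are never read at all.
blocks = [
    {"min": "2025-01-01", "max": "2025-06-30",          # old data
     "date": ["2025-02-10", "2025-05-01"], "revenue": [50, 70]},
    {"min": "2025-12-01", "max": "2026-01-02",          # recent data
     "date": ["2025-12-15", "2026-01-01", "2026-01-02"],
     "revenue": [30, 120, 80]},
]

cutoff = "2026-01-01"  # stand-in for now() - interval '6 months'
total, blocks_read = 0, 0
for block in blocks:
    if block["max"] < cutoff:        # step 3: skip whole block via stats
        continue
    blocks_read += 1
    mask = [d >= cutoff for d in block["date"]]     # step 4: filter mask
    total += sum(r for keep, r in zip(mask, block["revenue"]) if keep)  # step 5

print(blocks_read, total)  # 1 200 -> only 1 of 2 blocks was read
```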

FAQ (common confusion)

"Is the main win easier aggregations?"

Not really. SUM, COUNT, and GROUP BY work conceptually the same way in both models.
The big win is reading fewer bytes and processing them more efficiently.

"Why can't row DB just use indexes and avoid full rows?"

Indexes help locate individual rows fast, but analytics often touches a large fraction of the table.
At that point the engine still performs many reads from the underlying table pages.

Sometimes a covering/index-only strategy helps, but:

  • not for every query,
  • index size grows quickly,
  • writes become more expensive.

"Are B-trees where table data is stored?"

B-trees are usually index structures.
They help locate data, but they do not make a row table columnar.

"Are min/max block stats basically indexes?"

Close idea, but usually not a classic B-tree index.
Think of them as data-skipping metadata maps attached to blocks/parts.

Their job is simple: quickly answer "can this block contain rows matching this filter?"
If the answer is "no", the block is never read.

"Are columnar tables sorted by default?"

Depends on engine and table design.
Many column stores use a sort key / clustering key (for example by date, tenant_id) to make skipping more effective.
Better ordering -> tighter min/max ranges -> more blocks skipped on WHERE.
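The effect of ordering on skipping can be shown with a toy example (illustrative dates, blocks of 2 values; not a specific engine): the same 8 daily dates produce tight min/max ranges when sorted and wide overlapping ranges when scrambled.

```python
def block_stats(values, block_size=2):
    """Split values into blocks and record (min, max) per block."""
    return [(min(values[i:i + block_size]), max(values[i:i + block_size]))
            for i in range(0, len(values), block_size)]

def skippable(stats, cutoff):
    """For date >= cutoff, blocks whose max is below the cutoff are pruned."""
    return sum(1 for _mn, mx in stats if mx < cutoff)

sorted_dates = [f"2026-01-{d:02d}" for d in range(1, 9)]
scrambled = ["2026-01-05", "2026-01-01", "2026-01-08", "2026-01-03",
             "2026-01-02", "2026-01-07", "2026-01-06", "2026-01-04"]

cutoff = "2026-01-05"
print(skippable(block_stats(sorted_dates), cutoff))  # 2 of 4 blocks pruned
print(skippable(block_stats(scrambled), cutoff))     # 0 of 4 blocks pruned
```

Same data, same stats mechanism; only the physical order changed, and half the blocks became prunable.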

"So is Postgres storage basically like CSV rows in files?"

Useful intuition, but not exact.
CSV is plain text, line by line. Postgres stores binary tuples inside fixed-size pages (8KB by default), plus headers, metadata, and MVCC info.
But the high-level idea is similar: row-oriented storage keeps fields of one record together.

In practice: row DB for transactions, columnar DB for analytics. Teams often use both.