@sargonpiraev

Columnar database

#database

Columnar databases are easier to understand if we start with row databases first.

1) How row databases work

Row databases like Postgres store each record as one full unit.

Example table (logical SQL view)

order_id | date       | customer | city  | status | revenue
1        | 2026-01-01 | Ann      | Tokyo | paid   | 120
2        | 2026-01-01 | Bob      | Paris | paid   | 80
3        | 2026-01-02 | Ann      | Tokyo | refund | -120

How row storage keeps it physically

physical row block
(1, 2026-01-01, Ann, Tokyo, paid, 120)
(2, 2026-01-01, Bob, Paris, paid, 80)
(3, 2026-01-02, Ann, Tokyo, refund, -120)

This is a simplified physical picture. In real systems, data is stored in disk pages/blocks with headers and metadata, but row DBs still keep row values together in those pages.

This is great for OLTP: read/update one customer order quickly.
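The row layout above can be sketched in a few lines of Python. This is a toy model, not real Postgres internals: each row is packed into one fixed-width binary tuple, mimicking how all fields of a record sit next to each other inside a page.

```python
import struct

# Toy row format (NOT Postgres's real tuple format): int order_id,
# four 10-byte text fields, int revenue - all fields of one row together.
ROW_FMT = "<i10s10s10s10si"

def pack_row(order_id, date, customer, city, status, revenue):
    return struct.pack(ROW_FMT, order_id, date.encode(), customer.encode(),
                       city.encode(), status.encode(), revenue)

# A "page" is just the rows concatenated back to back.
page = b"".join([
    pack_row(1, "2026-01-01", "Ann", "Tokyo", "paid", 120),
    pack_row(2, "2026-01-01", "Bob", "Paris", "paid", 80),
    pack_row(3, "2026-01-02", "Ann", "Tokyo", "refund", -120),
])

# Fetching one full order is cheap: one offset calculation, one read.
row_size = struct.calcsize(ROW_FMT)
order_2 = struct.unpack(ROW_FMT, page[row_size:2 * row_size])
print(order_2[0], order_2[5])  # 2 80
```

Notice the OLTP-friendly property: reading or updating one order touches one small contiguous slice of the page.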

2) Where row storage struggles in analytics

Analytical query:

SELECT SUM(revenue) FROM orders WHERE date >= '2026-01-01'

This query needs only date and revenue.

But in row storage, rows are packed together, so the engine reads row pages that contain all fields (customer, city, status, etc.), not just the 2 needed ones.

So for analytics over many rows, a row database often incurs:

  • more disk I/O (extra bytes),
  • more CPU parsing/decoding (extra fields),
  • more memory/cache pressure (data you do not need).

Simple size intuition:

  • 10M rows
  • average full row = 200B
  • needed fields (date + revenue) = 16B
  • row-style scan volume: 10M * 200B = ~2GB
  • ideal 2-column scan volume: 10M * 16B = ~160MB
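The arithmetic above, written out (same illustrative numbers, not measurements from any real system):

```python
rows = 10_000_000
full_row_bytes = 200   # assumed average width of a whole row
needed_bytes = 16      # date (8B) + revenue (8B)

row_scan = rows * full_row_bytes      # bytes a row-style scan must touch
column_scan = rows * needed_bytes     # bytes an ideal 2-column scan touches

print(row_scan / 1e9)            # 2.0  -> ~2 GB
print(column_scan / 1e6)         # 160.0 -> ~160 MB
print(row_scan / column_scan)    # 12.5x more data read
```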

3) How columnar databases solve this

Columnar systems (ClickHouse, BigQuery, DuckDB, and file formats like Parquet) store values by columns.

How the same data looks in column storage

column   | values
order_id | 1, 2, 3
date     | 2026-01-01, 2026-01-01, 2026-01-02
customer | Ann, Bob, Ann
city     | Tokyo, Paris, Tokyo
status   | paid, paid, refund
revenue  | 120, 80, -120

Now the query above reads mostly 2 columns (date, revenue) instead of full rows.

That gives:

  • much less data read,
  • better compression (similar values in one column),
  • faster vectorized processing.
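The two-column read can be sketched with plain Python lists, a toy model of column storage (real engines use compressed, typed column files):

```python
# Each column is its own array - the columnar layout from the table above.
columns = {
    "order_id": [1, 2, 3],
    "date":     ["2026-01-01", "2026-01-01", "2026-01-02"],
    "customer": ["Ann", "Bob", "Ann"],
    "city":     ["Tokyo", "Paris", "Tokyo"],
    "status":   ["paid", "paid", "refund"],
    "revenue":  [120, 80, -120],
}

# SELECT SUM(revenue) FROM orders WHERE date >= '2026-01-01'
# touches only 2 of the 6 columns; the other 4 stay unread.
dates = columns["date"]
revenue = columns["revenue"]
total = sum(r for d, r in zip(dates, revenue) if d >= "2026-01-01")
print(total)  # 80
```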

4) How WHERE works in columnar (simple flow)

Example:

SELECT SUM(revenue) FROM orders WHERE date >= now() - interval '6 months'

Typical execution:

  1. table is split into parts/blocks,
  2. each block keeps lightweight stats (for example min/max for date),
  3. engine skips blocks where max(date) is older than 6 months,
  4. for remaining blocks it reads date and builds a filter mask,
  5. it reads revenue for matching rows and computes SUM.

So WHERE in columnar often saves time twice:

  • by skipping whole blocks,
  • by reading only needed columns inside remaining blocks.
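The five steps above can be sketched as follows. This is a toy model of data skipping, not any specific engine's internals; block contents and the cutoff date are made up for illustration:

```python
# Each block carries min/max stats for date; blocks that cannot match
# the filter are never read at all.
blocks = [
    {"min": "2025-01-01", "max": "2025-06-30",          # old data
     "date": ["2025-02-10", "2025-05-01"], "revenue": [50, 70]},
    {"min": "2025-12-01", "max": "2026-01-02",          # recent data
     "date": ["2025-12-15", "2026-01-01", "2026-01-02"],
     "revenue": [30, 120, 80]},
]

cutoff = "2026-01-01"  # stand-in for now() - interval '6 months'
total, blocks_read = 0, 0
for block in blocks:
    if block["max"] < cutoff:        # step 3: skip whole block via stats
        continue
    blocks_read += 1
    mask = [d >= cutoff for d in block["date"]]     # step 4: filter mask
    total += sum(r for keep, r in zip(mask, block["revenue"]) if keep)  # step 5

print(blocks_read, total)  # 1 200 -> only 1 of 2 blocks was read
```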

FAQ (common confusion)

"Is the main win easier aggregations?"

Not really. SUM, COUNT, and GROUP BY work conceptually the same way in both models.
The big win is reading fewer bytes and processing them more efficiently.

"Why can't row DB just use indexes and avoid full rows?"

Indexes help locate individual rows fast, but analytics often touches a large fraction of the table.
At that point the engine still performs many reads from the underlying table pages.

Sometimes a covering/index-only strategy helps, but:

  • not for every query,
  • index size grows quickly,
  • writes become more expensive.

"Are B-trees where table data is stored?"

B-trees are usually index structures.
They help locate data, but they do not make a row table columnar.

"Are min/max block stats basically indexes?"

Close idea, but usually not a classic B-tree index.
Think of them as data-skipping metadata maps attached to blocks/parts.

Their job is simple: quickly answer "can this block contain rows matching this filter?"
If the answer is "no", the block is never read.

"Are columnar tables sorted by default?"

Depends on engine and table design.
Many column stores use a sort key / clustering key (for example by date, tenant_id) to make skipping more effective.
Better ordering -> tighter min/max ranges -> more blocks skipped on WHERE.
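The effect of ordering on skipping can be shown with a toy example (illustrative dates, blocks of 2 values; not a specific engine): the same 8 daily dates produce tight min/max ranges when sorted and wide overlapping ranges when scrambled.

```python
def block_stats(values, block_size=2):
    """Split values into blocks and record (min, max) per block."""
    return [(min(values[i:i + block_size]), max(values[i:i + block_size]))
            for i in range(0, len(values), block_size)]

def skippable(stats, cutoff):
    """For date >= cutoff, blocks whose max is below the cutoff are pruned."""
    return sum(1 for _mn, mx in stats if mx < cutoff)

sorted_dates = [f"2026-01-{d:02d}" for d in range(1, 9)]
scrambled = ["2026-01-05", "2026-01-01", "2026-01-08", "2026-01-03",
             "2026-01-02", "2026-01-07", "2026-01-06", "2026-01-04"]

cutoff = "2026-01-05"
print(skippable(block_stats(sorted_dates), cutoff))  # 2 of 4 blocks pruned
print(skippable(block_stats(scrambled), cutoff))     # 0 of 4 blocks pruned
```

Same data, same stats mechanism; only the physical order changed, and half the blocks became prunable.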

"So is Postgres storage basically like CSV rows in files?"

Useful intuition, but not exact.
CSV is plain text, line by line. Postgres stores binary tuples inside fixed-size pages (8KB by default), plus headers, metadata, and MVCC info.
But the high-level idea is similar: row-oriented storage keeps fields of one record together.

In practice: row DB for transactions, columnar DB for analytics. Teams often use both.