Database
A database is an organized system for storing, retrieving, and changing data. What matters is not only that it stores data, but that it gives the application a model for querying that data, keeping it consistent, making writes durable, and coordinating concurrent access.
Different databases optimize for different access patterns. A relational database, a search engine, a key-value store, and a columnar analytical database all store data, but they make different tradeoffs around schema, latency, transactions, indexing, and scale.
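As a rough illustration of how layout follows access pattern, the sketch below keeps the same records in two shapes: keyed by id for single-record lookup, and split into per-column lists for scans. The sample data and the names `by_id`, `columns`, `get_order`, and `total_amount` are made up for this example; real engines do this on disk with far more machinery.

```python
from collections import defaultdict

# Hypothetical sample data: three orders.
rows = [
    {"id": 1, "customer": "ada", "amount": 30},
    {"id": 2, "customer": "bob", "amount": 45},
    {"id": 3, "customer": "ada", "amount": 25},
]

# Row-oriented, keyed by id: fetching one whole record is a single lookup.
by_id = {row["id"]: row for row in rows}

# Column-oriented: each column is stored together, so an aggregate
# touches only the values it needs.
columns = defaultdict(list)
for row in rows:
    for name, value in row.items():
        columns[name].append(value)


def get_order(order_id):
    """Point lookup: the access pattern a key-value or row store favors."""
    return by_id[order_id]


def total_amount():
    """Column scan: the access pattern an analytical store favors."""
    return sum(columns["amount"])
```

Both layouts answer both questions; the difference is which question stays cheap as the number of rows and columns grows.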
Types of databases
Each type page is the hub for that class of database: what it is good for, which theory matters most there, and which concrete systems belong to it.
- Relational database — tables, SQL, joins, constraints, transactions, and systems like Postgres.
- Document database — JSON-like documents grouped into collections.
- Key-value database — direct lookup by key, caching, sessions, and systems like Redis.
- Columnar database — analytical storage optimized for scanning columns across many rows.
- Graph database — nodes and edges for relationship-heavy data.
- Time-series database — timestamped events, metrics, measurements, and systems like Prometheus.
- Search database — text search, relevance, filtering, and systems like Elasticsearch.
- Vector database — similarity search over embeddings with systems like Qdrant and Pinecone.
Core theory
- ACID — transaction guarantees: atomicity, consistency, isolation, durability.
- Transactions — grouping operations into atomic units and isolation tradeoffs.
- MVCC — multi-version concurrency control and obsolete row versions.
- Indexes — faster lookups and the write/storage cost.
- Query planner — how engines choose execution plans.
- Write-ahead log — durability and the log as a replication substrate.
- Replication — copies, failover, and read scaling.
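Several of these ideas can be tried with Python's built-in `sqlite3` module. The sketch below is only an illustration under assumptions: the `accounts` table, the `transfer`, `balances`, and `plan` helpers, and the index name `idx_accounts_owner` are invented for this example, and SQLite stands in for whatever engine is actually in use. It shows a transaction rolling back as a unit when a constraint fails, and the query planner changing its plan once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
    [(1, "ada", 100), (2, "bob", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        # The connection context manager commits on success and rolls
        # back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


def plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


ok = transfer(conn, 1, 2, 30)       # succeeds and commits
failed = transfer(conn, 1, 2, 500)  # credit is undone along with the debit

query = "SELECT balance FROM accounts WHERE owner = ?"
plan_before = plan(conn, query, ("ada",))
conn.execute("CREATE INDEX idx_accounts_owner ON accounts (owner)")
plan_after = plan(conn, query, ("ada",))
```

After the failed transfer the balances are unchanged from the successful one, because the first `UPDATE` is rolled back together with the second. Comparing `plan_before` and `plan_after` shows the planner moving from a full table scan to an index search; the exact wording of the plan text varies by SQLite version.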
Type-specific theory lives on the relevant type page. For example, joins and constraints belong to Relational database, inverted indexes belong to Search database, and HNSW belongs to Vector database.
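To make one of those type-specific ideas concrete, here is a toy inverted index: a map from each term to the set of documents containing it. It is a minimal sketch, not how any particular search engine implements it; the function names, the whitespace tokenizer, and the sample documents are assumptions for illustration, and real systems add analysis, ranking, and compressed postings lists.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents.
docs = {
    1: "postgres is a relational database",
    2: "redis is a key-value database",
    3: "elasticsearch is a search engine",
}
index = build_inverted_index(docs)
matches = search(index, "relational database")
```

A query only touches the postings for its own terms, which is why lookup cost depends on how common the terms are rather than on the total number of documents.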
Questions to ask
- What access pattern is this database optimized for?
- Does the system need transactions, search, analytics, caching, similarity search, or relationship traversal?
- Which invariants belong in the database, and which belong in the application?
- What becomes expensive as the dataset grows?