Understanding Database Models: DDIA Chapter 2

Published: 3 weeks ago (January 15, 2026 at 11:01 AM EST)

4 min read

Source: Dev.to

Overview

This chapter from Designing Data‑Intensive Applications explores different database models and strategies for structuring application data. Understanding these concepts is crucial for making informed architectural decisions in modern software development.

Data Flow from an Application Perspective

Collect real‑world data.
Model it using data structures and APIs.
Store it in a database, which may use JSON/XML documents, relational tables, or graph models—ultimately persisted as bytes on disk.

Relational vs. NoSQL

While we’ll discuss various data models, the relational model has proven the most resilient and remains widely used. Modern applications often combine relational databases with non‑relational databases (polyglot persistence) to meet specific requirements.

Advantages of NoSQL

Higher scalability and write throughput for large datasets
More open‑source options available
Support for queries not possible in relational databases
Less restrictive schemas
Better performance for certain use cases, matching application data structures more closely

Impedance Mismatch and ORMs

Most application code uses object‑oriented programming, representing data as objects. Relational databases, however, store data in tables, rows, and columns, creating an awkward translation layer known as the impedance mismatch.

Object‑Relational Mappers (ORMs) help by eliminating manual SQL queries, but they mainly provide abstraction rather than truly solving the underlying problem.

Document Model (NoSQL)

A document model stores data as flexible, semi‑structured documents (JSON/XML) rather than rigid rows and columns. MongoDB is a popular example.

One‑to‑Many Example

Consider a LinkedIn profile where one user has multiple job positions. Early SQL databases struggled with this relationship because they were not designed to store multiple values per row. Developers created separate tables for positions, though modern SQL databases now support multi‑valued fields.

Document models handle this elegantly because JSON provides better locality than multi‑table schemas. Instead of issuing multiple queries across tables, you retrieve all relevant data in a single query.

Normalization and ID References

When storing user inputs like city, organization, or college, use ID references instead of raw strings (e.g., org_id = 5 instead of org = "Microsoft"). This approach:

Models many‑to‑one relationships naturally (many users, one organization)
Enables clean data validation through dropdowns
Eliminates ambiguity from duplicate names
Improves maintainability (update once, reflect everywhere)
Allows centralized changes without touching thousands of records

For a deeper exploration of normalization and schema design, see my detailed blog on database normalization.

Joins

Joins combine fields from different tables in relational databases. While straightforward in SQL, document models have weak or nonexistent join support. Without database‑level joins, you must emulate them in application code, creating performance overhead and shifting complexity from the storage layer to your application.

SELECT * FROM animals WHERE family = 'Sharks';

Historical Data Models

Hierarchical Model

The first database model, used in IBM’s IMS, employed a tree structure with single parents—ideal for one‑to‑many relationships.

Limitations

No many‑to‑many relationship support
Difficult denormalization decisions

Network Model

Created to solve hierarchical limitations, the network model allowed multiple parents, supporting many‑to‑one and many‑to‑many relationships.

Fatal Flaw: Records were connected like programming pointers through “access paths.” These access paths made querying and updating extremely complex, negating the benefit of multiple relationship support.

Relational Model Success

Stores data in simple tables (rows and columns)
Eliminates complicated access paths
Supports multiple relationships elegantly
Allows inserting rows without foreign‑key concerns

The game‑changer was the introduction of query optimizers, which make writing efficient queries straightforward.

Declarative vs. Imperative Queries

Imperative code tells the computer every step: loop through lists, check conditions, push to arrays—like giving turn‑by‑turn directions.
Declarative queries (e.g., SQL) specify what you want, not how to get it. The database determines the optimal execution plan.

Declarative languages are naturally parallelizable because the execution order is not dictated by the programmer, allowing databases to split work across multiple cores automatically.

Key Takeaways

Relational databases remain dominant but work well alongside NoSQL for specific use cases.
Document models excel at one‑to‑many relationships with better data locality.
Normalization using IDs prevents data duplication and eases maintenance.
Declarative queries enable database‑level optimization and parallelization.
Understanding historical models (hierarchical, network) helps appreciate modern solutions.