Understanding Database Models: DDIA Chapter 2
Source: Dev.to
Overview
This chapter from Designing Data‑Intensive Applications explores different database models and strategies for structuring application data. Understanding these concepts is crucial for making informed architectural decisions in modern software development.
Data Flow from an Application Perspective
- Collect real‑world data.
- Model it using data structures and APIs.
- Store it in a database, which may use JSON/XML documents, relational tables, or graph models—ultimately persisted as bytes on disk.
Relational vs. NoSQL
While we’ll discuss various data models, the relational model has proven the most resilient and remains widely used. Modern applications often combine relational databases with non‑relational databases (polyglot persistence) to meet specific requirements.
Advantages of NoSQL
- Higher scalability and write throughput for large datasets
- More open‑source options available
- Support for queries not possible in relational databases
- Less restrictive schemas
- Better performance for certain use cases, matching application data structures more closely
Impedance Mismatch and ORMs
Most application code uses object‑oriented programming, representing data as objects. Relational databases, however, store data in tables, rows, and columns, creating an awkward translation layer known as the impedance mismatch.
Object‑Relational Mappers (ORMs) help by eliminating manual SQL queries, but they mainly provide abstraction rather than truly solving the underlying problem.
Document Model (NoSQL)
A document model stores data as flexible, semi‑structured documents (JSON/XML) rather than rigid rows and columns. MongoDB is a popular example.
One‑to‑Many Example
Consider a LinkedIn profile where one user has multiple job positions. Early SQL databases struggled with this relationship because they were not designed to store multiple values per row. Developers created separate tables for positions, though modern SQL databases now support multi‑valued fields.
Document models handle this elegantly because JSON provides better locality than multi‑table schemas. Instead of issuing multiple queries across tables, you retrieve all relevant data in a single query.
Normalization and ID References
When storing user inputs like city, organization, or college, use ID references instead of raw strings (e.g., org_id = 5 instead of org = "Microsoft"). This approach:
- Models many‑to‑one relationships naturally (many users, one organization)
- Enables clean data validation through dropdowns
- Eliminates ambiguity from duplicate names
- Improves maintainability (update once, reflect everywhere)
- Allows centralized changes without touching thousands of records
For a deeper exploration of normalization and schema design, see my detailed blog on database normalization.
Joins
Joins combine fields from different tables in relational databases. While straightforward in SQL, document models have weak or nonexistent join support. Without database‑level joins, you must emulate them in application code, creating performance overhead and shifting complexity from the storage layer to your application.
SELECT * FROM animals WHERE family = 'Sharks';
Historical Data Models
Hierarchical Model
The first database model, used in IBM’s IMS, employed a tree structure with single parents—ideal for one‑to‑many relationships.
Limitations
- No many‑to‑many relationship support
- Difficult denormalization decisions
Network Model
Created to solve hierarchical limitations, the network model allowed multiple parents, supporting many‑to‑one and many‑to‑many relationships.
Fatal Flaw: Records were connected like programming pointers through “access paths.” These access paths made querying and updating extremely complex, negating the benefit of multiple relationship support.
Relational Model Success
- Stores data in simple tables (rows and columns)
- Eliminates complicated access paths
- Supports multiple relationships elegantly
- Allows inserting rows without foreign‑key concerns
The game‑changer was the introduction of query optimizers, which make writing efficient queries straightforward.
Declarative vs. Imperative Queries
- Imperative code tells the computer every step: loop through lists, check conditions, push to arrays—like giving turn‑by‑turn directions.
- Declarative queries (e.g., SQL) specify what you want, not how to get it. The database determines the optimal execution plan.
Declarative languages are naturally parallelizable because the execution order is not dictated by the programmer, allowing databases to split work across multiple cores automatically.
Key Takeaways
- Relational databases remain dominant but work well alongside NoSQL for specific use cases.
- Document models excel at one‑to‑many relationships with better data locality.
- Normalization using IDs prevents data duplication and eases maintenance.
- Declarative queries enable database‑level optimization and parallelization.
- Understanding historical models (hierarchical, network) helps appreciate modern solutions.