Normal Forms and MongoDB
Source: Dev.to
Normal Forms and Modern Data Modeling
When databases were first designed, the relational model focused on enterprise‑wide entities defined before any access patterns were known. The idea was to create a stable, normalized schema that many future applications could share.
Today we design databases for specific applications or bounded domains. Instead of a full model up front, we add features incrementally, gather feedback, and let the schema evolve with the application.
Key point: Normal forms aren’t just relational theory—they describe real data dependencies. Even with MongoDB’s document model you still need to think about normalization; you just have more flexibility in how you apply it.
MVP: One Pizza, One Manager, One Variety, One Area
We’re starting a new business: a large network of pizzerias across many areas with a wide variety of pizzas.
MVP assumptions
| Attribute | Value |
|---|---|
| name | “A1 Pizza” |
| manager | “Bob” |
| variety | “Thick Crust” |
| area | “Springfield” |
{
"name": "A1 Pizza",
"manager": "Bob",
"variety": "Thick Crust",
"area": "Springfield"
}
No repeating groups or multi‑valued attributes → already in First Normal Form (1NF).
Because the MVP data model is simple—one value per attribute and a single key—there are no dependencies that would violate higher normal forms.
Many designs start out fully normalized, not because the designer painstakingly worked through every normal form, but because the initial dataset is too simple for complex dependencies to exist.
Normalization becomes necessary later, as business rules evolve and new varieties, areas, and independent attributes introduce dependencies that higher normal forms address.
Adding Multiple Varieties – Violating 1NF
When a pizzeria can offer several varieties, a naïve approach might be:
{
"name": "A1 Pizza",
"manager": "Bob",
"varieties": "Thick Crust, Stuffed Crust",
"area": "Springfield"
}
Why this breaks 1NF
- Atomicity – each field must hold a single, indivisible piece of data.
- A comma‑separated string cannot be queried, indexed, or updated efficiently as individual varieties.
Proper 1NF‑compliant representation
Relational view – store the one‑to‑many relationship in a separate table.
Document view – use an array instead of a delimited string.
{
"name": "A1 Pizza",
"manager": "Bob",
"email": "bob@a1-pizza.it",
"varieties": ["Thick Crust", "Stuffed Crust"]
}
- Each array element is atomic and independently addressable → satisfies a document‑oriented equivalent of 1NF.
- MongoDB keeps related data colocated for predictable performance, while SQL provides logical‑physical data independence.
Introducing Prices – Moving Toward 2NF
Now we want to store the base price of each pizza variety.
Embedded price information
{
"name": "A1 Pizza",
"manager": "Bob",
"email": "bob@a1-pizza.it",
"varieties": [
{ "name": "Thick Crust", "basePrice": 10 },
{ "name": "Stuffed Crust", "basePrice": 12 }
]
}
Second Normal Form (2NF) builds on 1NF: every non‑key attribute must depend on the entire primary key, not just part of it. This only matters when we have composite keys.
- Composite key for each array element: (
pizzeria,variety). - If each pizzeria can set its own price,
basePricedepends on the full composite key → 2NF satisfied.
When prices are standardized
If the same variety costs the same everywhere, basePrice depends only on variety. That’s a partial dependency → 2NF violation.
Normalizing the price data
Create a separate collection (or table) for pricing:
{ "variety": "Thick Crust", "basePrice": 10 }
{ "variety": "Stuffed Crust", "basePrice": 12 }
Remove basePrice from the pizzeria document and retrieve it via a lookup when needed.
Example: MongoDB view that joins pricing
db.createView(
"pizzeriasWithPrices",
"pizzerias",
[
{ $unwind: "$varieties" },
{
$lookup: {
from: "pricing",
localField: "varieties.name",
foreignField: "variety",
as: "priceInfo"
}
},
{ $unwind: "$priceInfo" },
{
$addFields: {
"varieties.basePrice": "$priceInfo.basePrice"
}
},
{ $project: { priceInfo: 0 } }
]
);
Alternatively, the pricing collection can be referenced directly from the varieties array (using the variety name as a foreign key) and joined at query time.
Recap
| Normal Form | What it enforces | How it applies to our pizza model |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Use arrays (or separate tables) for multiple varieties |
| 2NF | No partial dependencies on a composite key | Separate basePrice when price is a property of the variety alone |
| 3NF (not covered) | No transitive dependencies | Would require further separation if, e.g., variety → category → taxRate |
By incrementally evolving the schema—starting with a simple, 1NF‑compliant document and normalizing only when business rules demand—it’s possible to keep the data model both flexible and well‑structured as the pizzeria network grows.
Updating Prices in a Document Database
When the application retrieves a price it stores it in the pizzeria document for faster reads.
To avoid update anomalies, the application updates all affected documents whenever a variety’s price changes:
const session = db.getMongo().startSession();
const sessionDB = session.getDatabase(db.getName());
session.startTransaction();
sessionDB.getCollection("pricing").updateOne(
{ variety: "Thick Crust" },
{ $set: { basePrice: 11 } }
);
sessionDB.getCollection("pizzerias").updateMany(
{ "varieties.name": "Thick Crust" },
{ $set: { "varieties.$[v].basePrice": 11 } },
{ arrayFilters: [{ "v.name": "Thick Crust" }] }
);
session.commitTransaction();
SQL databases avoid such multiple updates because they are designed for direct end‑user access, often bypassing the application layer. Without normalizing (splitting dependencies into separate tables), duplicated data can be overlooked. In a document database, an application service is responsible for maintaining consistency.
While normalizing to 2NF is possible, it isn’t always the best choice in a domain‑driven design. Keeping the price embedded in each pizzeria:
- Allows asynchronous updates.
- Supports future requirements where some pizzerias may offer different prices—without breaking integrity, because the application enforces atomic updates.
In practice many applications accept this controlled duplication when price changes are infrequent and prefer fast single‑document reads over perfectly normalized writes.
1NF → 2NF Example: Manager Email
Original Document (single email per pizzeria)
{
"name": "A1 Pizza",
"manager": "Bob",
"email": "bob@a1-pizza.it",
"varieties": [
{ "name": "Thick Crust", "basePrice": 10 },
{ "name": "Stuffed Crust", "basePrice": 12 }
]
}
3NF – Removing a Transitive Dependency
The email actually belongs to the manager, not directly to the pizzeria, creating a transitive dependency:
pizzeria → manager → email
Normalized document:
{
"name": "A1 Pizza",
"manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
"varieties": [
{ "name": "Thick Crust", "basePrice": 10 },
{ "name": "Stuffed Crust", "basePrice": 12 }
]
}
If a pizzeria has multiple managers, use an array of sub‑documents.
In a relational model this would become separate tables (pizzeria, manager, contact), but in our domain we don’t manage contacts outside of pizzerias, so embedding is appropriate.
4NF – Independent Multi‑Valued Dependencies
Desired Delivery Areas
{
"name": "A1 Pizza",
"manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
"offerings": [
{ "variety": { "name": "Thick Crust", "basePrice": 10 }, "area": "Springfield" },
{ "variety": { "name": "Thick Crust", "basePrice": 10 }, "area": "Shelbyville" }
]
}
A multi‑valued dependency exists when one attribute determines a set of values for another attribute, independent of all other attributes.
-
If varieties and areas were dependent (e.g., only certain varieties available in certain areas), the
(variety, area)pair would be a single fact and 4NF would not be violated. -
In our case, pizzerias deliver all varieties to all areas, giving us two independent multi‑valued facts:
pizzeria →→ varietypizzeria →→ area
Storing every combination creates redundancy.
Normalized 4NF Document
{
"name": "A1 Pizza",
"manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
"varieties": [
{ "name": "Thick Crust", "basePrice": 10 },
{ "name": "Stuffed Crust", "basePrice": 12 }
],
"deliveryAreas": ["Springfield", "Shelbyville"]
}
Now varieties and deliveryAreas are stored independently, eliminating the 4NF violation.
2NF & 3NF – Area‑Based Pricing
When pricing varies by delivery area:
{
"name": "A1 Pizza",
"manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
"offerings": [
{ "variety": "Thick Crust", "area": "Springfield", "price": 10 },
{ "variety": "Thick Crust", "area": "Shelbyville", "price": 11 },
{ "variety": "Stuffed Crust", "area": "Springfield", "price": 12 },
{ "variety": "Stuffed Crust", "area": "Shelbyville", "price": 13 }
]
}
- Composite key for each offering: (pizzeria, variety, area).
pricedepends on the full key, satisfying 2NF (no partial dependencies) and 3NF (no transitive dependencies).
BCNF – Adding an Area Manager
Now each area has a single manager, independent of the pizzeria:
{
"name": "A1 Pizza",
"manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
"offerings": [
{ "variety": "Thick Crust", "area": "Springfield", "price": 10, "areaManager": "Alice" },
{ "variety": "Stuffed Crust", "area": "Springfield", "price": 12, "areaManager": "Alice" },
{ "variety": "Thick Crust", "area": "Shelbyville", "price": 11, "areaManager": "Eve" },
{ "variety": "Stuffed Crust", "area": "Shelbyville", "price": 13, "areaManager": "Eve" }
]
}
Boyce‑Codd Normal Form (BCNF) requires every determinant to be a superkey.
Here, area → areaManager violates BCNF because area is not a superkey of the offering document.
BCNF fix
Extract the area‑manager relationship into its own collection or sub‑document:
{
"area": "Springfield",
"manager": "Alice"
}
and reference the area from each offering.
Normal Forms in Relational vs. Document Modeling
BCNF vs. 3NF
- BCNF: every determinant must be a superkey.
- 3NF allows a determinant to be a candidate‑key attribute when the dependent attribute is part of a candidate key.
Example: In the offerings relation (pizzeria, variety, area) the functional dependency area → areaManager violates BCNF because area alone is not a superkey.
Practical impact: Changing the manager for an area requires updating every offering for that area. In a relational system you would extract area managers into a separate table.
MongoDB approach (embedded structure):
db.pizzerias.updateMany(
{ "offerings.area": "Springfield" },
{ $set: { "offerings.$[o].areaManager": "Carol" } },
{ arrayFilters: [{ "o.area": "Springfield" }] }
);
Trade‑off: Simpler queries and faster reads at the cost of strict BCNF compliance; consistency is enforced by the application.
5NF (Project‑Join Normal Form)
When offering multiple sizes (Small, Medium, Large) that are independent of varieties and areas, storing every combination creates redundancy.
5NF‑compliant design: store independent facts separately.
{
"name": "A1 Pizza",
"varieties": ["Thick Crust", "Stuffed Crust"],
"sizes": ["Large", "Medium"],
"deliveryAreas": ["Springfield", "Shelbyville"]
}
The application can generate valid combinations on demand, avoiding hundreds of explicit documents.
6NF (Sixth Normal Form)
For audit‑level price history:
{
"offerings": [
{
"variety": "Thick Crust",
"area": "Springfield",
"currentPrice": 12,
"priceHistory": [
{ "price": 10, "effectiveDate": ISODate("2024-01-01") },
{ "price": 11, "effectiveDate": ISODate("2024-03-15") },
{ "price": 12, "effectiveDate": ISODate("2024-06-01") }
]
}
]
}
6NF design: use a temporal fact collection.
{ "pizzeria": "A1 Pizza", "variety": "Thick Crust", "area": "Springfield",
"price": 10, "effectiveDate": ISODate("2024-01-01") }
{ "pizzeria": "A1 Pizza", "variety": "Thick Crust", "area": "Springfield",
"price": 11, "effectiveDate": ISODate("2024-03-15") }
{ "pizzeria": "A1 Pizza", "variety": "Thick Crust", "area": "Springfield",
"price": 12, "effectiveDate": ISODate("2024-06-01") }
Use this for auditing, analytics, or “price on a given date” queries.