Normal Forms and MongoDB

Published: (February 7, 2026 at 04:53 PM EST)
10 min read
Source: Dev.to

Source: Dev.to

Normal Forms and Modern Data Modeling

When databases were first designed, the relational model focused on enterprise‑wide entities defined before any access patterns were known. The idea was to create a stable, normalized schema that many future applications could share.

Today we design databases for specific applications or bounded domains. Instead of a full model up front, we add features incrementally, gather feedback, and let the schema evolve with the application.

Key point: Normal forms aren’t just relational theory—they describe real data dependencies. Even with MongoDB’s document model you still need to think about normalization; you just have more flexibility in how you apply it.

MVP: One Pizza, One Manager, One Variety, One Area

We’re starting a new business: a large network of pizzerias across many areas with a wide variety of pizzas.

MVP assumptions

AttributeValue
name“A1 Pizza”
manager“Bob”
variety“Thick Crust”
area“Springfield”
{
  "name": "A1 Pizza",
  "manager": "Bob",
  "variety": "Thick Crust",
  "area": "Springfield"
}

No repeating groups or multi‑valued attributes → already in First Normal Form (1NF).
Because the MVP data model is simple—one value per attribute and a single key—there are no dependencies that would violate higher normal forms.

Many designs start out fully normalized, not because the designer painstakingly worked through every normal form, but because the initial dataset is too simple for complex dependencies to exist.

Normalization becomes necessary later, as business rules evolve and new varieties, areas, and independent attributes introduce dependencies that higher normal forms address.

Adding Multiple Varieties – Violating 1NF

When a pizzeria can offer several varieties, a naïve approach might be:

{
  "name": "A1 Pizza",
  "manager": "Bob",
  "varieties": "Thick Crust, Stuffed Crust",
  "area": "Springfield"
}

Why this breaks 1NF

  • Atomicity – each field must hold a single, indivisible piece of data.
  • A comma‑separated string cannot be queried, indexed, or updated efficiently as individual varieties.

Proper 1NF‑compliant representation

Relational view – store the one‑to‑many relationship in a separate table.

Document view – use an array instead of a delimited string.

{
  "name": "A1 Pizza",
  "manager": "Bob",
  "email": "bob@a1-pizza.it",
  "varieties": ["Thick Crust", "Stuffed Crust"]
}
  • Each array element is atomic and independently addressable → satisfies a document‑oriented equivalent of 1NF.
  • MongoDB keeps related data colocated for predictable performance, while SQL provides logical‑physical data independence.

Introducing Prices – Moving Toward 2NF

Now we want to store the base price of each pizza variety.

Embedded price information

{
  "name": "A1 Pizza",
  "manager": "Bob",
  "email": "bob@a1-pizza.it",
  "varieties": [
    { "name": "Thick Crust", "basePrice": 10 },
    { "name": "Stuffed Crust", "basePrice": 12 }
  ]
}

Second Normal Form (2NF) builds on 1NF: every non‑key attribute must depend on the entire primary key, not just part of it. This only matters when we have composite keys.

  • Composite key for each array element: (pizzeria, variety).
  • If each pizzeria can set its own price, basePrice depends on the full composite key → 2NF satisfied.

When prices are standardized

If the same variety costs the same everywhere, basePrice depends only on variety. That’s a partial dependency2NF violation.

Normalizing the price data

Create a separate collection (or table) for pricing:

{ "variety": "Thick Crust", "basePrice": 10 }
{ "variety": "Stuffed Crust", "basePrice": 12 }

Remove basePrice from the pizzeria document and retrieve it via a lookup when needed.

Example: MongoDB view that joins pricing

db.createView(
  "pizzeriasWithPrices",
  "pizzerias",
  [
    { $unwind: "$varieties" },
    {
      $lookup: {
        from: "pricing",
        localField: "varieties.name",
        foreignField: "variety",
        as: "priceInfo"
      }
    },
    { $unwind: "$priceInfo" },
    {
      $addFields: {
        "varieties.basePrice": "$priceInfo.basePrice"
      }
    },
    { $project: { priceInfo: 0 } }
  ]
);

Alternatively, the pricing collection can be referenced directly from the varieties array (using the variety name as a foreign key) and joined at query time.

Recap

Normal FormWhat it enforcesHow it applies to our pizza model
1NFAtomic values, no repeating groupsUse arrays (or separate tables) for multiple varieties
2NFNo partial dependencies on a composite keySeparate basePrice when price is a property of the variety alone
3NF (not covered)No transitive dependenciesWould require further separation if, e.g., varietycategorytaxRate

By incrementally evolving the schema—starting with a simple, 1NF‑compliant document and normalizing only when business rules demand—it’s possible to keep the data model both flexible and well‑structured as the pizzeria network grows.

Updating Prices in a Document Database

When the application retrieves a price it stores it in the pizzeria document for faster reads.
To avoid update anomalies, the application updates all affected documents whenever a variety’s price changes:

const session = db.getMongo().startSession();
const sessionDB = session.getDatabase(db.getName());
session.startTransaction();

sessionDB.getCollection("pricing").updateOne(
  { variety: "Thick Crust" },
  { $set: { basePrice: 11 } }
);

sessionDB.getCollection("pizzerias").updateMany(
  { "varieties.name": "Thick Crust" },
  { $set: { "varieties.$[v].basePrice": 11 } },
  { arrayFilters: [{ "v.name": "Thick Crust" }] }
);

session.commitTransaction();

SQL databases avoid such multiple updates because they are designed for direct end‑user access, often bypassing the application layer. Without normalizing (splitting dependencies into separate tables), duplicated data can be overlooked. In a document database, an application service is responsible for maintaining consistency.

While normalizing to 2NF is possible, it isn’t always the best choice in a domain‑driven design. Keeping the price embedded in each pizzeria:

  • Allows asynchronous updates.
  • Supports future requirements where some pizzerias may offer different prices—without breaking integrity, because the application enforces atomic updates.

In practice many applications accept this controlled duplication when price changes are infrequent and prefer fast single‑document reads over perfectly normalized writes.

1NF → 2NF Example: Manager Email

Original Document (single email per pizzeria)

{
  "name": "A1 Pizza",
  "manager": "Bob",
  "email": "bob@a1-pizza.it",
  "varieties": [
    { "name": "Thick Crust", "basePrice": 10 },
    { "name": "Stuffed Crust", "basePrice": 12 }
  ]
}

3NF – Removing a Transitive Dependency

The email actually belongs to the manager, not directly to the pizzeria, creating a transitive dependency:

pizzeria → manager → email

Normalized document:

{
  "name": "A1 Pizza",
  "manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
  "varieties": [
    { "name": "Thick Crust", "basePrice": 10 },
    { "name": "Stuffed Crust", "basePrice": 12 }
  ]
}

If a pizzeria has multiple managers, use an array of sub‑documents.
In a relational model this would become separate tables (pizzeria, manager, contact), but in our domain we don’t manage contacts outside of pizzerias, so embedding is appropriate.

4NF – Independent Multi‑Valued Dependencies

Desired Delivery Areas

{
  "name": "A1 Pizza",
  "manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
  "offerings": [
    { "variety": { "name": "Thick Crust", "basePrice": 10 }, "area": "Springfield" },
    { "variety": { "name": "Thick Crust", "basePrice": 10 }, "area": "Shelbyville" }
  ]
}

A multi‑valued dependency exists when one attribute determines a set of values for another attribute, independent of all other attributes.

  • If varieties and areas were dependent (e.g., only certain varieties available in certain areas), the (variety, area) pair would be a single fact and 4NF would not be violated.

  • In our case, pizzerias deliver all varieties to all areas, giving us two independent multi‑valued facts:

    • pizzeria →→ variety
    • pizzeria →→ area

Storing every combination creates redundancy.

Normalized 4NF Document

{
  "name": "A1 Pizza",
  "manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
  "varieties": [
    { "name": "Thick Crust", "basePrice": 10 },
    { "name": "Stuffed Crust", "basePrice": 12 }
  ],
  "deliveryAreas": ["Springfield", "Shelbyville"]
}

Now varieties and deliveryAreas are stored independently, eliminating the 4NF violation.

2NF & 3NF – Area‑Based Pricing

When pricing varies by delivery area:

{
  "name": "A1 Pizza",
  "manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
  "offerings": [
    { "variety": "Thick Crust", "area": "Springfield", "price": 10 },
    { "variety": "Thick Crust", "area": "Shelbyville", "price": 11 },
    { "variety": "Stuffed Crust", "area": "Springfield", "price": 12 },
    { "variety": "Stuffed Crust", "area": "Shelbyville", "price": 13 }
  ]
}
  • Composite key for each offering: (pizzeria, variety, area).
  • price depends on the full key, satisfying 2NF (no partial dependencies) and 3NF (no transitive dependencies).

BCNF – Adding an Area Manager

Now each area has a single manager, independent of the pizzeria:

{
  "name": "A1 Pizza",
  "manager": { "name": "Bob", "email": "bob@a1-pizza.it" },
  "offerings": [
    { "variety": "Thick Crust", "area": "Springfield", "price": 10, "areaManager": "Alice" },
    { "variety": "Stuffed Crust", "area": "Springfield", "price": 12, "areaManager": "Alice" },
    { "variety": "Thick Crust", "area": "Shelbyville", "price": 11, "areaManager": "Eve" },
    { "variety": "Stuffed Crust", "area": "Shelbyville", "price": 13, "areaManager": "Eve" }
  ]
}

Boyce‑Codd Normal Form (BCNF) requires every determinant to be a superkey.
Here, area → areaManager violates BCNF because area is not a superkey of the offering document.

BCNF fix

Extract the area‑manager relationship into its own collection or sub‑document:

{
  "area": "Springfield",
  "manager": "Alice"
}

and reference the area from each offering.

Normal Forms in Relational vs. Document Modeling

BCNF vs. 3NF

  • BCNF: every determinant must be a superkey.
  • 3NF allows a determinant to be a candidate‑key attribute when the dependent attribute is part of a candidate key.

Example: In the offerings relation (pizzeria, variety, area) the functional dependency area → areaManager violates BCNF because area alone is not a superkey.

Practical impact: Changing the manager for an area requires updating every offering for that area. In a relational system you would extract area managers into a separate table.

MongoDB approach (embedded structure):

db.pizzerias.updateMany(
  { "offerings.area": "Springfield" },
  { $set: { "offerings.$[o].areaManager": "Carol" } },
  { arrayFilters: [{ "o.area": "Springfield" }] }
);

Trade‑off: Simpler queries and faster reads at the cost of strict BCNF compliance; consistency is enforced by the application.

5NF (Project‑Join Normal Form)

When offering multiple sizes (Small, Medium, Large) that are independent of varieties and areas, storing every combination creates redundancy.

5NF‑compliant design: store independent facts separately.

{
  "name": "A1 Pizza",
  "varieties": ["Thick Crust", "Stuffed Crust"],
  "sizes": ["Large", "Medium"],
  "deliveryAreas": ["Springfield", "Shelbyville"]
}

The application can generate valid combinations on demand, avoiding hundreds of explicit documents.

6NF (Sixth Normal Form)

For audit‑level price history:

{
  "offerings": [
    {
      "variety": "Thick Crust",
      "area": "Springfield",
      "currentPrice": 12,
      "priceHistory": [
        { "price": 10, "effectiveDate": ISODate("2024-01-01") },
        { "price": 11, "effectiveDate": ISODate("2024-03-15") },
        { "price": 12, "effectiveDate": ISODate("2024-06-01") }
      ]
    }
  ]
}

6NF design: use a temporal fact collection.

{ "pizzeria": "A1 Pizza", "variety": "Thick Crust", "area": "Springfield",
  "price": 10, "effectiveDate": ISODate("2024-01-01") }
{ "pizzeria": "A1 Pizza", "variety": "Thick Crust", "area": "Springfield",
  "price": 11, "effectiveDate": ISODate("2024-03-15") }
{ "pizzeria": "A1 Pizza", "variety": "Thick Crust", "area": "Springfield",
  "price": 12, "effectiveDate": ISODate("2024-06-01") }

Use this for auditing, analytics, or “price on a given date” queries.

0 views
Back to Blog

Related posts

Read more »

Primary & Secondary Keys In Power Bi.

!Cover image for Primary & Secondary Keys In Power Bi.https://media2.dev.to/dynamic/image/width=1000,height=420,fit=cover,gravity=auto,format=auto/https%3A%2F%2...

Entity–Attribute–Value (EAV) Model

The Entity–Attribute–Value EAV model is a database design pattern that lets you define entity attributes at runtime. It’s especially useful when: - Different re...

Schemas and Data Modelling in Power BI

Introduction Data modelling, as the name suggests, encompasses restructuring data and creating insightful visuals from cleaned and structured data. In Power BI...