No Foreign Keys in MongoDB: Rethinking Referential Integrity

Published: (December 25, 2025 at 05:29 PM EST)
9 min read
Source: Dev.to

Source: Dev.to

Foreign Keys vs. Application‑Level Validation

In SQL databases, foreign keys act as immediate constraints that verify the correctness of relationships between tables before accepting a write. This was designed for scenarios in which end users can submit random queries directly to the database. As a result, the database is responsible for protecting the data model using normalization, integrity constraints, stored procedures, and triggers, rather than relying on validation performed before the application interacts with the database.

When relational integrity is violated, an error occurs, preventing the user from making the changes. The application rolls back the transaction and raises an exception.

MongoDB’s NoSQL approach differs from relational databases as it was designed for application developers. It relies on application code to enforce these rules. Use cases are clearly defined, validation occurs at the application level, and business logic takes precedence over foreign‑key verification. Eliminating the need for additional serializable reads associated with foreign keys can significantly boost write performance and scalability.

Asynchronous Referential Integrity

Referential integrity can be verified asynchronously. Instead of raising an exception—an unexpected event the application might not be ready for—MongoDB allows the write to proceed and offers tools like the

to detect and log errors. This approach enables error analysis, data correction, and application fixes without affecting the application’s availability and still includes the business logic.

Example: Departments and Employees

We’ll model a classic scenario where all employees must belong to a department.

Two Collections with References

Strong relationships (including one‑to‑many) don’t always require multiple collections with references, especially if the entities share the exact lifecycle. Depending on the domain’s context, you can:

  • Embed a list of employees inside each department document to guarantee referential integrity and prevent orphans.
  • Embed department information inside each employee document when department updates are infrequent (e.g., a simple multi‑document change to a department description) or when department changes usually happen as part of larger reorganisations.

When both entities are not always accessed together, have unbounded cardinality, or are updated independently, referencing is often the better choice. For example, store a deptno for each employee and maintain a separate departments collection, each with a unique deptno.

Below is a script that creates the collections, indexes, and sample data.

// -------------------------------------------------
// Reset collections
// -------------------------------------------------
db.departments.drop();
db.employees.drop();

// -------------------------------------------------
// Departments collection
// -------------------------------------------------
db.departments.createIndex(
  { deptno: 1 },               // deptno will be used as the referenced key
  { unique: true }             // must be unique for many‑to‑one relationships
);

db.departments.insertMany([
  { deptno: 10, dname: "ACCOUNTING",  loc: "NEW YORK" },
  { deptno: 20, dname: "RESEARCH",    loc: "DALLAS"   },
  { deptno: 30, dname: "SALES",       loc: "CHICAGO"  },
  { deptno: 40, dname: "OPERATIONS",  loc: "BOSTON"   }
]);

// -------------------------------------------------
// Employees collection
// -------------------------------------------------
db.employees.createIndex(
  { deptno: 1 }                // reference to departments (helps $lookup)
);

db.employees.insertMany([
  { empno: 7839, ename: "KING",   job: "PRESIDENT", deptno: 10 },
  { empno: 7698, ename: "BLAKE",  job: "MANAGER",   deptno: 30 },
  { empno: 7782, ename: "CLARK",  job: "MANAGER",   deptno: 10 },
  { empno: 7566, ename: "JONES",  job: "MANAGER",   deptno: 20 },
  { empno: 7788, ename: "SCOTT",  job: "ANALYST",   deptno: 20 },
  { empno: 7902, ename: "FORD",   job: "ANALYST",   deptno: 20 },
  { empno: 7844, ename: "TURNER", job: "SALESMAN",  deptno: 30 },
  { empno: 7900, ename: "JAMES",  job: "CLERK",     deptno: 30 },
  { empno: 7654, ename: "MARTIN", job: "SALESMAN",  deptno: 30 },
  { empno: 7499, ename: "ALLEN",  job: "SALESMAN",  deptno: 30 },
  { empno: 7521, ename: "WARD",   job: "SALESMAN",  deptno: 30 },
  { empno: 7934, ename: "MILLER", job: "CLERK",     deptno: 10 },
  { empno: 7369, ename: "SMITH",  job: "CLERK",     deptno: 20 },
  { empno: 7876, ename: "ADAMS",  job: "CLERK",     deptno: 20 }
]);

Note – I didn’t declare a schema up‑front; the documents arrive as‑is from the application. Indexes on both sides enable fast navigation between employees and departments, and vice‑versa.

Query Examples

This schema supports all cardinalities, including millions of employees per department (something you wouldn’t embed). It is normalized so that updates affect only a single document, yet it still allows bidirectional querying.

1. Employees → Department

db.employees.aggregate([
  {
    $lookup: {
      from: "departments",
      localField: "deptno",
      foreignField: "deptno",   // fast access thanks to the unique index
      as: "department"
    }
  },
  {
    $set: {
      // Keep only the first (and only) matching department document
      department: { $arrayElemAt: ["$department", 0] }
    }
  }
]);

2. Departments → Employees

db.departments.aggregate([
  {
    $lookup: {
      from: "employees",
      localField: "deptno",
      foreignField: "deptno",
      as: "employees"
    }
  },
  {
    $project: {
      _id: 0,
      deptno: 1,
      dname: 1,
      loc: 1,
      employees: {
        empno: 1,
        ename: 1,
        job:   1,
        deptno: 1
      }
    }
  }
]);

These pipelines simulate joins at read time, giving you the flexibility of an embedded model without sacrificing write performance or data integrity.

Takeaways

  • SQL: Foreign keys enforce referential integrity synchronously; violations raise immediate errors.
  • MongoDB: Referential integrity is typically enforced at the application level or verified asynchronously via aggregation pipelines or change streams.
  • Modeling choice:
    • Embed when the relationship is tightly coupled, low‑cardinality, and the data is always accessed together.
    • Reference when you need unbounded cardinality, independent updates, or separate access patterns.

By understanding these trade‑offs, you can design MongoDB schemas that retain data consistency while maximizing performance and scalability.

Aggregation Example

db.departments.aggregate([
  {
    $lookup: {
      from: "employees",
      localField: "deptno",
      foreignField: "deptno", // fast access by index on employees
      as: "employees"
    }
  }
]);

Performance Considerations

From a performance standpoint, performing a $lookup is more costly than reading from a single embedded collection. However, this overhead isn’t significant when browsing through tens or hundreds of documents.

When choosing this model, because a department might have a million employees, you don’t retrieve all the data at once. Instead:

  • A $match will filter documents before the $lookup in the first query, or
  • A filter will be applied within the $lookup pipeline in the second query.

I covered those variations in a previous post.

Referential Integrity

If an employee is inserted with a deptno that does not exist in departments, the $lookup finds no match:

  • The first query omits the department information.
  • The second query doesn’t show the new employee because it lists only the known department.

This is expected behavior for an application that didn’t insert the referenced department.

Relational DBAs often overstate how serious this is, even calling it data corruption. Because SQL defaults to inner joins, that employee would be missing from the result of the first query. With outer joins like $lookup in MongoDB, this does not happen—it’s more like a NULL in SQL: the information is not yet known, so it isn’t shown. You can add the department later, and the queries will reflect the information as it becomes available.

You may still want to detect when referenced items are not inserted after some time (e.g., due to a bug).

Foreign‑Key Definition as a $lookup Stage

Define referential integrity using two stages: a $lookup stage and a $match stage that verify whether the referenced document exists.

// $lookup stage
const lookupStage = {
  $lookup: {
    from: "departments",
    localField: "deptno",
    foreignField: "deptno",
    as: "dept"
  }
};

// $match stage – keep docs where the lookup returned an empty array
const matchStage = { $match: { dept: { $size: 0 } } };

The definition is simple and similar to an SQL foreign key. In practice it can be more complex and precise. Document databases excel in scenarios where business logic extends beyond what a static foreign key can express. For example:

  • Some employees may temporarily have no department (e.g., new hires).
  • Some may belong to two departments during a transition.

MongoDB’s flexible schema supports these cases, and you define referential‑integrity rules accordingly. I’ll keep it simple for this example.

One‑Time Validation with an Aggregation Pipeline

Insert a new employee, Eliot, into department 42, which doesn’t exist yet:

db.employees.insertOne({
  empno: 9002,
  ename: "Eliot",
  job: "CTO",
  deptno: 42 // Missing department
});

No error is raised. In all queries the employee is visible only by department number, with no other department information.

If you decide that such a situation should be detected, run the following aggregation pipeline to list the violations:

db.employees.aggregate([lookupStage, matchStage]);

Result:

[
  {
    "_id": { "$oid": "694d8b6cd0e5c67212d4b14f" },
    "empno": 9002,
    "ename": "Eliot",
    "job": "CTO",
    "deptno": 42,
    "dept": []
  }
]

We’ve caught the violation asynchronously and can decide what to do (e.g., fix the deptno, insert the missing department, or adjust business rules).

When to Run This Validation?

  • Depends on database size and risk of integrity issues.
  • After major data refactoring, run it as an extra check.
  • To avoid production impact, run it on a read replica (an advantage of asynchronous verification).
  • You don’t need a high isolation level; at worst, concurrent transactions may trigger a false warning, which can be reviewed later.
  • When restoring backups for disaster‑recovery testing, run the validation on the restored copy to verify both the restore process and data integrity.

Real‑Time Watcher with Change Streams

You may also want near‑real‑time validation, checking changes shortly after they occur.

const cs = db.employees.watch([
  { $match: { operationType: { $in: ["insert", "update", "replace"] } } }
]);

print("👀 Watching employees for referential integrity violations...");

while (cs.hasNext()) {
  const change = cs.next(); // Get the next change event

  if (["insert", "update", "replace"].includes(change.operationType)) {
    const result = db.employees.aggregate([
      { $match: { _id: change.documentKey._id } }, // check the new document
      lookupStage, // lookup dept info by deptno
      matchStage   // keep only docs with NO matching dept
    ]).toArray();

    if (result.length > 0) {
      // Handle the violation (e.g., log, alert, corrective action)
      printjson(result[0]);
    }
  }
}

The change stream watches employees for inserts, updates, and replacements. For each changed document it:

  1. Retrieves the document by _id.
  2. Performs the $lookup to fetch the referenced department.
  3. Applies $match to keep only documents where the lookup returned an empty array (i.e., no matching department).

If any violations are found, you can log them, raise alerts, or trigger corrective actions.

Summary

  • $lookup provides a flexible way to model relationships in MongoDB.
  • Referential integrity can be enforced asynchronously (periodic aggregation) or in near‑real time (change streams).
  • MongoDB’s schema flexibility lets you tailor integrity rules to actual business needs, rather than being forced into a rigid foreign‑key model.
print("\n⚠ Real-time Referential Integrity Violation Detected:");
printjson(result[0]);

Insert a new employee (Dwight) in a missing department 42

db.employees.insertOne({
  empno: 9001,
  ename: "Dwight",
  job: "CEO",
  deptno: 42 // missing department
});
Back to Blog

Related posts

Read more »

Academic Suite Database Design

The database serves as the primary foundation of Academic Suite. In an online exam system, improper database schema design can lead to serious bottlenecks, espe...