Indexes and the Rise of the DBMS

Published: (December 25, 2025 at 06:45 AM EST)
5 min read
Source: Dev.to

Source: Dev.to

Hello, I’m Maneshwar. I’m working on FreeDevTools online, currently building one place for all dev tools, cheat codes, and TLDRs — a free, open‑source hub where developers can quickly find and use tools without the hassle of searching all over the internet.

Recap

In the previous post we saw how the relational model gives structure to data:

  • Entities become relations.
  • Relationships become tables.
  • Integrity constraints define what a valid database state looks like.

But structure alone is not enough. Once data grows beyond a trivial size, how we access relations becomes just as important as what they represent. This is where indexing and the Database Management System (DBMS) enter the picture.

Why Indexes Exist

A full‑relation scan (reading every row in a table) is the most basic way to answer a query, and it is also one of the most expensive operations a database can perform.

Given what we already know about disks:

  • Reading data happens block by block.
  • Disk I/O dominates execution time.
  • Sequential scans over large relations are costly.

Hence, databases need a faster way to locate rows.

What Is an Index?

An index is a data structure built on top of a relation to support efficient lookup of rows based on the values of one or more attributes. An index does not store the data itself; it stores:

  • Search‑key values
  • Pointers or references to rows in the base relation
  • Routing information that guides the lookup along the most efficient path

Indexes are not part of the relational model itself, but they are indispensable in real systems.

Indexes and Relations

Important properties of indexes in relational databases:

PropertyDescription
AssociationAn index is always associated with one base relation.
MultiplicityA relation may have no index, one index, or many indexes.
Search keyDefined on one or more attributes (the index search key). The search key does not have to be a key of the relation.
Primary vs. secondaryIf the search key is the primary key of the relation, the index is a primary index; otherwise it is a secondary index. Most systems allow many secondary indexes on the same relation.
MaintenanceIndexes are created and dropped by users, but they are maintained automatically by the DBMS. Users never read from or write to an index directly.
Write costAdding more indexes improves read performance but slows down writes, because every modification must also update the index structures.

Whenever rows are inserted, deleted, or updated, the DBMS updates all affected indexes behind the scenes.

Hash Indexes and Tree Indexes

Two families of index structures dominate relational systems.

Hash Indexes

A hash index is optimized for fast point lookups. It works by:

  1. Applying a hash function to the search‑key value.
  2. Mapping the value to a bucket.
  3. Searching only within that bucket.

Hash indexes are extremely fast for exact‑match queries, but they have a major limitation:

  • They do not support range queries efficiently.
    Asking for values between two bounds requires scanning many or all buckets, defeating the purpose of the index.

B‑Tree and B⁺‑Tree Indexes

A B‑tree index stores search‑key values in sorted order, enabling:

  • Efficient point lookups.
  • Efficient range queries.
  • Ordered traversal without scanning the entire relation.

Each node in a B‑tree contains:

  • Search‑key values.
  • Pointers to child nodes.

Traversal decisions are made by comparing the search key at each node.

A widely used variant is the B⁺‑tree:

  • Internal nodes contain only routing information.
  • Leaf nodes contain the actual data references (often linked together to support fast range scans).

B⁺‑trees are especially well‑suited to disk‑based storage because they minimize random I/O and keep the tree height small.

At this point we have the first glimpse of how logical queries begin to map onto physical disk‑access patterns.

Why Applications Should Not Manage Data Directly

Historically, applications manipulated database files directly:

  • Reading bytes from files.
  • Modifying in‑memory structures.
  • Writing data back to disk.

This file‑processing approach quickly breaks down due to several fundamental problems:

  • Tight coupling to file formats.
  • Difficulty enforcing integrity constraints.
  • Partial writes and crashes that corrupt data.
  • Dangerous concurrent access.
  • Extremely complex recovery after failure.

As databases grow and multiple users access them concurrently, correctness becomes nearly impossible to guarantee at the application level. Even well‑written applications can produce inconsistent states under concurrency or failure.

Enter the Database Management System

The solution is to insert a dedicated software layer between applications and storage: the Database Management System (DBMS).

A DBMS provides:

  • Higher‑level data abstractions (relations instead of files).
  • Well‑defined APIs for querying and updating data.
  • Automatic index maintenance.
  • Enforcement of integrity constraints.
  • Concurrency control.
  • Failure recovery.

From the Application’s Perspective

  • Data appears as tables.
  • Operations are declarative.
  • Storage details are hidden.

From the System’s Perspective

  • Queries are translated into efficient execution plans.
  • Indexes are used when beneficial.
  • Disk I/O is carefully managed.
  • Failures are handled transparently.

The DBMS acts as a black box that bridges the gap between:

  • Conceptual data models.
  • Physical storage realities.

Where We Are Now

At this stage the layers are clearly visible:

  • Disks define physical constraints.
  • Relations define logical structure.
  • Indexes define access paths.
  • The DBMS coordinates everything.

What Remains Unresolved Is the Hardest Problem of All

What happens when many users modify data concurrently, and failures occur mid‑operation?

That question leads directly to transactions, ACID properties, transaction management, and concurrency control.

That’s where this journey goes next and where databases stop being simple data stores and start becoming serious systems.

FreeDevTools

👉 Check out: FreeDevTools

Any feedback or contributions are welcome!
It’s online, open‑source, and ready for anyone to use.

Star it on GitHub: FreeDevTools

Back to Blog

Related posts

Read more »