When Regex Meets the DOM (And Suddenly It’s Not Simple Anymore)

Published: February 26, 2026 at 05:03 AM EST

Source: Dev.to

Goals

Support multi‑word queries
Prefer full‑phrase matches
Fall back to individual token matches
Highlight results in the DOM
Skip <code> and <pre> blocks

In my head: “Easy. Just build a regex.”

Step 1: Build the Regex

If a user searches for:

power shell

I generate a pattern like:

power[\s\u00A0]+shell|power|shell

Logic

Try to match the full phrase first.
If that fails, match individual tokens.

On paper it looks clean, and in isolation it works.
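As a minimal sketch of how such a pattern could be generated from a raw query: the helper names escapeRegExp and buildSearchPattern are hypothetical, and the case-insensitive "i" flag is an assumption, since the post does not show its flags. Escaping matters because a query like "c++" would otherwise be read as regex syntax.

```javascript
// Hypothetical helper (not from the original post): escape regex
// metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build "phrase|token1|token2" from a raw query.
// Returns null when the query contains no tokens.
function buildSearchPattern(query) {
  const tokens = query.trim().split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (tokens.length === 0) return null;

  // Tokens separated by one or more whitespace characters form the phrase.
  const phrase = tokens.join("[\\s\\u00A0]+");

  // A single-token query has no separate phrase branch.
  const parts = tokens.length > 1 ? [phrase, ...tokens] : tokens;
  return parts.join("|");
}

const pattern = buildSearchPattern("power shell");
// Assumption: global + case-insensitive flags.
const regex = new RegExp(pattern, "gi");
console.log(pattern); // power[\s\u00A0]+shell|power|shell
```

In JavaScript, \s already matches the no-break space (U+00A0), so the explicit \u00A0 in the character class is harmless but redundant.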

Step 2: Enter the DOM

Now the problem moves from plain string matching to DOM traversal.

Tasks

Walk the DOM while avoiding UI elements.
Skip <code>, <pre>, <script>, and <style> blocks.
Preserve syntax highlighting.
Replace only text nodes and keep the DOM structure intact.

A TreeWalker does the job:

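// `root` is assumed to be the container element whose text should be searched.
const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
  acceptNode(node) {
    const p = node.parentElement;
    // Text nodes without a parent element cannot be checked, so skip them.
    if (!p) return NodeFilter.FILTER_REJECT;

    // Skip text inside protected blocks so code and syntax highlighting stay untouched.
    if (p.closest("code, pre, script, style")) {
      return NodeFilter.FILTER_REJECT;
    }

    return NodeFilter.FILTER_ACCEPT;
  },
});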

Now we’re not just applying a regex; we’re performing controlled DOM mutation.
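One way to keep that mutation controlled is to separate the pure string work from the DOM work, and to collect the text nodes before changing any of them. The sketch below is an illustration, not the post's actual code: splitByMatches and highlightTextNodes are hypothetical names, and wrapping hits in a <mark> element is an assumption. The regex passed in must carry the "g" flag, because matchAll throws without it.

```javascript
// Pure step: split a string into plain and matched segments.
function splitByMatches(text, regex) {
  const segments = [];
  let cursor = 0;
  for (const m of text.matchAll(regex)) {
    if (m[0].length === 0) continue; // ignore empty matches
    if (m.index > cursor) {
      segments.push({ text: text.slice(cursor, m.index), hit: false });
    }
    segments.push({ text: m[0], hit: true });
    cursor = m.index + m[0].length;
  }
  if (cursor < text.length) {
    segments.push({ text: text.slice(cursor), hit: false });
  }
  return segments;
}

// DOM step (browser only): collect first, then mutate.
function highlightTextNodes(root, regex) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode(node) {
      const p = node.parentElement;
      // Assumption: "mark" is added to the skip list so text that is
      // already highlighted is never wrapped a second time.
      if (!p || p.closest("code, pre, script, style, mark")) {
        return NodeFilter.FILTER_REJECT;
      }
      return NodeFilter.FILTER_ACCEPT;
    },
  });

  // Snapshot the nodes so replacing them cannot disturb the traversal.
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);

  for (const node of nodes) {
    const segments = splitByMatches(node.nodeValue, regex);
    if (!segments.some((s) => s.hit)) continue; // leave untouched nodes alone

    const fragment = document.createDocumentFragment();
    for (const s of segments) {
      if (s.hit) {
        const mark = document.createElement("mark");
        mark.textContent = s.text;
        fragment.appendChild(mark);
      } else {
        fragment.appendChild(document.createTextNode(s.text));
      }
    }
    node.replaceWith(fragment);
  }
}
```

In a browser this could be called as highlightTextNodes(document.body, /power[\s\u00A0]+shell|power|shell/gi). Because splitByMatches never touches the DOM, it can be tested on plain strings.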

Step 3: The Alternation Problem

Even though the phrase appears first in the alternation:

phrase|token1|token2

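the engine still matches individual tokens wherever the full phrase does not match at that position. Alternation order only decides which branch wins at a given starting point, so in "PowerShell" the phrase branch fails (there is no whitespace between the words) and the power and shell branches match instead.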
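A small demonstration of that behavior, using the pattern from Step 1. The sample sentence and the "gi" flags are assumptions chosen for illustration.

```javascript
// Pattern from Step 1; the "gi" flags are an assumption.
const regex = /power[\s\u00A0]+shell|power|shell/gi;

// Illustrative input containing both the joined word and the spaced phrase.
const sample = "PowerShell and power shell";

// matchAll scans left to right; at each position the branches are tried in order.
const matches = Array.from(sample.matchAll(regex), (m) => m[0]);

console.log(matches); // ["Power", "Shell", "power shell"]
```

The phrase branch only wins at the one position where the whole phrase is actually present. Everywhere else the token branches apply, which is why phrase and token highlights end up mixed in a single pass.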

The challenges become

Overlapping matches
Execution order
Resetting lastIndex
Avoiding double mutation
Preventing nested highlight elements
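The lastIndex item is easy to reproduce in isolation. A regex with the "g" flag remembers where its last match ended, so reusing one regex object across text nodes can skip matches. The sketch below shows the effect and one possible guard; testFresh is a hypothetical helper name.

```javascript
// A shared global regex carries state between calls.
const shared = /power|shell/gi;

const first = shared.test("power");  // true, lastIndex is now 5
const second = shared.test("power"); // false: the search resumed at index 5
const third = shared.test("power");  // true: the failed call reset lastIndex to 0

// Hypothetical guard: reset the state before every use.
function testFresh(regex, text) {
  regex.lastIndex = 0;
  return regex.test(text);
}

console.log(first, second, third); // true false true
console.log(testFresh(shared, "power"), testFresh(shared, "power")); // true true
```

Another option is to keep a separate regex without the "g" flag for yes/no checks, since test() on a non-global regex always starts at index 0.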

Step 4: Two Passes?

I considered splitting the process:

Try a phrase match.
If none found, try token matches.

That sounds simple until you realize that the first pass may already have mutated the DOM, so the second pass needs to know what the first one changed. That means managing state across passes.
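One possible way around the cross-pass state, sketched here as an assumption rather than as the approach the post settled on: make the phrase-versus-token decision on plain strings first, then run a single mutation pass with whichever regex was chosen. The names escapeRegExp and chooseRegex are hypothetical, and texts stands for the text content of the accepted text nodes.

```javascript
// Hypothetical helper: escape regex metacharacters in user input.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Decide once, before touching the DOM, which pattern to highlight with.
function chooseRegex(query, texts) {
  const tokens = query.trim().split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (tokens.length === 0) return null;

  // No "g" flag here, so test() stays stateless across strings.
  const phrase = new RegExp(tokens.join("[\\s\\u00A0]+"), "i");
  const usePhrase = texts.some((t) => phrase.test(t));

  // Fall back to the individual tokens only when the phrase appears nowhere.
  const source = usePhrase ? phrase.source : tokens.join("|");
  return {
    mode: usePhrase ? "phrase" : "tokens",
    regex: new RegExp(source, "gi"),
  };
}

const withPhrase = chooseRegex("power shell", ["Open power shell first", "a shell"]);
const tokensOnly = chooseRegex("power shell", ["PowerShell is a shell"]);
console.log(withPhrase.mode, tokensOnly.mode); // phrase tokens
```

The DOM is then mutated exactly once, so there is nothing to reconcile between passes. A known limitation of this sketch is that it checks each string separately, so a phrase split across two text nodes (for example by an inline element) is not detected.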

The Realisation

Regex problems are easy in isolation.
DOM mutation problems are easy in isolation.
Combining them multiplies complexity.

The line between a “simple feature” and a “mini search engine” is very thin.

Where I Am Now

The search mostly works.
Highlights are applied.
Protected blocks are skipped.
Structure is respected.

It’s not yet a full browser‑level Ctrl + F, but the core functionality is there.

I now respect the DOM far more than before, and I’ve come to appreciate how making JavaScript logic behave predictably inside a living DOM tree is the real challenge.

What I Learned

Regex is deterministic; the DOM is structural and stateful.
Once you start replacing text nodes, everything becomes delicate.
Edge cases, state management, and robust mutation handling are essential for a truly reliable feature.
