When Regex Meets the DOM (And Suddenly It’s Not Simple Anymore)

Published: (February 26, 2026 at 05:03 AM EST)
Source: Dev.to

Goals

  • Support multi‑word queries
  • Prefer full‑phrase matches
  • Fall back to individual token matches
  • Highlight results in the DOM
  • Skip <code> and <pre> blocks

In my head: “Easy. Just build a regex.”

Step 1: Build the Regex

If a user searches for:

power shell

I generate a pattern like:

power[\s\u00A0]+shell|power|shell

Logic

  1. Try to match the full phrase first.
  2. If that fails, match individual tokens.

On paper it looks clean, and in isolation it works.
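Here is a minimal sketch of how such a pattern could be generated from the query. The names escapeRegExp and buildQueryRegex are hypothetical, and the "gi" flags are an assumption: the pattern above does not say which flags it runs with.

```javascript
// Hypothetical helper: escape characters that are special inside a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the "phrase first, then tokens" alternation.
function buildQueryRegex(query) {
  const tokens = query.trim().split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (tokens.length === 0) return null;

  // Full phrase: tokens separated by whitespace or non-breaking spaces.
  const phrase = tokens.join("[\\s\\u00A0]+");

  // The phrase goes first so it wins when both could match at one position.
  const alternatives = tokens.length > 1 ? [phrase, ...tokens] : tokens;

  // Assumed flags: "g" to find every match, "i" to ignore case.
  return new RegExp(alternatives.join("|"), "gi");
}

console.log(buildQueryRegex("power shell").source);
// power[\s\u00A0]+shell|power|shell
```

Escaping matters as soon as a query contains characters such as + or (, which would otherwise change the meaning of the pattern.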

Step 2: Enter the DOM

Now the problem moves from plain string matching to DOM traversal.

Tasks

  • Walk the DOM while avoiding UI elements.
  • Skip <code>, <pre>, <script>, and <style> blocks.
  • Preserve syntax highlighting.
  • Replace only text nodes and keep the DOM structure intact.

A TreeWalker does the job:

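// root is the container element to search; only text nodes are visited.
const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
  acceptNode(node) {
    const p = node.parentElement;
    // Text nodes without a parent element cannot be checked, so skip them.
    if (!p) return NodeFilter.FILTER_REJECT;

    // Skip text inside protected blocks, at any nesting depth.
    if (p.closest("code, pre, script, style")) {
      return NodeFilter.FILTER_REJECT;
    }

    return NodeFilter.FILTER_ACCEPT;
  },
});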

Now we’re not just applying a regex; we’re performing controlled DOM mutation.
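One way the mutation itself could be kept under control is sketched below. It is an illustration, not the exact code from this project: splitByMatches, highlightTextNode, and highlightAll are hypothetical names, the <mark> wrapper is an assumption, and the regex is assumed to carry the "g" flag (matchAll throws without it). The string splitting is kept separate from the DOM work so it can be checked without a browser.

```javascript
// Pure step: split a string into plain and matched segments.
function splitByMatches(text, regex) {
  const segments = [];
  let cursor = 0;
  // matchAll iterates over a clone of the regex, so the caller's
  // lastIndex is not modified while the string is scanned.
  for (const match of text.matchAll(regex)) {
    if (match[0].length === 0) continue; // ignore empty matches
    if (match.index > cursor) {
      segments.push({ text: text.slice(cursor, match.index), hit: false });
    }
    segments.push({ text: match[0], hit: true });
    cursor = match.index + match[0].length;
  }
  if (cursor < text.length) {
    segments.push({ text: text.slice(cursor), hit: false });
  }
  return segments;
}

// DOM step (browser only): swap one text node for text plus wrappers.
// The <mark> element is an assumption; any wrapper element would do.
function highlightTextNode(node, regex) {
  const segments = splitByMatches(node.nodeValue, regex);
  if (!segments.some((segment) => segment.hit)) return;

  const fragment = document.createDocumentFragment();
  for (const segment of segments) {
    if (segment.hit) {
      const mark = document.createElement("mark");
      mark.textContent = segment.text;
      fragment.appendChild(mark);
    } else {
      fragment.appendChild(document.createTextNode(segment.text));
    }
  }
  node.replaceWith(fragment);
}

// Collect first, mutate second, so the walker never visits nodes
// that were created during the walk.
function highlightAll(walker, regex) {
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);
  nodes.forEach((node) => highlightTextNode(node, regex));
}
```

Collecting the nodes before touching any of them is a deliberate choice here: replacing a node while the walker is still positioned on it makes the traversal much harder to reason about.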

Step 3: The Alternation Problem

Even though the phrase appears first in the alternation:

phrase|token1|token2

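the engine still happily matches individual tokens wherever the full phrase fails at that position. Depending on context, the result is power, shell, or pieces of PowerShell, because the order of the alternatives only matters when more than one of them could match at the same starting point.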
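A small demonstration of that behavior, using the pattern from Step 1. The "gi" flags and the sample string are assumptions made for the example.

```javascript
// Pattern from Step 1; the "gi" flags are an assumption.
const pattern = /power[\s\u00A0]+shell|power|shell/gi;
const sample = "PowerShell and power shell";

// Collect the matched text of every hit, left to right.
const found = Array.from(sample.matchAll(pattern), (m) => m[0]);

console.log(found);
// [ 'Power', 'Shell', 'power shell' ]
```

The phrase alternative only wins at the end of the string, where whitespace separates the two words. Inside PowerShell there is no whitespace, so the phrase fails and the token alternatives match one after the other.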

The challenges become

  • Overlapping matches
  • Execution order
  • Resetting lastIndex
  • Avoiding double mutation
  • Preventing nested highlight elements
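The lastIndex item is the easiest one to show in isolation. A regex with the "g" flag remembers where its last match ended, so reusing it across text nodes can silently skip matches. The helper name testFresh below is hypothetical.

```javascript
const re = /shell/g;

const first = re.test("shell script");  // true, lastIndex moves to 5
const indexAfterFirst = re.lastIndex;
const second = re.test("shell script"); // false, the search resumed at index 5
const indexAfterSecond = re.lastIndex;  // a failed match resets lastIndex to 0

// Hypothetical helper: reset the state before each independent string.
function testFresh(regex, text) {
  regex.lastIndex = 0;
  return regex.test(text);
}

const third = testFresh(re, "shell script");
const fourth = testFresh(re, "shell script");

console.log(first, indexAfterFirst, second, indexAfterSecond, third, fourth);
// true 5 false 0 true true
```

Resetting lastIndex before every node, or using a non-global regex for simple yes/no checks, keeps one node's result from leaking into the next.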

Step 4: Two Passes?

I considered splitting the process:

  1. Try a phrase match.
  2. If none found, try token matches.

That sounds simple, until you realize the first pass may already have mutated the DOM, so state has to be managed across passes.
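One possible way around that, sketched here as an assumption rather than what this project ended up doing, is to make the phrase-or-tokens decision on the collected text before any mutation happens, so the DOM is only touched once. The name pickPattern and the shape of its return value are hypothetical. In the browser, the texts array would come from the nodeValue of each node the TreeWalker accepts.

```javascript
// Hypothetical helper: choose one pattern up front, then mutate once.
function pickPattern(texts, query) {
  const tokens = query
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => token.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  if (tokens.length === 0) return null;

  // No "g" flag here, so test() keeps no state between strings.
  const phrase = new RegExp(tokens.join("[\\s\\u00A0]+"), "i");
  const phraseFound = texts.some((text) => phrase.test(text));

  // Phrase pass if the phrase exists anywhere, token pass otherwise.
  const source = phraseFound ? phrase.source : tokens.join("|");
  return {
    mode: phraseFound ? "phrase" : "tokens",
    regex: new RegExp(source, "gi"),
  };
}

const withPhrase = pickPattern(["Install power shell today"], "power shell");
const withoutPhrase = pickPattern(["PowerShell", "shell basics"], "power shell");
console.log(withPhrase.mode, withoutPhrase.mode);
// phrase tokens
```

A known limit of this sketch is that it checks each text node separately, so a phrase split across elements, such as one word inside a bold tag, would be reported as not found.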

The Realisation

  • Regex problems are easy in isolation.
  • DOM mutation problems are easy in isolation.
  • Combining them multiplies complexity.

The line between a “simple feature” and a “mini search engine” is very thin.

Where I Am Now

  • The search mostly works.
  • Highlights are applied.
  • Protected blocks are skipped.
  • Structure is respected.

It’s not yet a full browser‑level Ctrl + F, but the core functionality is there.

I now respect the DOM far more than before, and I’ve come to appreciate how making JavaScript logic behave predictably inside a living DOM tree is the real challenge.

What I Learned

  • Regex is deterministic; the DOM is structural and stateful.
  • Once you start replacing text nodes, everything becomes delicate.
  • Edge cases, state management, and robust mutation handling are essential for a truly reliable feature.