When Regex Meets the DOM (And Suddenly It’s Not Simple Anymore)
Source: Dev.to

Goals
- Support multi‑word queries
- Prefer full‑phrase matches
- Fall back to individual token matches
- Highlight results in the DOM
- Skip `<code>` and `<pre>` blocks
In my head: “Easy. Just build a regex.”
Step 1: Build the Regex
If a user searches for:
power shell
I generate a pattern like:
power[\s\u00A0]+shell|power|shell
Logic
- Try to match the full phrase first.
- If that fails, match individual tokens.
On paper it looks clean, and in isolation it works.
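As a rough sketch of how such a pattern could be generated from the query: the helper names `escapeRegExp` and `buildSearchRegex` are hypothetical, and the `gi` flags are an assumption, since the article does not show how the pattern is compiled. Tokens are escaped so that a query like `c++` does not turn into regex syntax.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build "phrase|token1|token2" from a raw query string.
// Returns null for an empty query so callers can skip the search.
function buildSearchRegex(query) {
  const tokens = query
    .trim()
    .split(/[\s\u00A0]+/)
    .filter(Boolean)
    .map(escapeRegExp);
  if (tokens.length === 0) return null;

  // The full phrase goes first so it wins wherever it can start.
  const phrase = tokens.join("[\\s\\u00A0]+");
  const alternatives = tokens.length > 1 ? [phrase, ...tokens] : tokens;

  // Assumption: global and case-insensitive matching.
  return new RegExp(alternatives.join("|"), "gi");
}

const regex = buildSearchRegex("power shell");
console.log(regex.source); // power[\s\u00A0]+shell|power|shell
```

In JavaScript `\s` already covers the non-breaking space, so `\u00A0` is redundant here; the sketch keeps it only to mirror the pattern shown above.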
Step 2: Enter the DOM
Now the problem moves from plain string matching to DOM traversal.
Tasks
- Walk the DOM while avoiding UI elements.
- Skip `<code>`, `<pre>`, `<script>`, `<style>` blocks.
- Preserve syntax highlighting.
- Replace only text nodes and keep the DOM structure intact.
A TreeWalker does the job:
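// Visit text nodes only; `root` is the element the search is scoped to.
const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
  acceptNode(node) {
    const p = node.parentElement;
    // Text nodes without an element parent cannot be highlighted.
    if (!p) return NodeFilter.FILTER_REJECT;
    // Skip text that lives anywhere inside a protected block.
    if (p.closest("code, pre, script, style")) {
      return NodeFilter.FILTER_REJECT;
    }
    return NodeFilter.FILTER_ACCEPT;
  },
});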
Now we’re not just applying a regex; we’re performing controlled DOM mutation.
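One way to keep that mutation controlled is to separate the string work from the DOM work. The sketch below is an illustration and not the article's actual code: the helpers `splitByMatches`, `highlightTextNode`, and `highlightAll` are hypothetical, and wrapping hits in `<mark>` is an assumption. It expects a TreeWalker like the one above and a regex with the global flag.

```javascript
// Pure step: cut a string into plain and matched segments.
// `regex` must carry the global flag, otherwise matchAll throws.
function splitByMatches(text, regex) {
  regex.lastIndex = 0; // matchAll starts from the regex's current lastIndex
  const segments = [];
  let cursor = 0;
  for (const match of text.matchAll(regex)) {
    if (match[0].length === 0) continue; // ignore empty matches
    if (match.index > cursor) {
      segments.push({ text: text.slice(cursor, match.index), hit: false });
    }
    segments.push({ text: match[0], hit: true });
    cursor = match.index + match[0].length;
  }
  if (cursor < text.length) {
    segments.push({ text: text.slice(cursor), hit: false });
  }
  return segments;
}

// DOM step: swap one text node for a fragment of text nodes and wrappers.
// Assumption: hits are wrapped in <mark>.
function highlightTextNode(node, regex) {
  const segments = splitByMatches(node.nodeValue, regex);
  if (!segments.some((s) => s.hit)) return false; // leave unmatched nodes alone
  const doc = node.ownerDocument;
  const fragment = doc.createDocumentFragment();
  for (const s of segments) {
    if (s.hit) {
      const mark = doc.createElement("mark");
      mark.textContent = s.text;
      fragment.appendChild(mark);
    } else {
      fragment.appendChild(doc.createTextNode(s.text));
    }
  }
  node.replaceWith(fragment);
  return true;
}

// Collect first, mutate second, so the walker never visits nodes created mid-walk.
function highlightAll(walker, regex) {
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);
  let changed = 0;
  for (const node of nodes) {
    if (highlightTextNode(node, regex)) changed += 1;
  }
  return changed;
}
```

Only text nodes are replaced, and each replacement stays under the same parent, so the surrounding element structure is left as it was.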
Step 3: The Alternation Problem
Even though the phrase appears first in the alternation:
phrase|token1|token2
the engine still happily matches individual tokens (power, shell, PowerShell) depending on context.
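A small experiment makes the behaviour visible. Alternation order is applied at each starting position, not across the whole text, so the phrase only wins where it can actually start. The helper `matchesIn` is hypothetical, and the `i` flag is an assumption about how the pattern is compiled.

```javascript
const pattern = /power[\s\u00A0]+shell|power|shell/gi;

// Return every matched substring, in order of appearance.
function matchesIn(text) {
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

console.log(matchesIn("Open power shell"));
// [ 'power shell' ]  (the phrase wins where it can start)

console.log(matchesIn("PowerShell"));
// [ 'Power', 'Shell' ]  (no whitespace, so the phrase fails and both tokens match)

console.log(matchesIn("shell script for power users"));
// [ 'shell', 'power' ]  (tokens match on their own, in document order)
```

So "phrase first" in the pattern does not mean "phrase only". If token matches should appear only when the phrase is absent everywhere, that decision has to be made outside the regex.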
The challenges become
- Overlapping matches
- Execution order
- Resetting `lastIndex`
- Avoiding double mutation
- Preventing nested highlight elements
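The `lastIndex` item is easy to underestimate. A regex with the global flag remembers where its last match ended, so reusing one instance across many text nodes can skip matches. A minimal sketch of the pitfall and one possible guard (the helper name `safeTest` is hypothetical):

```javascript
const shared = /shell/g;

// The same regex and the same input give different answers on consecutive calls.
const results = [
  shared.test("shell"), // true,  lastIndex is now 5
  shared.test("shell"), // false, search started at index 5; lastIndex resets to 0
  shared.test("shell"), // true again
];
console.log(results); // [ true, false, true ]

// Guard: always start from the beginning of the string.
function safeTest(regex, text) {
  regex.lastIndex = 0;
  return regex.test(text);
}

console.log(safeTest(shared, "shell"), safeTest(shared, "shell")); // true true
```

Resetting before each node is one option; creating a fresh `RegExp` per node or using a non-global regex for pure yes/no checks would work as well.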
Step 4: Two Passes?
I considered splitting the process:
- Try a phrase match.
- If none found, try token matches.
That sounds simple until you realize the first pass may already have mutated the DOM, so state has to be managed across passes.
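One possible way around the cross-pass state, sketched under assumptions: make the first pass read-only. It only inspects the text of the accepted nodes and decides which regex to use; a single mutation pass then applies it. The helper `pickSearchRegex` is hypothetical, and `texts` stands for the `nodeValue` strings collected from the walker.

```javascript
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Read-only decision: phrase regex if the phrase occurs in any text, else token regex.
// Nothing in the DOM is touched here, so there is no state to carry between passes.
function pickSearchRegex(texts, query) {
  const tokens = query
    .trim()
    .split(/[\s\u00A0]+/)
    .filter(Boolean)
    .map(escapeRegExp);
  if (tokens.length === 0) return null;

  const phrase = new RegExp(tokens.join("[\\s\\u00A0]+"), "gi");
  const phraseFound = texts.some((text) => {
    phrase.lastIndex = 0; // global regexes are stateful
    return phrase.test(text);
  });
  if (phraseFound) {
    phrase.lastIndex = 0; // hand back a clean regex
    return phrase;
  }
  return new RegExp(tokens.join("|"), "gi");
}

console.log(pickSearchRegex(["Install power shell"], "power shell").source);
// power[\s\u00A0]+shell
console.log(pickSearchRegex(["shell tools", "power users"], "power shell").source);
// power|shell
```

A known limitation of this sketch: a phrase that is split across several text nodes, for example by inline formatting between the two words, is not detected, so the search falls back to tokens in that case.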
The Realisation
- Regex problems are easy in isolation.
- DOM mutation problems are easy in isolation.
- Combining them multiplies complexity.
The line between a “simple feature” and a “mini search engine” is very thin.
Where I Am Now
- The search mostly works.
- Highlights are applied.
- Protected blocks are skipped.
- Structure is respected.
It’s not yet a full browser‑level Ctrl + F, but the core functionality is there.
I now respect the DOM far more than before, and I’ve come to appreciate how making JavaScript logic behave predictably inside a living DOM tree is the real challenge.
What I Learned
- Regex is deterministic; the DOM is structural and stateful.
- Once you start replacing text nodes, everything becomes delicate.
- Edge cases, state management, and robust mutation handling are essential for a truly reliable feature.