Go's Regexp is Slow. So I Built My Own - 3000x Faster

Published: November 29, 2025 at 04:27 PM EST
3 min read
Source: Dev.to

The Problem

Go’s regexp package uses a Thompson NFA exclusively. It guarantees O(n) time and prevents ReDoS, but it’s slow:

// Searching for "error" in 1 MB file
re := regexp.MustCompile(`.*error.*`)
start := time.Now()
re.Find(data) // ~27 ms

Why?

  • No SIMD – processes bytes sequentially
  • No pre‑filters – scans the whole input even when the pattern can’t match
  • Single engine for all patterns
  • Allocation overhead in hot paths

On the same pattern, Rust’s regex crate runs in ~21 µs (≈1 285× faster).

The Rust Rabbit Hole

Rust achieves speed with a multi‑engine architecture:

  • Literal extraction – pull fixed strings (.*error.* → “error”)
  • SIMD pre‑filter – AVX2 byte search (≈12× faster)
  • Strategy selection – DFA for simple patterns, NFA for complex, reverse search for suffixes
  • Specialized engines for common cases
  • Zero allocations – object pools, pre‑allocated buffers

Go’s regexp lacks all of this.

Building coregex – Month by Month

Month 1‑2: SIMD Primitives

Implemented AVX2‑based memchr in assembly:

// Find byte in slice using AVX2 (32 bytes parallel)
TEXT ·memchr(SB), NOSPLIT, $0-40
    MOVQ    haystack+0(FP), DI
    MOVQ    len+8(FP), CX
    MOVBQZX needle+24(FP), AX

    // Broadcast needle to YMM0
    VMOVD   AX, X0
    VPBROADCASTB X0, Y0

loop:
    VMOVDQU (DI), Y1       // Load 32 bytes
    VPCMPEQB Y0, Y1, Y2    // Compare (parallel)
    VPMOVMSKB Y2, AX       // Extract mask
    TESTL   AX, AX
    JNZ     found
    // … continue

Benchmark (finding ‘e’ in 64 KB):

Implementation            Time
stdlib bytes.IndexByte    8 367 ns
coregex SIMD memchr       679 ns
Speedup                   12.3×

Follow‑up work added memmem (substring search) and a multi‑pattern SIMD engine (teddy).
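A memmem built on memchr follows the classic first-byte prefilter shape; a portable sketch (using bytes.IndexByte where the SIMD memchr would slot in — this is the idea, not coregex's actual implementation):

```go
package main

import (
	"bytes"
	"fmt"
)

// memmem finds needle in haystack by scanning for the needle's first
// byte (the step a SIMD memchr accelerates), then verifying the rest.
func memmem(haystack, needle []byte) int {
	if len(needle) == 0 {
		return 0
	}
	off := 0
	for off+len(needle) <= len(haystack) {
		i := bytes.IndexByte(haystack[off:], needle[0])
		if i < 0 {
			return -1
		}
		pos := off + i
		if pos+len(needle) > len(haystack) {
			return -1
		}
		if bytes.Equal(haystack[pos:pos+len(needle)], needle) {
			return pos
		}
		off = pos + 1 // false positive: keep scanning
	}
	return -1
}

func main() {
	fmt.Println(memmem([]byte("connection error here"), []byte("error"))) // 11
}
```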

Month 3‑4: Strategy Selection

Different patterns need different engines. Coregex builds a meta‑engine that selects the optimal strategy:

func selectStrategy(p *syntax.Regexp) Strategy {
    prefix := extractPrefix(p)
    suffix := extractSuffix(p)
    inner  := extractInner(p)

    switch {
    case len(suffix) >= 3:
        return UseReverseSuffix   // e.g. .*\.txt
    case len(inner) >= 3:
        return UseReverseInner    // e.g. .*error.*
    case len(prefix) >= 3:
        return UseDFA
    default:
        return UseAdaptive        // try DFA, fallback NFA
    }
}

Month 5: ReverseInner – The Killer Optimization

Problem: .*error.*connection.* on a large log.

Stdlib's NFA walks the input byte by byte with no prefilter, taking ~12 ms.

coregex:

  1. SIMD scan for “connection” → ~200 ns
  2. Reverse DFA from that point to find “error” → fast
  3. Forward DFA to satisfy surrounding .* → fast
  4. Return the first leftmost‑first match.
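The steps above can be sketched with plain byte searches (illustrative only — coregex uses SIMD prefilters and real DFAs, not bytes.Index):

```go
package main

import (
	"bytes"
	"fmt"
)

// reverseInnerMatch sketches the ReverseInner idea for
// `.*error.*connection.*` on a single line: find the inner literal
// "connection" first, then confirm "error" appears before it.
// The surrounding `.*` are satisfied by anything else on the line.
func reverseInnerMatch(line []byte) bool {
	i := bytes.Index(line, []byte("connection")) // prefilter step
	for i >= 0 {
		if bytes.Contains(line[:i], []byte("error")) { // backward confirmation
			return true
		}
		next := bytes.Index(line[i+1:], []byte("connection"))
		if next < 0 {
			return false
		}
		i += 1 + next
	}
	return false
}

func main() {
	fmt.Println(reverseInnerMatch([]byte("error: lost connection"))) // true
	fmt.Println(reverseInnerMatch([]byte("connection, then error"))) // false
}
```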

Benchmark (250 KB file):

Implementation          Time      Speedup
stdlib regexp           12.6 ms   —
coregex ReverseInner    4 µs      ≈3 154×

Month 6: Zero Allocations

Hot paths were profiled with pprof and all allocations eliminated:

  • Object pools for DFA states
  • Pre‑allocated buffers
  • Stack‑only data structures

Result:

BenchmarkIsMatch-8  1000000  1.2 µs/op  0 B/op  0 allocs/op
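The object-pool idea can be sketched with sync.Pool (an assumed shape — the post doesn't show coregex's internals): state buffers are reused across matches, so the steady-state hot path performs no heap allocations.

```go
package main

import (
	"fmt"
	"sync"
)

// statePool hands out reusable DFA-state buffers. A pointer to the
// slice is pooled to avoid an allocation on Put.
var statePool = sync.Pool{
	New: func() any {
		s := make([]int, 0, 256)
		return &s
	},
}

// withStates runs f with a pooled buffer and returns it afterwards,
// keeping the capacity but resetting the length.
func withStates(f func(states []int) bool) bool {
	sp := statePool.Get().(*[]int)
	defer func() {
		*sp = (*sp)[:0]
		statePool.Put(sp)
	}()
	return f(*sp)
}

func main() {
	ok := withStates(func(states []int) bool {
		states = append(states, 0) // push a start state
		return len(states) == 1
	})
	fmt.Println(ok)
}
```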

How It Actually Works

Step‑by‑step for .*error.*connection.* on a 250 KB log with five “connection” occurrences:

  1. SIMD pre‑filter → positions [1245, 5678, …]
  2. For each candidate:
    • Reverse DFA scans backward → finds “error” at 1230
    • Forward DFA validates the surrounding .*
    • Return the first successful match

Only a handful of candidates are examined, not 250 000 positions, yielding the 3000× speedup.

The Architecture

┌─────────────────────┐
│   Meta‑Engine       │   ← Strategy Selection
└───────┬─────────────┘

   ┌────▼─────┐──► memchr (AVX2)
   │ Prefilter│──► memmem (SIMD)
   └────┬─────┘──► teddy (multi‑pattern)
        │

 ┌──────▼──────────────────┐
 │        Engines          │
 │ ┌─────┐ ┌─────┐ ┌─────┐ │
 │ │ DFA │ │ NFA │ │ Rev │ │
 │ └─────┘ └─────┘ └─────┘ │
 └─────────────────────────┘

Key idea: Match each pattern with the algorithm that fits it best.

Real Benchmarks

Pattern          Size     stdlib    coregex   Speedup
.*\.txt$         1 MB     27 ms     21 µs     1 314×
.*error.*        250 KB   12.6 ms   4 µs      3 154×
(?i)error        32 KB    1.23 ms   4.7 µs    263×
\w+@\w+\.\w+     1 KB     688 ns    196 ns    3.5×

Even simple patterns see measurable gains.

API – Drop‑In Replacement

// Change one import line
// import "regexp"
import "github.com/coregx/coregex"

re := coregex.MustCompile(`.*error.*`)
matches := re.Find(data) // 3000× faster, same API

Supported methods (full compatibility):

  • Find, FindAll, Match, MatchString
  • FindSubmatch, SubexpNames (capture groups)
  • Replace, ReplaceAll, Split
  • Unicode handling via regexp/syntax

Missing feature: Backreferences (they require backtracking and break O(n) guarantees).

When to Use It

Use coregex if:

  • Regex processing is a proven bottleneck (profile first)
  • Patterns contain wildcards (.*error.*) or suffixes (.*\.txt)
  • You need high‑throughput log or trace parsing
  • Performance matters more than API stability

Stick with stdlib if:

  • Regex isn’t a hotspot
  • You need the stability of Go’s standard library (coregex is still v0.8)
  • You require backreferences or other exotic features

Typical Speedups by Pattern Type

Pattern type                 Expected speedup
Suffix (.*\.txt)             1 000–1 500×
Inner wildcard (.*error.*)   2 000–3 000×
Case‑insensitive             100–300×
Simple literals              2–10×

Part of CoreGX

coregex is the first library released by the CoreGX organization (github.com/coregx). The mission is to build production‑grade Go tools that should exist but don’t.
