Go's Regexp is Slow. So I Built My Own - 3000x Faster

Published: November 29, 2025 at 04:27 PM EST
3 min read
Source: Dev.to

The Problem

Go’s regexp package uses a Thompson NFA exclusively. It guarantees O(n) time and prevents ReDoS, but it’s slow:

// Searching for "error" in 1 MB file
re := regexp.MustCompile(`.*error.*`)
start := time.Now()
re.Find(data) // ~27 ms

Why?

  • No SIMD – processes bytes sequentially
  • No pre‑filters – scans the whole input even when the pattern can’t match
  • Single engine for all patterns
  • Allocation overhead in hot paths

On the same pattern, Rust’s regex crate runs in ~21 µs (≈1 285× faster).

The Rust Rabbit Hole

Rust achieves speed with a multi‑engine architecture:

  • Literal extraction – pull fixed strings (.*error.* → “error”)
  • SIMD pre‑filter – AVX2 byte search (≈12× faster)
  • Strategy selection – DFA for simple patterns, NFA for complex, reverse search for suffixes
  • Specialized engines for common cases
  • Zero allocations – object pools, pre‑allocated buffers

Go’s regexp lacks all of this.

Building coregex – Month by Month

Month 1‑2: SIMD Primitives

Implemented AVX2‑based memchr in assembly:

// Find byte in slice using AVX2 (32 bytes parallel)
TEXT ·memchr(SB), NOSPLIT, $0-40
    MOVQ    haystack+0(FP), DI
    MOVQ    len+8(FP), CX
    MOVBQZX needle+24(FP), AX

    // Broadcast needle to YMM0
    VMOVD   AX, X0
    VPBROADCASTB X0, Y0

loop:
    VMOVDQU (DI), Y1       // Load 32 bytes
    VPCMPEQB Y0, Y1, Y2    // Compare (parallel)
    VPMOVMSKB Y2, AX       // Extract mask
    TESTL   AX, AX
    JNZ     found
    // … continue

Benchmark (finding ‘e’ in 64 KB):

Implementation            Time
stdlib bytes.IndexByte    8 367 ns
coregex SIMD memchr       679 ns
Speedup                   12.3×

Follow‑up work added memmem (substring search) and a multi‑pattern SIMD engine (teddy).
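A memmem built on memchr follows the classic first-byte prefilter shape; a portable sketch (using bytes.IndexByte where the SIMD memchr would slot in — this is the idea, not coregex's actual implementation):

```go
package main

import (
	"bytes"
	"fmt"
)

// memmem finds needle in haystack by scanning for the needle's first
// byte (the step a SIMD memchr accelerates), then verifying the rest.
func memmem(haystack, needle []byte) int {
	if len(needle) == 0 {
		return 0
	}
	off := 0
	for off+len(needle) <= len(haystack) {
		i := bytes.IndexByte(haystack[off:], needle[0])
		if i < 0 {
			return -1
		}
		pos := off + i
		if pos+len(needle) > len(haystack) {
			return -1
		}
		if bytes.Equal(haystack[pos:pos+len(needle)], needle) {
			return pos
		}
		off = pos + 1 // false positive: keep scanning
	}
	return -1
}

func main() {
	fmt.Println(memmem([]byte("connection error here"), []byte("error"))) // 11
}
```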

Month 3‑4: Strategy Selection

Different patterns need different engines. Coregex builds a meta‑engine that selects the optimal strategy:

func selectStrategy(p *syntax.Regexp) Strategy {
    prefix := extractPrefix(p)
    suffix := extractSuffix(p)
    inner  := extractInner(p)

    switch {
    case len(suffix) >= 3:
        return UseReverseSuffix   // e.g. .*\.txt
    case len(inner) >= 3:
        return UseReverseInner    // e.g. .*error.*
    case len(prefix) >= 3:
        return UseDFA
    default:
        return UseAdaptive        // try DFA, fallback NFA
    }
}

Month 5: ReverseInner – The Killer Optimization

Problem: .*error.*connection.* on a large log.

Stdlib's NFA walks the input byte by byte with no prefilter, taking ~12 ms.

coregex:

  1. SIMD scan for “connection” → ~200 ns
  2. Reverse DFA from that point to find “error” → fast
  3. Forward DFA to satisfy surrounding .* → fast
  4. Return the first leftmost‑first match.
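The steps above can be sketched with plain byte searches (illustrative only — coregex uses SIMD prefilters and real DFAs, not bytes.Index):

```go
package main

import (
	"bytes"
	"fmt"
)

// reverseInnerMatch sketches the ReverseInner idea for
// `.*error.*connection.*` on a single line: find the inner literal
// "connection" first, then confirm "error" appears before it.
// The surrounding `.*` are satisfied by anything else on the line.
func reverseInnerMatch(line []byte) bool {
	i := bytes.Index(line, []byte("connection")) // prefilter step
	for i >= 0 {
		if bytes.Contains(line[:i], []byte("error")) { // backward confirmation
			return true
		}
		next := bytes.Index(line[i+1:], []byte("connection"))
		if next < 0 {
			return false
		}
		i += 1 + next
	}
	return false
}

func main() {
	fmt.Println(reverseInnerMatch([]byte("error: lost connection"))) // true
	fmt.Println(reverseInnerMatch([]byte("connection, then error"))) // false
}
```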

Benchmark (250 KB file):

Implementation          Time      Speedup
stdlib regexp           12.6 ms   —
coregex ReverseInner    4 µs      ≈3 154×

Month 6: Zero Allocations

Hot paths were profiled with pprof and all allocations eliminated:

  • Object pools for DFA states
  • Pre‑allocated buffers
  • Stack‑only data structures

Result:

BenchmarkIsMatch-8  1000000  1.2 µs/op  0 B/op  0 allocs/op
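The object-pool idea can be sketched with sync.Pool (an assumed shape — the post doesn't show coregex's internals): state buffers are reused across matches, so the steady-state hot path performs no heap allocations.

```go
package main

import (
	"fmt"
	"sync"
)

// statePool hands out reusable DFA-state buffers. A pointer to the
// slice is pooled to avoid an allocation on Put.
var statePool = sync.Pool{
	New: func() any {
		s := make([]int, 0, 256)
		return &s
	},
}

// withStates runs f with a pooled buffer and returns it afterwards,
// keeping the capacity but resetting the length.
func withStates(f func(states []int) bool) bool {
	sp := statePool.Get().(*[]int)
	defer func() {
		*sp = (*sp)[:0]
		statePool.Put(sp)
	}()
	return f(*sp)
}

func main() {
	ok := withStates(func(states []int) bool {
		states = append(states, 0) // push a start state
		return len(states) == 1
	})
	fmt.Println(ok)
}
```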

How It Actually Works

Step‑by‑step for .*error.*connection.* on a 250 KB log with five “connection” occurrences:

  1. SIMD pre‑filter → positions [1245, 5678, …]
  2. For each candidate:
    • Reverse DFA scans backward → finds “error” at 1230
    • Forward DFA validates the surrounding .*
    • Return the first successful match

Only a handful of candidates are examined, not 250 000 positions, yielding the 3000× speedup.

The Architecture

┌─────────────────────┐
│   Meta‑Engine       │   ← Strategy Selection
└───────┬─────────────┘

   ┌────▼─────┐──► memchr (AVX2)
   │ Prefilter│──► memmem (SIMD)
   └────┬─────┘──► teddy (multi‑pattern)
        │

 ┌──────▼──────────────────┐
 │        Engines          │
 │ ┌─────┐ ┌─────┐ ┌─────┐ │
 │ │ DFA │ │ NFA │ │ Rev │ │
 │ └─────┘ └─────┘ └─────┘ │
 └─────────────────────────┘

Key idea: Match each pattern with the algorithm that fits it best.

Real Benchmarks

Pattern          Size     stdlib    coregex   Speedup
.*\.txt$         1 MB     27 ms     21 µs     1 314×
.*error.*        250 KB   12.6 ms   4 µs      3 154×
(?i)error        32 KB    1.23 ms   4.7 µs    263×
\w+@\w+\.\w+     1 KB     688 ns    196 ns    3.5×

Even simple patterns see measurable gains.

API – Drop‑In Replacement

// Change one import line
// import "regexp"
import "github.com/coregx/coregex"

re := coregex.MustCompile(`.*error.*`)
matches := re.Find(data) // 3000× faster, same API

Supported methods (full compatibility):

  • Find, FindAll, Match, MatchString
  • FindSubmatch, SubexpNames (capture groups)
  • Replace, ReplaceAll, Split
  • Unicode handling via regexp/syntax

Missing feature: Backreferences (they require backtracking and break O(n) guarantees).

When to Use It

Use coregex if:

  • Regex processing is a proven bottleneck (profile first)
  • Patterns contain wildcards (.*error.*) or suffixes (.*\.txt)
  • You need high‑throughput log or trace parsing
  • Performance matters more than API stability

Stick with stdlib if:

  • Regex isn’t a hotspot
  • You need the stability of Go’s standard library (coregex is still v0.8)
  • You require backreferences or other exotic features

Typical Speedups by Pattern Type

Pattern type                 Expected speedup
Suffix (.*\.txt)             1 000–1 500×
Inner wildcard (.*error.*)   2 000–3 000×
Case‑insensitive             100–300×
Simple literals              2–10×

Part of CoreGX

coregex is the first library released by the CoreGX organization (github.com/coregx). The mission is to build production‑grade Go tools that should exist but don’t.
