Go's Regexp is Slow. So I Built My Own - 3000x Faster
Source: Dev.to
The Problem
Go’s regexp package uses a Thompson NFA exclusively. It guarantees O(n) time and prevents ReDoS, but it’s slow:
```go
// Searching for "error" in a 1 MB file
re := regexp.MustCompile(`.*error.*`)
start := time.Now()
re.Find(data) // ~27 ms
```
Why?
- No SIMD – processes bytes sequentially
- No pre‑filters – scans the whole input even when the pattern can’t match
- Single engine for all patterns
- Allocation overhead in hot paths
On the same pattern, Rust’s regex crate runs in ~21 µs (≈1 285× faster).
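To reproduce the stdlib number yourself, a minimal benchmark sketch (the exact data layout is an assumption, not the author's original test setup):

```go
package regex_test

import (
	"bytes"
	"regexp"
	"testing"
)

// 1 MB of filler with "error" near the end, so the engine
// has to scan essentially the whole input.
var data = append(bytes.Repeat([]byte("a"), 1<<20-5), []byte("error")...)

var re = regexp.MustCompile(`.*error.*`)

func BenchmarkStdlibFind(b *testing.B) {
	for i := 0; i < b.N; i++ {
		re.Find(data)
	}
}
```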
The Rust Rabbit Hole
Rust achieves speed with a multi‑engine architecture:
- Literal extraction – pull fixed strings (`.*error.*` → "error")
- SIMD pre‑filter – AVX2 byte search (≈12× faster)
- Strategy selection – DFA for simple patterns, NFA for complex, reverse search for suffixes
- Specialized engines for common cases
- Zero allocations – object pools, pre‑allocated buffers
Go’s regexp lacks all of this.
Building coregex – Month by Month
Month 1‑2: SIMD Primitives
Implemented AVX2‑based memchr in assembly:
```asm
// Find byte in slice using AVX2 (32 bytes in parallel)
TEXT ·memchr(SB), NOSPLIT, $0-40
    MOVQ    haystack+0(FP), DI
    MOVQ    len+8(FP), CX
    MOVBQZX needle+24(FP), AX
    // Broadcast needle to YMM0
    VMOVD        AX, X0
    VPBROADCASTB X0, Y0
loop:
    VMOVDQU   (DI), Y1    // Load 32 bytes
    VPCMPEQB  Y0, Y1, Y2  // Compare (parallel)
    VPMOVMSKB Y2, AX      // Extract mask
    TESTL     AX, AX
    JNZ       found
    // … continue
```
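On the Go side, a kernel like this needs a declaration plus a scalar fallback for short inputs and CPUs without AVX2. A sketch of how that wrapper might look (dispatch details and names are assumptions, not coregex's actual internals):

```go
package simd

import "golang.org/x/sys/cpu"

var hasAVX2 = cpu.X86.HasAVX2

// memchr is the AVX2 kernel from the assembly above
// (TEXT ·memchr, frame $0-40: 24-byte slice header,
// padded byte argument, 8-byte result).
//
//go:noescape
func memchr(haystack []byte, needle byte) int

// IndexByte dispatches to the SIMD kernel when it is usable
// and falls back to a scalar loop otherwise.
func IndexByte(haystack []byte, needle byte) int {
	if hasAVX2 && len(haystack) >= 32 {
		return memchr(haystack, needle)
	}
	// Scalar fallback for short inputs and non-AVX2 CPUs.
	for i, b := range haystack {
		if b == needle {
			return i
		}
	}
	return -1
}
```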
Benchmark (finding ‘e’ in 64 KB):

| Implementation | Time |
|---|---|
| stdlib `bytes.IndexByte` | 8 367 ns |
| coregex SIMD `memchr` | 679 ns |
| **Speedup** | **12.3×** |
Follow‑up work added memmem (substring search) and a multi‑pattern SIMD engine (teddy).
Month 3‑4: Strategy Selection
Different patterns need different engines. Coregex builds a meta‑engine that selects the optimal strategy:
```go
func selectStrategy(p *syntax.Regexp) Strategy {
	prefix := extractPrefix(p)
	suffix := extractSuffix(p)
	inner := extractInner(p)

	switch {
	case len(suffix) >= 3:
		return UseReverseSuffix // e.g. .*\.txt
	case len(inner) >= 3:
		return UseReverseInner // e.g. .*error.*
	case len(prefix) >= 3:
		return UseDFA
	default:
		return UseAdaptive // try DFA, fall back to NFA
	}
}
```
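The extractors walk the parsed pattern AST. A minimal sketch of what prefix extraction can look like on top of the standard `regexp/syntax` package (coregex's real extractor is more thorough; this version is purely illustrative):

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// extractPrefix returns the literal bytes every match must
// start with, or "" if there is no fixed prefix.
func extractPrefix(re *syntax.Regexp) string {
	switch re.Op {
	case syntax.OpLiteral:
		return string(re.Rune)
	case syntax.OpConcat:
		if len(re.Sub) > 0 {
			return extractPrefix(re.Sub[0]) // prefix of the first element
		}
	}
	return "" // stars, alternations, etc. have no fixed prefix
}

func main() {
	re, _ := syntax.Parse(`error\d+`, syntax.Perl)
	fmt.Println(extractPrefix(re)) // "error"

	re2, _ := syntax.Parse(`.*error.*`, syntax.Perl)
	fmt.Println(extractPrefix(re2) == "") // true: no usable prefix
}
```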
Month 5: ReverseInner – The Killer Optimization
Problem: `.*error.*connection.*` on a large log.

The stdlib engine scans from every starting position, taking ~12 ms.

coregex instead:

- SIMD scan for “connection” → ~200 ns
- Reverse DFA from that hit to find “error” → fast
- Forward DFA to satisfy the surrounding `.*` → fast
- Return the leftmost‑first match
Benchmark (250 KB file):

| Implementation | Time | Speedup |
|---|---|---|
| stdlib regexp | 12.6 ms | – |
| coregex ReverseInner | 4 µs | ≈3 154× |
Month 6: Zero Allocations
Hot paths were profiled with pprof and all allocations eliminated:
- Object pools for DFA states
- Pre‑allocated buffers
- Stack‑only data structures
Result:
```
BenchmarkIsMatch-8   1000000   1.2 µs/op   0 B/op   0 allocs/op
```
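A minimal sketch of the object-pool pattern behind those numbers, using `sync.Pool` (types and names are hypothetical, not coregex's actual internals):

```go
package engine

import "sync"

// searchState is reusable scratch space for one automaton run
// (hypothetical layout).
type searchState struct {
	stack []uint32 // pending state IDs
}

var statePool = sync.Pool{
	New: func() any { return &searchState{stack: make([]uint32, 0, 256)} },
}

func isMatch(data []byte) bool {
	s := statePool.Get().(*searchState)
	defer func() {
		s.stack = s.stack[:0] // keep capacity, drop contents
		statePool.Put(s)
	}()
	// ... run the automaton using s.stack as scratch,
	// allocating nothing on the hot path ...
	_ = data
	return false
}
```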
How It Actually Works
Step‑by‑step for `.*error.*connection.*` on a 250 KB log with five “connection” occurrences:

1. SIMD pre‑filter → candidate positions [1245, 5678, …]
2. For each candidate:
   - Reverse DFA scans backward → finds “error” at 1230
   - Forward DFA validates the surrounding `.*`
3. Return the first successful match
Only a handful of candidates are examined, not 250 000 positions, yielding the 3000× speedup.
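The same control flow can be sketched with stdlib building blocks, using `bytes.Index` as a stand-in for the SIMD memmem prefilter and a plain backward check as a stand-in for the reverse DFA (newline semantics of `.` ignored for brevity):

```go
package main

import (
	"bytes"
	"fmt"
)

// matchInner mimics the ReverseInner flow for .*error.*connection.*:
// find each "connection" candidate, then verify that an "error"
// occurs somewhere before it.
func matchInner(data []byte) bool {
	needle := []byte("connection")
	for pos := 0; ; {
		i := bytes.Index(data[pos:], needle) // prefilter (SIMD in coregex)
		if i < 0 {
			return false // no candidates left: reject without a full regex scan
		}
		cand := pos + i
		if bytes.Contains(data[:cand], []byte("error")) { // reverse check
			return true
		}
		pos = cand + 1 // try the next candidate
	}
}

func main() {
	fmt.Println(matchInner([]byte("error: lost connection")))  // true
	fmt.Println(matchInner([]byte("connection ok, no issues"))) // false
}
```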
The Architecture
```
┌─────────────────────────┐
│       Meta‑Engine       │ ← Strategy Selection
└───────────┬─────────────┘
            │
      ┌─────▼─────┐
      │ Prefilter │──► memchr (AVX2)
      └─────┬─────┘──► memmem (SIMD)
            │     └──► teddy (multi‑pattern)
            │
┌───────────▼─────────────┐
│         Engines         │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ DFA │ │ NFA │ │ Rev │ │
│ └─────┘ └─────┘ └─────┘ │
└─────────────────────────┘
```
Key idea: Match each pattern with the algorithm that fits it best.
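In code terms, the meta‑engine does its analysis once at compile time and each search is a single dispatch. A hypothetical shape of that dispatch, reusing the `Strategy` type from `selectStrategy` above (field and method names are illustrative, not coregex's actual internals):

```go
// engine is whatever can execute a search over a byte slice.
type engine interface {
	find(data []byte) []byte
}

type MetaEngine struct {
	strategy Strategy
	engines  map[Strategy]engine // built once, at compile time
}

func (m *MetaEngine) Find(data []byte) []byte {
	// One map lookup per search: the expensive pattern analysis
	// already happened in selectStrategy.
	return m.engines[m.strategy].find(data)
}
```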
Real Benchmarks
| Pattern | Size | stdlib | coregex | Speedup |
|---|---|---|---|---|
| `.*\.txt$` | 1 MB | 27 ms | 21 µs | 1 314× |
| `.*error.*` | 250 KB | 12.6 ms | 4 µs | 3 154× |
| `(?i)error` | 32 KB | 1.23 ms | 4.7 µs | 263× |
| `\w+@\w+\.\w+` | 1 KB | 688 ns | 196 ns | 3.5× |
Even simple patterns see measurable gains.
API – Drop‑In Replacement
```go
// Change one import line:
// import "regexp"
import "github.com/coregx/coregex"

re := coregex.MustCompile(`.*error.*`)
match := re.Find(data) // 3000× faster, same API
```
Supported methods (full compatibility):

- `Find`, `FindAll`, `Match`, `MatchString`
- `FindSubmatch`, `SubexpNames` (capture groups; see the example below)
- `Replace`, `ReplaceAll`, `Split`
- Unicode handling via `regexp/syntax`
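For example, capture groups work as they do in the stdlib (a sketch, assuming the compatibility claimed above):

```go
re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
sub := re.FindSubmatch([]byte("contact: dev@example.com"))
// sub[1] == "dev", sub[2] == "example", sub[3] == "com"
```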
Missing feature: Backreferences (they require backtracking and break O(n) guarantees).
When to Use It
Use coregex if:
- Regex processing is a proven bottleneck (profile first)
- Patterns contain wildcards (`.*error.*`) or suffixes (`.*\.txt`)
- You need high‑throughput log or trace parsing
- Performance matters more than API stability
Stick with stdlib if:
- Regex isn’t a hotspot
- You need the stability of Go’s standard library (coregex is still v0.8)
- You require backreferences or other exotic features
Typical Speedups by Pattern Type
| Pattern type | Expected speedup |
|---|---|
| Suffix (`.*\.txt`) | 1 000–1 500× |
| Inner wildcard (`.*error.*`) | 2 000–3 000× |
| Case‑insensitive | 100–300× |
| Simple literals | 2–10× |
Part of CoreGX
coregex is the first library released by the CoreGX organization (github.com/coregx). The mission is to build production‑grade Go tools that should exist but don’t.