How I Compiled 647 Semgrep Rules to Native Rust

Published: February 5, 2026 at 05:31 PM EST
3 min read
Source: Dev.to

I love Semgrep. It has thousands of community‑contributed security rules that catch real vulnerabilities. But every time I ran it on a large codebase, I’d wait… and wait.

The problem? Semgrep loads and interprets its YAML rules on every run, through a Python CLI driving an OCaml matching engine. For a 500K-line monorepo, that meant 4+ minutes per scan.

So I asked myself: what if I compiled those rules to native code instead?

The Idea

Semgrep rules are just pattern matching. A rule like this:

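rules:
  - id: sql-injection
    pattern: execute($QUERY)
    message: "Possible SQL injection"
    # Semgrep also requires languages and severity; these values are illustrative
    languages: [javascript]
    severity: ERROR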

says “find any call to execute() with one argument.” That’s not fundamentally different from what Tree‑sitter does with its query language.

What if I translated Semgrep patterns into Tree‑sitter queries at build time, embedded them in the binary, and matched against ASTs directly?

The Hard Part: Metavariables

Semgrep uses $VARIABLES to capture arbitrary code:

eval($USER_INPUT)

This matches eval(x), eval(foo.bar), and eval(getInput()): any single argument expression.
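Under the hood, a metavariable is a wildcard that remembers what it matched. Semgrep adds one rule on top: a metavariable used twice must match the same code both times, which is why $X == $X only flags comparisons of identical expressions. Here is a minimal sketch of that matching on a hypothetical toy expression AST. Expr, match_expr, and the helper constructors are illustrations, not RMA's code.

```rust
use std::collections::HashMap;

// Hypothetical toy AST; a real matcher would walk Tree-sitter nodes instead
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Ident(String),              // x
    Field(Box<Expr>, String),   // foo.bar
    Call(Box<Expr>, Vec<Expr>), // f(a, b)
    Metavar(String),            // $X (only appears in patterns)
}

fn ident(name: &str) -> Expr {
    Expr::Ident(name.to_string())
}

fn mv(name: &str) -> Expr {
    Expr::Metavar(name.to_string())
}

fn call(func: Expr, args: Vec<Expr>) -> Expr {
    Expr::Call(Box::new(func), args)
}

/// Returns true if `code` matches `pat`, recording each metavariable's binding
/// in `env`. A metavariable seen before must match code equal to its binding.
fn match_expr(pat: &Expr, code: &Expr, env: &mut HashMap<String, Expr>) -> bool {
    match (pat, code) {
        (Expr::Metavar(name), _) => {
            if let Some(bound) = env.get(name) {
                return bound == code;
            }
            env.insert(name.clone(), code.clone());
            true
        }
        (Expr::Ident(a), Expr::Ident(b)) => a == b,
        (Expr::Field(p_obj, p_field), Expr::Field(c_obj, c_field)) => {
            p_field == c_field && match_expr(p_obj, c_obj, env)
        }
        (Expr::Call(p_func, p_args), Expr::Call(c_func, c_args)) => {
            // Same arity, matching callee, then every argument in order
            if p_args.len() != c_args.len() || !match_expr(p_func, c_func, env) {
                return false;
            }
            for (p, c) in p_args.iter().zip(c_args.iter()) {
                if !match_expr(p, c, env) {
                    return false;
                }
            }
            true
        }
        _ => false,
    }
}
```

With the pattern call(ident("eval"), vec![mv("USER_INPUT")]), all three calls above match, and afterwards env holds whatever $USER_INPUT captured.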

Tree‑sitter queries don’t have metavariables; they have captures:

(call_expression
  function: (identifier) @func
  arguments: (arguments (_) @arg))

The @func and @arg are captures — they grab whatever matches that position.

So I built a translator. It parses Semgrep patterns, identifies metavariables, and generates Tree‑sitter queries with captures in the right places.

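// Simplified version of the pattern compiler
// (parse_semgrep_pattern, Metavar/Literal and TreeSitterQuery are elided here)
fn compile_pattern(semgrep: &str) -> TreeSitterQuery {
    let ast = parse_semgrep_pattern(semgrep);
    let mut query = String::new();
    for node in ast.walk() {
        match node {
            Metavar(name) => {
                // $X becomes (_) @x; drop any leading `$` so the capture name is a plain identifier
                let capture = name.trim_start_matches('$').to_lowercase();
                query.push_str(&format!("(_) @{}", capture));
            }
            Literal(text) => {
                // Quoted strings match anonymous nodes (keywords, operators);
                // escape backslashes and quotes so the query stays well-formed
                let escaped = text.replace('\\', "\\\\").replace('"', "\\\"");
                query.push_str(&format!("\"{}\"", escaped));
            }
            // ... more cases
        }
    }
    TreeSitterQuery::new(&query)
}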
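The simplified compiler glosses over three details. In a Tree-sitter query, a quoted string matches an anonymous node (a keyword or punctuation token), not an identifier's text, so pinning the callee to execute takes a capture plus an #eq? predicate. An unanchored (arguments (_) @arg) matches calls with one or more arguments, so "exactly one argument" needs `.` anchors. And a metavariable repeated in a pattern needs its own #eq? check. A minimal sketch for call patterns only: compile_call_pattern and its capture-naming scheme are hypothetical, and the node names are the tree-sitter-javascript ones used in the query earlier in this post.

```rust
/// Hypothetical: compile a Semgrep call pattern such as `execute($QUERY)` or
/// `$F($X, $X)` into a Tree-sitter query string (tree-sitter-javascript names).
fn compile_call_pattern(func: &str, args: &[&str]) -> String {
    assert!(!args.is_empty(), "this sketch only handles calls with arguments");
    let mut predicates: Vec<String> = Vec::new();
    let mut seen: Vec<String> = Vec::new(); // metavariable names already captured

    // Callee: a metavariable captures any expression; a literal name needs a
    // capture plus an #eq? predicate comparing the identifier's text.
    let callee = match func.strip_prefix('$') {
        Some(name) => {
            let name = name.to_lowercase();
            seen.push(name.clone());
            format!("(_) @{}", name)
        }
        None => {
            predicates.push(format!("(#eq? @func-name \"{}\")", func));
            "(identifier) @func-name".to_string()
        }
    };

    let mut arg_parts: Vec<String> = Vec::new();
    for (i, arg) in args.iter().enumerate() {
        let name = arg
            .strip_prefix('$')
            .expect("this sketch only handles metavariable arguments")
            .to_lowercase();
        if seen.contains(&name) {
            // Repeated metavariable: fresh capture plus an equality predicate.
            // Semgrep metavariable names have no hyphens, so `x-1` can't clash.
            let fresh = format!("{}-{}", name, i);
            predicates.push(format!("(#eq? @{} @{})", name, fresh));
            arg_parts.push(format!("(_) @{}", fresh));
        } else {
            arg_parts.push(format!("(_) @{}", name));
            seen.push(name);
        }
    }

    // `.` anchors at both ends and between arguments pin the match to exactly
    // args.len() named children (anchors ignore anonymous nodes like commas).
    let node = format!(
        "(call_expression function: {} arguments: (arguments . {} .))",
        callee,
        arg_parts.join(" . ")
    );
    if predicates.is_empty() {
        node
    } else {
        // Predicates sit in an outer group alongside the node pattern
        format!("({} {})", node, predicates.join(" "))
    }
}
```

For execute($QUERY), this yields a query that only matches one-argument calls to an identifier whose text is execute.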

The Ellipsis Problem

Semgrep’s ... operator matches “zero or more of anything”:

func($ARG, ...)

This matches func(a), func(a, b), func(a, b, c, d, e).

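Tree‑sitter queries have no direct equivalent. Anchors and quantifiers cover a few simple cases (a trailing ... after a first argument, for instance), but not Semgrep's general ellipsis semantics, so for such patterns I fall back to walking the AST manually and checking whether the structure matches. It isn’t as fast as a native query, but it’s still faster than interpreting the rules at runtime.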
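The manual fallback boils down to backtracking: at each `...`, try every number of arguments it could swallow. Here is a minimal sketch over a hypothetical flattened argument list. ArgPat and match_args are illustrations; a real walker would compare AST nodes rather than source strings, and would also track metavariable bindings.

```rust
/// One element of a hypothetical argument-list pattern
#[derive(Debug, Clone, PartialEq)]
enum ArgPat {
    Metavar(String), // $ARG: exactly one argument, any content
    Literal(String), // one argument whose source text must match exactly
    Ellipsis,        // ...: zero or more arguments
}

/// Backtracking matcher: does `args` (each argument's source text) fit `pats`?
fn match_args(pats: &[ArgPat], args: &[&str]) -> bool {
    match pats.split_first() {
        // Pattern exhausted: a match only if no arguments are left over
        None => args.is_empty(),
        // `...` may swallow 0, 1, ..., args.len() arguments; try each split
        Some((ArgPat::Ellipsis, rest)) => {
            (0..=args.len()).any(|skip| match_args(rest, &args[skip..]))
        }
        // A metavariable consumes exactly one argument, whatever it is
        Some((ArgPat::Metavar(_), rest)) => !args.is_empty() && match_args(rest, &args[1..]),
        // A literal consumes one argument with identical text
        Some((ArgPat::Literal(text), rest)) => {
            args.first() == Some(&text.as_str()) && match_args(rest, &args[1..])
        }
    }
}
```

With the pattern [Metavar("ARG"), Ellipsis], this accepts func(a), func(a, b), and func(a, b, c, d, e) but rejects func(). Each `...` retries every suffix, so patterns with several ellipses can backtrack heavily; memoizing on (pattern index, argument index) bounds the work to patterns × arguments states.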

Build‑Time Compilation

The magic happens in build.rs. At compile time:

  • Parse all 647 Semgrep YAML files
  • Translate each pattern to a Tree‑sitter query (or AST walker)
  • Serialize everything to a binary blob

Then embed it with include_bytes!():

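// In the compiled binary (assumes build.rs writes the blob to OUT_DIR, as Cargo recommends)
static RULES: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/compiled_rules.bin"));

// At runtime: instant loading (bincode 1.x API; RuleSet derives serde::Deserialize)
fn load_rules() -> RuleSet {
    bincode::deserialize(RULES).expect("rule blob generated by build.rs should always decode")
}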

No file I/O, no YAML parsing, no pattern compilation at runtime. The rules are just there.
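As written, load_rules decodes the blob each time it is called. If it can run more than once per process, caching the result is a small change. Here is a minimal sketch with std::sync::OnceLock (stable since Rust 1.70). RULES_BLOB, decode, and rules are hypothetical stand-ins for the embedded bincode blob and RuleSet.

```rust
use std::sync::OnceLock;

// Stand-in for include_bytes!(...): a hypothetical newline-separated list of rule ids
static RULES_BLOB: &[u8] = b"sql-injection\neval-use\n";

/// Hypothetical decoder for the stand-in blob (the post uses bincode instead)
fn decode(blob: &[u8]) -> Vec<String> {
    blob.split(|&b| b == b'\n')
        .filter(|line| !line.is_empty())
        .map(|line| String::from_utf8_lossy(line).into_owned())
        .collect()
}

/// Decodes the blob on the first call; later calls return the cached rules
fn rules() -> &'static [String] {
    static RULES: OnceLock<Vec<String>> = OnceLock::new();
    RULES.get_or_init(|| decode(RULES_BLOB)).as_slice()
}
```

OnceLock is thread-safe, so parallel scanner threads can all call rules() and share the single decoded copy.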

Results

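On a 500K-LOC monorepo, comparing Semgrep with RMA (the Rust scanner this post describes):

| Tool    | Scan time |
|---------|-----------|
| Semgrep | 4 m 12 s  |
| RMA     | 23 s      |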

That's about 11× faster (252 s vs. 23 s). The difference grows as codebases get larger.

What’s Still Rough

  • False positives on generated code (working on better heuristics)
  • Some Semgrep features aren’t supported yet (taint mode is partial)
  • Error messages could be clearer

Try It

cargo install rma-cli
rma scan .

Or with the interactive TUI:

rma scan . --interactive

It’s MIT licensed:

I’d love feedback, especially if you try it on your own projects. What rules are missing? Too many false positives? Let me know.

If you’re interested in the pattern compiler implementation, check out crates/rules/build.rs in the repo.
