After Event Viewer crashed on a 400 MB .evtx, I wrote my own log triage CLI
Source: Dev.to
Introduction
Last week I was poking through event logs from a home‑lab VM I suspected had been scanned hard. Opening the .evtx in Event Viewer took 90 seconds, and it crashed the moment I tried to filter by event ID 4624.
Splunk is overkill for one machine. Wazuh requires infrastructure I didn’t want to set up just to look at a single file. `pysigma` converts Sigma rules to backend queries, but I didn’t have a backend. So I wrote ThreatLens.
It’s a CLI. Point it at a log file or directory, and get alerts mapped to MITRE ATT&CK.
threatlens scan logs/ --min-severity high
That’s the whole interface for the common case.
What I Actually Wanted
Three things, roughly in priority order:
- Works on a single laptop with no infra – no daemon, no agent, no message queue. The only runtime dependency is `pyyaml`.
- Reads the formats I actually have – `.evtx` (Windows native), JSON/NDJSON (modern stuff), syslog (Linux), CEF (network gear).
- Speaks Sigma – the community already has thousands of detection rules; I didn’t want to invent another rule format.
What I Tried First
- Parsing – `python-evtx` worked fine for EVTX files.
- Sigma handling – I first tried `plyara`, but it’s a YARA parser, not Sigma. Then I tried `pysigma`, which converts Sigma to backend queries. I needed in‑memory matching against parsed events, not query strings.
I ended up writing my own Sigma loader (~400 lines). It handles:
- Selection blocks and field modifiers (`|contains`, `|startswith`, `|endswith`, `|re`, `|all`)
- Complex conditions like `selection and not filter` or `1 of selection*`
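To make the modifier list concrete, here’s a minimal sketch of how a field modifier can be applied to a parsed event value in memory. The helper name and signature are hypothetical, not ThreatLens’s actual API (and `|all` would simply flip the any‑match into an all‑match):

```python
import re

def match_field(value, patterns, modifier=None):
    """True if `value` matches any pattern under the given modifier.
    Sigma string matching is case-insensitive by default, so both
    sides are lowercased before comparing. Illustrative sketch only."""
    value = str(value).lower()
    for p in patterns:
        p = str(p).lower()
        if modifier is None and value == p:          # plain equality
            return True
        if modifier == "contains" and p in value:
            return True
        if modifier == "startswith" and value.startswith(p):
            return True
        if modifier == "endswith" and value.endswith(p):
            return True
        if modifier == "re" and re.search(p, value):
            return True
    return False
```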
Operator precedence was tricky; my first parser evaluated `a or b and c` left to right and gave the wrong result half the time. After three rewrites it finally matched Sigma’s reference behavior.
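For illustration, a tiny recursive‑descent evaluator that gets the precedence right (`not` binds tightest, then `and`, then `or`) might look like this. It operates on pre‑tokenized conditions with selection names already resolved to booleans in `env`; it’s a sketch, not the actual parser:

```python
def eval_condition(tokens, env):
    """Evaluate a tokenized Sigma-style condition against `env`,
    a dict mapping selection names to booleans. Sketch only."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        if peek() == "(":
            take()
            v = or_expr()
            take()  # consume ")"
            return v
        if peek() == "not":
            take()
            return not atom()
        return env[take()]  # a selection name

    def and_expr():
        v = atom()
        while peek() == "and":
            take()
            rhs = atom()    # always consume, never short-circuit the parse
            v = v and rhs
        return v

    def or_expr():
        v = and_expr()
        while peek() == "or":
            take()
            rhs = and_expr()
            v = v or rhs
        return v

    return or_expr()
```

With this structure, `a or b and c` parses as `a or (b and c)` rather than `(a or b) and c` — exactly the bug the left‑to‑right version had.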
The Part I’m Most Proud Of: Elasticsearch Output
I wanted to push alerts to Elasticsearch so they’d appear alongside other security data. The official elasticsearch Python client is ~40 MB and pulls in dozens of transitive dependencies I didn’t want to audit.
Remembering that the Bulk API is just newline‑delimited JSON over HTTP, I implemented a lightweight client using only the standard library:
```python
import json, urllib.request

def push_alerts(alerts, url, index, api_key=None):
    # Build the NDJSON body: one action line, then one document line,
    # per alert — exactly what the Bulk API expects.
    lines = []
    for a in alerts:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(a.to_dict()))
    body = ("\n".join(lines) + "\n").encode("utf-8")  # trailing newline is required
    headers = {"Content-Type": "application/x-ndjson"}
    if api_key:
        headers["Authorization"] = f"ApiKey {api_key}"
    req = urllib.request.Request(
        f"{url.rstrip('/')}/_bulk",
        data=body,
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```
Stdlib only. Works against real ES clusters, saves ~40 MB of install size, and removes a whole category of supply‑chain risk.
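One caveat worth noting: `_bulk` returns HTTP 200 even when individual documents fail, so the caller should inspect the `errors` flag in the response body. A small helper sketch (the function name is mine, not part of the snippet above):

```python
def check_bulk_response(resp):
    """Collect per-item errors from a Bulk API response dict.
    The endpoint returns 200 with a top-level "errors" flag, so an
    HTTP-level check alone isn't enough. Sketch only."""
    if not resp.get("errors"):
        return []
    failed = []
    for item in resp.get("items", []):
        # each item wraps its result under the action name ("index" here)
        for action, detail in item.items():
            if detail.get("status", 500) >= 300:
                failed.append(detail.get("error"))
    return failed
```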
Attack‑Chain Correlation
Single alerts are noisy. “Failed logon” by itself means nothing, but a sequence like:
- Failed logon burst
- Privilege escalation
- Lateral movement on the same account within a 10‑minute window
tells a story.
The chain detector groups alerts by username and timestamp, then walks them through kill‑chain order (credential access → privilege escalation → lateral movement → execution). If the order matches and the events fall inside a tunable time window, it fires a single high‑severity chain alert that links back to the constituent events.
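The idea can be sketched in a few lines. Field names (`user`, `tactic`, `ts`) and the stage list here are illustrative, not the detector’s actual schema:

```python
from collections import defaultdict
from datetime import timedelta

# Illustrative kill-chain order; the real detector's stage names may differ.
KILL_CHAIN = ["credential-access", "privilege-escalation",
              "lateral-movement", "execution"]

def find_chains(alerts, window=timedelta(minutes=10)):
    # Group alerts by account, then walk each group in time order,
    # advancing a stage pointer through the kill chain.
    by_user = defaultdict(list)
    for a in alerts:
        by_user[a["user"]].append(a)
    chains = []
    for user, items in by_user.items():
        items.sort(key=lambda a: a["ts"])
        hits = []
        for a in items:
            if hits and a["ts"] - hits[0]["ts"] > window:
                hits = []  # window expired: start a fresh chain
            stage = len(hits)
            if stage < len(KILL_CHAIN) and a["tactic"] == KILL_CHAIN[stage]:
                hits.append(a)
                if len(hits) == len(KILL_CHAIN):
                    chains.append({"user": user, "alerts": hits})
                    hits = []
    return chains
```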
On a 52‑event mixed‑noise dataset I wrote, it extracts two distinct chains with zero false positives on benign activity. A focused 26‑event simulation also lights up correctly.
The 12 Detectors
Each detector is a separate Python module subclassing a `DetectionRule` base:
- Brute force
- Lateral movement
- Privilege escalation
- Suspicious process
- Defense evasion
- Persistence
- Discovery
- Exfiltration
- Kerberos attacks (kerberoasting and AS‑REP roasting)
- Credential access (LSASS, SAM, DCSync)
- Initial access (external RDP, after‑hours logons)
- Chain correlator
Custom YAML rules and Sigma rules are loaded on top of these. You can also drop a `.py` file into `--plugin-dir`; the loader picks it up at scan time.
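As a sketch of that plugin pattern, here is one way a `DetectionRule` base plus a directory loader could look. Class names and the duck‑typed `match` check are simplifications of my own, not the real loader:

```python
import importlib.util
import pathlib

class DetectionRule:
    """Minimal base: subclasses set a name/severity and implement match()."""
    name = "base"
    severity = "low"

    def match(self, event: dict) -> bool:
        raise NotImplementedError

def load_plugins(plugin_dir):
    """Import every .py file in plugin_dir and instantiate rule classes.
    As a sketch-level simplification, any class with a callable `match`
    counts as a rule, so plugins don't need to import the base class."""
    rules = []
    for path in sorted(pathlib.Path(plugin_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        for obj in vars(mod).values():
            if (isinstance(obj, type)
                    and obj is not DetectionRule
                    and callable(getattr(obj, "match", None))):
                rules.append(obj())
    return rules
```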
What’s Still Rough
- HTML report – functional but the CSS is ugly; SVG donut chart for severity works, but typography needs polish.
- Real‑world testing – I haven’t tested against an enterprise dataset; sample data is hand‑crafted to exercise specific detectors. A synthetic generator can produce 1 000‑event datasets, but synthetic isn’t real.
- Sigma loader limitations – no support for `count() by` aggregations or cross‑rule correlations yet. Those are next on the roadmap.
- EVTX parsing – requires `python-evtx` as an optional extra. Without it you must export to JSON first. I’d like to auto‑detect at runtime and fall back gracefully.
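The runtime fallback I have in mind is the usual optional‑import dance. This is a sketch of the intended behavior, not shipped code (the `open_log` helper is hypothetical):

```python
# Probe for the optional dependency once at import time.
try:
    import Evtx.Evtx as evtx  # provided by the optional python-evtx extra
    HAVE_EVTX = True
except ImportError:
    HAVE_EVTX = False

def open_log(path):
    """Open a log file, degrading gracefully when python-evtx is absent."""
    if path.endswith(".evtx"):
        if not HAVE_EVTX:
            raise SystemExit(
                "EVTX support needs python-evtx (pip install python-evtx), "
                "or export the log to JSON first"
            )
        return evtx.Evtx(path)
    # everything else (JSON/NDJSON, syslog, CEF) is plain text
    return open(path, encoding="utf-8", errors="replace")
```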
What I’d Do Differently
- Write the Sigma test corpus before the parser; every fix would have been faster with real test cases in place.
- Design the alert model around Elasticsearch field‑naming rules from day one; I had to rename three fields late because they weren’t valid ES field names.
- Decide upfront whether ThreatLens is a tool or a framework. The plugin system pushed it toward a framework, and the ambiguity cost some design clarity.
Links
- Repository:
If you do detection work and the Sigma compatibility breaks on a real community rule, please open an issue. That’s the part I most want stress‑tested.