After Event Viewer crashed on a 400 MB .evtx, I wrote my own log triage CLI
Source: Dev.to
Introduction
Last week I was poking through event logs from a home‑lab VM I suspected had been scanned hard. Opening the .evtx in Event Viewer took 90 seconds, and it crashed the moment I tried to filter by event ID 4624.
Splunk is overkill for one machine. Wazuh requires infrastructure I didn’t want to set up just to look at a single file. `pysigma` converts Sigma rules to backend queries, but I didn’t have a backend. So I wrote ThreatLens.
It’s a CLI. Point it at a log file or directory, and get alerts mapped to MITRE ATT&CK.
threatlens scan logs/ --min-severity high
That’s the whole interface for the common case.
What I Actually Wanted
Three things, roughly in priority order:
- Works on a single laptop with no infra – no daemon, no agent, no message queue. The only runtime dependency is `pyyaml`.
- Reads the formats I actually have – `.evtx` (Windows native), JSON/NDJSON (modern stuff), syslog (Linux), CEF (network gear).
- Speaks Sigma – the community already has thousands of detection rules; I didn’t want to invent another rule format.
What I Tried First
- Parsing – `python-evtx` worked fine for EVTX files.
- Sigma handling – I first tried `plyara`, but it’s a YARA parser, not Sigma. Then I tried `pysigma`, which converts Sigma to backend queries. I needed in‑memory matching against parsed events, not query strings.
I ended up writing my own Sigma loader (~400 lines). It handles:
- Selection blocks and field modifiers (`|contains`, `|startswith`, `|endswith`, `|re`, `|all`)
- Complex conditions like `selection and not filter` or `1 of selection*`
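To make the modifier list concrete, here’s a minimal sketch of how a field modifier can be applied to a parsed event value in memory. The helper name and signature are hypothetical, not ThreatLens’s actual API (and `|all` would simply flip the any‑match into an all‑match):

```python
import re

def match_field(value, patterns, modifier=None):
    """True if `value` matches any pattern under the given modifier.
    Sigma string matching is case-insensitive by default, so both
    sides are lowercased before comparing. Illustrative sketch only."""
    value = str(value).lower()
    for p in patterns:
        p = str(p).lower()
        if modifier is None and value == p:          # plain equality
            return True
        if modifier == "contains" and p in value:
            return True
        if modifier == "startswith" and value.startswith(p):
            return True
        if modifier == "endswith" and value.endswith(p):
            return True
        if modifier == "re" and re.search(p, value):
            return True
    return False
```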
Operator precedence was tricky; my first parser evaluated `a or b and c` left to right and gave the wrong result half the time. After three rewrites it finally matched Sigma’s reference behavior.
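For illustration, a tiny recursive‑descent evaluator that gets the precedence right (`not` binds tightest, then `and`, then `or`) might look like this. It operates on pre‑tokenized conditions with selection names already resolved to booleans in `env`; it’s a sketch, not the actual parser:

```python
def eval_condition(tokens, env):
    """Evaluate a tokenized Sigma-style condition against `env`,
    a dict mapping selection names to booleans. Sketch only."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        if peek() == "(":
            take()
            v = or_expr()
            take()  # consume ")"
            return v
        if peek() == "not":
            take()
            return not atom()
        return env[take()]  # a selection name

    def and_expr():
        v = atom()
        while peek() == "and":
            take()
            rhs = atom()    # always consume, never short-circuit the parse
            v = v and rhs
        return v

    def or_expr():
        v = and_expr()
        while peek() == "or":
            take()
            rhs = and_expr()
            v = v or rhs
        return v

    return or_expr()
```

With this structure, `a or b and c` parses as `a or (b and c)` rather than `(a or b) and c` — exactly the bug the left‑to‑right version had.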
The Part I’m Most Proud Of: Elasticsearch Output
I wanted to push alerts to Elasticsearch so they’d appear alongside other security data. The official elasticsearch Python client is ~40 MB and pulls in dozens of transitive dependencies I didn’t want to audit.
Remembering that the Bulk API is just newline‑delimited JSON over HTTP, I implemented a lightweight client using only the standard library:
```python
import json, urllib.request

def push_alerts(alerts, url, index, api_key=None):
    # Build the NDJSON body: one action line, then one document line,
    # per alert — exactly what the Bulk API expects.
    lines = []
    for a in alerts:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(a.to_dict()))
    body = ("\n".join(lines) + "\n").encode("utf-8")  # trailing newline is required
    headers = {"Content-Type": "application/x-ndjson"}
    if api_key:
        headers["Authorization"] = f"ApiKey {api_key}"
    req = urllib.request.Request(
        f"{url.rstrip('/')}/_bulk",
        data=body,
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```
Stdlib only. Works against real ES clusters, saves ~40 MB of install size, and removes a whole category of supply‑chain risk.
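One caveat worth noting: `_bulk` returns HTTP 200 even when individual documents fail, so the caller should inspect the `errors` flag in the response body. A small helper sketch (the function name is mine, not part of the snippet above):

```python
def check_bulk_response(resp):
    """Collect per-item errors from a Bulk API response dict.
    The endpoint returns 200 with a top-level "errors" flag, so an
    HTTP-level check alone isn't enough. Sketch only."""
    if not resp.get("errors"):
        return []
    failed = []
    for item in resp.get("items", []):
        # each item wraps its result under the action name ("index" here)
        for action, detail in item.items():
            if detail.get("status", 500) >= 300:
                failed.append(detail.get("error"))
    return failed
```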
Attack‑Chain Correlation
Single alerts are noisy. “Failed logon” by itself means nothing, but a sequence like:
- Failed logon burst
- Privilege escalation
- Lateral movement on the same account within a 10‑minute window
tells a story.
The chain detector groups alerts by username and timestamp, then walks them through kill‑chain order (credential access → privilege escalation → lateral movement → execution). If the order matches and the events fall inside a tunable time window, it fires a single high‑severity chain alert that links back to the constituent events.
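The idea can be sketched in a few lines. Field names (`user`, `tactic`, `ts`) and the stage list here are illustrative, not the detector’s actual schema:

```python
from collections import defaultdict
from datetime import timedelta

# Illustrative kill-chain order; the real detector's stage names may differ.
KILL_CHAIN = ["credential-access", "privilege-escalation",
              "lateral-movement", "execution"]

def find_chains(alerts, window=timedelta(minutes=10)):
    # Group alerts by account, then walk each group in time order,
    # advancing a stage pointer through the kill chain.
    by_user = defaultdict(list)
    for a in alerts:
        by_user[a["user"]].append(a)
    chains = []
    for user, items in by_user.items():
        items.sort(key=lambda a: a["ts"])
        hits = []
        for a in items:
            if hits and a["ts"] - hits[0]["ts"] > window:
                hits = []  # window expired: start a fresh chain
            stage = len(hits)
            if stage < len(KILL_CHAIN) and a["tactic"] == KILL_CHAIN[stage]:
                hits.append(a)
                if len(hits) == len(KILL_CHAIN):
                    chains.append({"user": user, "alerts": hits})
                    hits = []
    return chains
```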
On a 52‑event mixed‑noise dataset I wrote, it extracts two distinct chains with zero false positives on benign activity. A focused 26‑event simulation also lights up correctly.
The 12 Detectors
Each detector is a separate Python module subclassing a `DetectionRule` base:
- Brute force
- Lateral movement
- Privilege escalation
- Suspicious process
- Defense evasion
- Persistence
- Discovery
- Exfiltration
- Kerberos attacks (kerberoasting and AS‑REP roasting)
- Credential access (LSASS, SAM, DCSync)
- Initial access (external RDP, after‑hours logons)
- Chain correlator
Custom YAML rules and Sigma rules are loaded on top of these. You can also drop a `.py` file into `--plugin-dir`; the loader picks it up at scan time.
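As a sketch of that plugin pattern, here is one way a `DetectionRule` base plus a directory loader could look. Class names and the duck‑typed `match` check are simplifications of my own, not the real loader:

```python
import importlib.util
import pathlib

class DetectionRule:
    """Minimal base: subclasses set a name/severity and implement match()."""
    name = "base"
    severity = "low"

    def match(self, event: dict) -> bool:
        raise NotImplementedError

def load_plugins(plugin_dir):
    """Import every .py file in plugin_dir and instantiate rule classes.
    As a sketch-level simplification, any class with a callable `match`
    counts as a rule, so plugins don't need to import the base class."""
    rules = []
    for path in sorted(pathlib.Path(plugin_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        for obj in vars(mod).values():
            if (isinstance(obj, type)
                    and obj is not DetectionRule
                    and callable(getattr(obj, "match", None))):
                rules.append(obj())
    return rules
```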
What’s Still Rough
- HTML report – functional but the CSS is ugly; SVG donut chart for severity works, but typography needs polish.
- Real‑world testing – I haven’t tested against an enterprise dataset; sample data is hand‑crafted to exercise specific detectors. A synthetic generator can produce 1 000‑event datasets, but synthetic isn’t real.
- Sigma loader limitations – no support for `count() by` aggregations or cross‑rule correlations yet. Those are next on the roadmap.
- EVTX parsing – requires `python-evtx` as an optional extra. Without it you must export to JSON first. I’d like to auto‑detect at runtime and fall back gracefully.
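The runtime fallback I have in mind is the usual optional‑import dance. This is a sketch of the intended behavior, not shipped code (the `open_log` helper is hypothetical):

```python
# Probe for the optional dependency once at import time.
try:
    import Evtx.Evtx as evtx  # provided by the optional python-evtx extra
    HAVE_EVTX = True
except ImportError:
    HAVE_EVTX = False

def open_log(path):
    """Open a log file, degrading gracefully when python-evtx is absent."""
    if path.endswith(".evtx"):
        if not HAVE_EVTX:
            raise SystemExit(
                "EVTX support needs python-evtx (pip install python-evtx), "
                "or export the log to JSON first"
            )
        return evtx.Evtx(path)
    # everything else (JSON/NDJSON, syslog, CEF) is plain text
    return open(path, encoding="utf-8", errors="replace")
```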
What I’d Do Differently
- Write the Sigma test corpus before the parser; every fix would have been faster with real test cases in place.
- Design the alert model around Elasticsearch field‑naming rules from day one; I had to rename three fields late because they weren’t valid ES field names.
- Decide upfront whether ThreatLens is a tool or a framework. The plugin system pushed it toward a framework, and the ambiguity cost some design clarity.
Links
- Repository:
If you do detection work and the Sigma compatibility breaks on a real community rule, please open an issue. That’s the part I most want stress‑tested.