I tried parsing emails with regex. It went exactly how you think.
Source: Dev.to
Introduction
Recently I needed to process incoming emails automatically.
The idea sounded simple:
Email arrives → extract some fields → trigger a webhook
Things like:
- order confirmations
- invoice emails
- shipping notifications
- support messages
Nothing complicated. Or so I thought.
Attempt #1 — Regex
Like most developers, I started with regex.
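const match = email.match(/Total:\s\$(\d+)/)
const price = match ? Number(match[1]) : null // match() returns an array (or null), not the number itself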
For the first email it worked perfectly. Then the next email came in and said:
Amount paid: $29
Another one said:
Total price: USD 29
Then an HTML email arrived with nested tables, inline styles, and formatting from what looked like 2004 Outlook templates.
At this point my regex slowly evolved into something like this:
/(Total|Amount|Price).*?(\$|USD)?\s?(\d+(\.\d+)?)/
Which is usually the moment you realize the approach is already doomed.
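To see how quickly the first pattern breaks, here is a small sketch that runs it against the three phrasings quoted above. The `extractTotal` helper and the `samples` array are names made up for this illustration, and the first sample line is an assumed example of the format that originally worked.

```javascript
// Hypothetical helper: wraps the first regex from this post.
function extractTotal(text) {
  const match = text.match(/Total:\s\$(\d+)/);
  return match ? Number(match[1]) : null;
}

// "Total: $29" is an assumed example of the first email; the other two are quoted above.
const samples = ["Total: $29", "Amount paid: $29", "Total price: USD 29"];

// Pair each line with whatever the regex managed to extract from it.
const results = samples.map((line) => ({ line, total: extractTotal(line) }));
console.log(results);
```

Only the first line yields a number. The other two come back as `null`, which is exactly the failure that pushes the regex toward the sprawling version.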
Attempt #2 — Parsing the HTML
Okay fine. Let’s parse the HTML instead.
const { JSDOM } = require("jsdom")
const dom = new JSDOM(emailHtml)
Which sometimes worked. Except email HTML is a special kind of chaos:
- tables inside tables
- inline styles everywhere
- different layouts for every sender
And suddenly you’re maintaining custom parsers for every email format.
The real problem
Emails aren’t structured data. They’re written for humans, not machines. Every sender formats them differently, and trying to enforce rigid parsing rules becomes fragile very quickly.
The obvious solution (in hindsight)
Instead of trying to force strict parsing rules, why not let AI interpret the email and extract the fields you want?
Example email:
Subject: Order confirmation
Customer: John Smith
Product: T-shirt
Total: $39
Structured output:
{
"customer": "John Smith",
"product": "T-shirt",
"total": 39
}
Now your backend receives clean structured data instead of raw email text.
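Model output is not guaranteed to match the shape you asked for, so it is probably worth checking it before it reaches the rest of the backend. A minimal sketch, assuming a hand-rolled schema format (`orderSchema`) and a `validateExtraction` helper, neither of which comes from any particular library or from ParseForce:

```javascript
// Hypothetical schema: field name -> expected JavaScript type.
const orderSchema = { customer: "string", product: "string", total: "number" };

// Returns a list of problems; an empty list means the object matches the schema.
function validateExtraction(data, schema) {
  if (data === null || typeof data !== "object") {
    return ["extraction is not an object"];
  }
  const errors = [];
  for (const [field, type] of Object.entries(schema)) {
    if (!(field in data)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof data[field] !== type || Number.isNaN(data[field])) {
      errors.push(`field ${field} should be a ${type}`);
    }
  }
  return errors;
}

// The structured output from the example above, as it might arrive over the wire.
const extracted = JSON.parse('{"customer": "John Smith", "product": "T-shirt", "total": 39}');
console.log(validateExtraction(extracted, orderSchema));
```

Returning a list of errors instead of throwing makes it easy to log everything wrong with an extraction in one go.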
So I built a small tool
Mostly because I kept running into this problem again and again. It’s called ParseForce.
The flow is simple:
Incoming email → AI parsing → structured JSON → webhook
You:
- Get a unique inbox
- Send emails to it
- Define the schema you want
- Receive structured JSON in your webhook
That’s it.
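On the receiving end, the webhook is just an HTTP endpoint that accepts a JSON POST. Here is a minimal sketch using Node's built-in `http` module. The `/webhook` path, the `handleWebhookBody` helper, and the shape of the payload are assumptions for illustration, not ParseForce's documented format.

```javascript
const http = require("node:http");

// Hypothetical handler: parses the raw request body and returns an HTTP status code.
function handleWebhookBody(rawBody) {
  try {
    const payload = JSON.parse(rawBody);
    console.log("Parsed email fields:", payload);
    return 200;
  } catch {
    return 400; // body was not valid JSON
  }
}

// Accept POST requests on the assumed /webhook path; reject everything else.
const server = http.createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/webhook") {
    res.writeHead(404).end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => { body += chunk; });
  req.on("end", () => {
    res.writeHead(handleWebhookBody(body)).end();
  });
});
```

Calling `server.listen(3000)` (the port is arbitrary) starts it. Keeping the body handling in its own function means it can be tested without opening a socket.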
Some things it works well for
So far I’ve been using it for:
- parsing order confirmation emails
- extracting invoice data
- processing lead emails
- triggering automation workflows
Basically anything where an email contains data you want your system to understand.
If you’re curious
You can check it out here: 👉 https://parseforce.io
I’m also curious how others deal with this problem. Are you using regex, templates, or something else entirely?
Tags: node, webdev, saas, automation, ai