How I Found 1,370 Fraudsters Hiding in Our Data (And Saved My Company $51,000)

Published: December 29, 2025 at 06:57 PM EST
Source: Dev.to

The First Clue: When Numbers Tell a Story

Opening the data felt like looking at two different worlds.

  • Credit‑card transactions: fraud in only 0.5 % of cases – tiny red dots in a sea of green.
  • E‑commerce platform: nearly 1 in 3 transactions were fraudulent.

I remember thinking,

“How are we even still in business?”

That’s when I built my first visualization—side‑by‑side bars showing the stark difference. Seeing it visually made the problem real; it wasn’t just numbers anymore, it was a pattern screaming for attention.

Side‑by‑side bar chart of fraud rates in credit‑card vs. e‑commerce data

The Breakthrough: The 1‑Hour Rule

It started as a hunch: “What if fraudsters work fast?”

I created a simple calculation: hours between account creation and first purchase. When I plotted it, my coffee went cold.

There it was: a massive spike at the beginning. Transactions within the first hour had a 99.5 % fraud rate, which came to 6,685 cases of “sign up, steal, disappear.”

The visualization looked like a mountain with the peak shoved all the way to the left. It was so clear, so obvious. How had we missed this?

Histogram of fraud rate by hours since account creation
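
For readers who want to try the same calculation, here is a minimal sketch of the "hours since signup" feature and the fraud rate per time bucket. It is not the original analysis code: the function names, the bucket edges (1 hour, 1 day, 1 week), and the toy rows are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def hours_since_signup(signup_time, purchase_time):
    """Hours elapsed between account creation and the purchase."""
    return (purchase_time - signup_time).total_seconds() / 3600.0


def bucket_label(hours, edges=(1, 24, 168)):
    """Label for the first edge the value falls below, e.g. '<24h' means 1h <= t < 24h."""
    for edge in edges:
        if hours < edge:
            return f"<{edge}h"
    return f">={edges[-1]}h"


def fraud_rate_by_bucket(rows, edges=(1, 24, 168)):
    """rows: iterable of (signup_time, purchase_time, is_fraud).

    Returns {bucket_label: (n_transactions, fraud_rate)}.
    """
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for signup_time, purchase_time, is_fraud in rows:
        label = bucket_label(hours_since_signup(signup_time, purchase_time), edges)
        totals[label] += 1
        frauds[label] += int(is_fraud)
    return {label: (totals[label], frauds[label] / totals[label]) for label in totals}


# Toy rows, invented for illustration only (not the real data).
toy_rows = [
    (datetime(2025, 1, 1, 12, 0), datetime(2025, 1, 1, 12, 5), True),
    (datetime(2025, 1, 1, 12, 0), datetime(2025, 1, 1, 12, 30), True),
    (datetime(2025, 1, 1, 12, 0), datetime(2025, 1, 2, 9, 0), False),
    (datetime(2025, 1, 1, 12, 0), datetime(2025, 1, 20, 12, 0), False),
]

for label, (count, rate) in fraud_rate_by_bucket(toy_rows).items():
    print(f"{label:>7}: {count} transactions, fraud rate {rate:.0%}")
```

Plotting the per-bucket rates as bars gives the same kind of left-heavy picture described above, assuming the data has one row per first purchase.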

Building the Fraud Catchers

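Channel      Model                Reason                                                Results
-----------  -------------------  ----------------------------------------------------  ----------------------------------------------------------------
Credit‑card  XGBoost              Powerful ensemble that learns complex interactions    76 fraudsters caught, 15 false alarms
E‑commerce   Logistic Regression  High interpretability for customer‑facing decisions   1,370 frauds caught (vs. 1,409 possible) with clear explanations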

My model‑comparison chart tells the story—different problems need different tools.

Model comparison chart for credit‑card and e‑commerce fraud detection
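
The results above can be turned into the usual headline metrics with a little arithmetic. This sketch assumes that "15 false alarms" means 15 false positives and that "1,409 possible" is the total number of actual frauds in the e‑commerce test set. The post states neither explicitly, so treat the output as a reading of the figures, not as reported metrics.

```python
def precision(true_positives, false_positives):
    """Share of flagged transactions that really were fraud."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, actual_positives):
    """Share of actual frauds that the model caught."""
    return true_positives / actual_positives


# Figures from the results above; their interpretation is an assumption.
cc_precision = precision(true_positives=76, false_positives=15)
ecom_recall = recall(true_positives=1370, actual_positives=1409)

print(f"Credit-card precision: {cc_precision:.1%}")  # 83.5%
print(f"E-commerce recall:     {ecom_recall:.1%}")   # 97.2%
```

Credit‑card recall and e‑commerce precision cannot be computed from the figures given, because the missed credit‑card frauds and the e‑commerce false alarms are not listed.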

The Most Fascinating Part: Asking “Why?”

Using SHAP felt like putting on X‑ray glasses. Suddenly I could see what the model was thinking.

  • The top predictors weren’t what I expected. An anonymized V4 feature mattered most, followed by our custom anomaly score.
  • The model was finding patterns in places I hadn’t even looked.

The real magic was in the individual cases. A SHAP force plot for a caught $257 fraud let me trace exactly why it was flagged: the timing, a weird V14 value, and the new account. It wasn’t magic; it was math we could explain.

SHAP force plot for a $257 fraud case
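
To make "math we could explain" concrete, here is a hand-rolled sketch of a per-case explanation for a linear model such as logistic regression. It does not use the SHAP library. It relies on the fact that, for a linear model with features treated as independent, each feature's SHAP value reduces to weight × (value − average value), measured in log-odds. The weights, intercept, baseline averages, and feature names below are invented for illustration and are not the trained model.

```python
import math


def log_odds(weights, intercept, x):
    """Linear score of the model before the sigmoid."""
    return intercept + sum(weights[name] * x[name] for name in weights)


def fraud_probability(weights, intercept, x):
    """Sigmoid of the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(weights, intercept, x)))


def linear_contributions(weights, baseline, x):
    """Per-feature shift in log-odds relative to the baseline (average) input."""
    return {name: weights[name] * (x[name] - baseline[name]) for name in weights}


# Hypothetical model and case, invented for illustration only.
weights = {"hours_since_signup": -0.8, "purchase_value": 0.01}
intercept = 0.5
baseline = {"hours_since_signup": 5.0, "purchase_value": 100.0}
case = {"hours_since_signup": 0.5, "purchase_value": 257.0}

contributions = linear_contributions(weights, baseline, case)
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>20}: {value:+.2f} log-odds")
print(f"fraud probability: {fraud_probability(weights, intercept, case):.2f}")
```

The contributions sum to the difference between the case's log-odds and the baseline's log-odds, which is the same additive property a force plot visualizes.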

From Insights to Action: Three Changes We’re Making

  1. The 1‑Hour Checkpoint
    Starting Monday, any purchase within an hour of signup will trigger a gentle extra verification step (e.g., “Hey, confirm this is you?”). Based on our data, this alone could stop thousands of fraudulent attempts.

  2. Smarter Geography
    We found countries with shockingly high fraud rates (looking at you, Turkmenistan at 100 %). Rather than blanket blocks, we’ll add intelligent scrutiny: legitimate customers get through, fraudsters hit roadblocks.

  3. Dynamic Decisions
    Our confusion matrices showed that the two channels need different approaches:

    • Credit‑card channel: prioritize precision — be super sure before flagging.
    • E‑commerce channel: prioritize recall — catch more frauds while maintaining explainability.
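
The 1‑Hour Checkpoint from item 1 is simple enough to sketch in a few lines. The function name, the inclusive one-hour boundary, and the handling of out-of-order timestamps are assumptions here; the post does not describe how the production check is implemented.

```python
from datetime import datetime, timedelta

# Assumed window; the post only says "within an hour of signup".
CHECKPOINT_WINDOW = timedelta(hours=1)


def needs_extra_verification(signup_time, purchase_time, window=CHECKPOINT_WINDOW):
    """True when the purchase falls within `window` of account creation.

    A purchase timestamped before signup (clock skew or bad data) also
    triggers the check, since its elapsed time is negative.
    """
    return (purchase_time - signup_time) <= window


signup = datetime(2025, 1, 1, 12, 0)
print(needs_extra_verification(signup, datetime(2025, 1, 1, 12, 20)))  # True
print(needs_extra_verification(signup, datetime(2025, 1, 1, 15, 0)))   # False
```

Keeping the rule as a plain function makes it easy to change the window later, which matters if fraudsters adapt by waiting slightly longer than an hour.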

Confusion matrices for credit‑card and e‑commerce models
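
One way to act on "precision for credit cards, recall for e‑commerce" is to pick a different decision threshold per channel. The sketch below assumes each model outputs a fraud score between 0 and 1. The helper names, the target values, and the toy scores are hypothetical and not taken from the project.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged as fraud."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def threshold_for_precision(scores, labels, min_precision):
    """Lowest threshold meeting the precision target (keeps recall as high as possible)."""
    for threshold in sorted(set(scores)):
        if precision_recall(scores, labels, threshold)[0] >= min_precision:
            return threshold
    return None


def threshold_for_recall(scores, labels, min_recall):
    """Highest threshold meeting the recall target (keeps false alarms as low as possible)."""
    for threshold in sorted(set(scores), reverse=True):
        if precision_recall(scores, labels, threshold)[1] >= min_recall:
            return threshold
    return None


# Toy scores and labels (1 = fraud), invented for illustration only.
toy_scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 0, 1, 0, 1, 0]

print(threshold_for_precision(toy_scores, toy_labels, min_precision=0.9))
print(threshold_for_recall(toy_scores, toy_labels, min_recall=1.0))
```

On real data the thresholds should be chosen on a validation set, not the test set, so the reported confusion matrices stay an honest estimate.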

The Business Impact (Or: How I Justified My Salary)

Let’s talk numbers:

  • Test‑data impact: $51,000 saved
  • Monthly projection: $200,000+
  • Annual potential: roughly $2.4 million+ (12 × the monthly projection)

But it’s not just about money—trust matters. We can now tell customers exactly why their transaction was flagged, eliminating the “the system says so” black‑box feeling.

The financial‑impact visualization made my case to management in 10 seconds flat.

Financial impact visualization

What I Wish I Knew Then

  • Simple beats complex – The 1‑hour rule required no machine learning to discover.
  • Explainability matters – Logistic Regression won for e‑commerce because we could defend it.
  • Fraudsters adapt – Today’s patterns become tomorrow’s history.

The Big Realization

The most valuable insight wasn’t in the fancy algorithms. It was in asking a simple question:

“What happens right after someone signs up?”

Sometimes the most powerful data science is asking obvious questions and having the courage to believe the answers, even when they seem too simple to be true.

Want to see how we did it?
The code, the struggles, and the celebrations are all here:

Question for you: What’s the most surprising pattern you’ve found in your data?

Coffee consumption during this project: 47 cups ☕
Regrets: Zero
