Making AI Data Flows Visible: Building an Open-Source Tool to Understand SaaS & LLM Data Risk

Published: January 17, 2026 at 11:04 PM EST
3 min read
Source: Dev.to

The Problem I Kept Seeing in Practice

In many SMEs and startups, AI adoption happens incrementally:

  • A support tool adds AI ticket summarisation
  • A CRM introduces AI‑driven insights
  • Marketing tools generate content using LLMs
  • Internal documents are analysed using AI assistants

Each feature feels isolated and low‑risk. Over time, however:

  • Personal data is processed in more places
  • Third‑party AI providers are introduced
  • Cross‑border data flows increase
  • Assumptions replace documentation

What’s missing is not intent or care; it’s visibility.

Why Existing Approaches Fall Short (for SMEs)

Most existing solutions fall into one of these categories:

  • Enterprise‑grade compliance platforms
  • Security tools focused on enforcement
  • Vendor‑specific dashboards
  • Static documentation or spreadsheets

For smaller teams these approaches tend to be:

  • Too heavyweight
  • Too expensive
  • Too opaque
  • Too disconnected from how systems actually behave

I wanted to explore whether a simple, engineering‑led approach could help teams reason about AI‑related data risk without turning it into a legal or compliance exercise.

Design Principles

  1. Visibility over judgement – Surface potential risks rather than declare violations.
  2. Deterministic and explainable – Risk identification is based on explicit rules, not black‑box AI decisions.
  3. Local‑first – Everything runs locally; no cloud services or data collection.
  4. Honest about uncertainty – Unknown or unclear data handling is treated as a risk signal, not an error.
  5. Narrow scope – Focuses specifically on SaaS + LLM data flows, not a full compliance platform.

What the Tool Does

  1. Accepts simple JSON inputs describing:

    • SaaS tools in use
    • AI/LLM features enabled
    • Known (or unknown) data‑handling details
  2. Builds a data‑flow model: Source → Processing → Destination (a minimal, illustrative sketch of the input and this model appears below).

  3. Applies deterministic risk rules (a sketch of one such rule appears in the section on unknowns below), such as:

    • Personal data sent to third‑party LLM providers
    • Lack of anonymisation before LLM processing
    • Cross‑border data flows
    • Unknown provider or data location
  4. Generates:

    • A structured technical report
    • A plain‑English executive summary

The outputs are intended to be readable by both technical and non‑technical stakeholders.
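
To make this concrete, here is a minimal sketch of what such a JSON description and the derived Source → Processing → Destination model could look like. The field names (tools, ai_features, provider, data_categories, processing_region) are illustrative assumptions, not the tool's actual schema.

```python
import json
from dataclasses import dataclass

# Hypothetical input: one SaaS tool with a single AI feature enabled.
# Field names are illustrative, not the tool's real schema.
example_input = json.loads("""
{
  "tools": [
    {
      "name": "HelpDeskApp",
      "ai_features": [
        {
          "name": "ticket_summarisation",
          "provider": "unknown",
          "data_categories": ["customer_name", "ticket_body"],
          "anonymised": false,
          "processing_region": "unknown"
        }
      ]
    }
  ]
}
""")

@dataclass
class Flow:
    """One Source → Processing → Destination edge in the data-flow model."""
    source: str        # where the data originates (the SaaS tool)
    processing: str    # the AI feature doing the processing
    destination: str   # the LLM provider / region receiving the data

def build_flows(spec: dict) -> list[Flow]:
    """Flatten the JSON description into a list of data flows."""
    flows = []
    for tool in spec["tools"]:
        for feature in tool["ai_features"]:
            flows.append(Flow(
                source=tool["name"],
                processing=feature["name"],
                destination=f'{feature["provider"]} ({feature["processing_region"]})',
            ))
    return flows

print(build_flows(example_input))
# [Flow(source='HelpDeskApp', processing='ticket_summarisation',
#       destination='unknown (unknown)')]
```

Keeping the flow model this flat is a deliberate simplification: it is easy to render in a report and easy to discuss across teams.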

Handling “Unknowns” Explicitly

In real organisations, teams often don’t know:

  • Which LLM provider a feature uses
  • Whether data is anonymised
  • Where data is ultimately processed

Instead of treating this as a failure, the tool treats lack of transparency itself as a risk signal. Uncertainty increases risk, mirroring real‑world governance practices.
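
As a rough illustration, the sketch below shows how a deterministic rule set might fold both explicit facts and unknowns into plain‑English findings. The rule conditions and wording are assumptions made for illustration, not the tool's actual logic.

```python
def assess_feature(feature: dict) -> list[str]:
    """Apply simple, explicit rules to one AI feature and return
    plain-English findings. Unknown values also raise findings,
    because a lack of transparency is itself a risk signal."""
    findings = []

    if feature.get("data_categories") and not feature.get("anonymised", False):
        findings.append(
            "Personal data appears to reach an LLM feature without anonymisation."
        )

    if feature.get("provider", "unknown") == "unknown":
        findings.append(
            "The LLM provider behind this feature is not documented."
        )

    if feature.get("processing_region", "unknown") == "unknown":
        findings.append(
            "It is unclear where this data is processed; a cross-border "
            "transfer cannot be ruled out."
        )

    return findings

# Using the hypothetical feature from the earlier sketch:
feature = {
    "name": "ticket_summarisation",
    "provider": "unknown",
    "data_categories": ["customer_name", "ticket_body"],
    "anonymised": False,
    "processing_region": "unknown",
}
for line in assess_feature(feature):
    print("-", line)
```

Because the checks are ordinary conditionals, anyone reviewing a finding can trace exactly why it was raised, which is the point of the “deterministic and explainable” principle above.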

What This Tool Is (and Is Not)

  • Not legal advice
  • Not an automated compliance system
  • Not an audit or enforcement tool

It is a technical visibility tool designed to support better conversations, documentation, and decision‑making around AI usage.

Why Open Source

  • Transparency builds trust
  • Deterministic rules are inspectable
  • Others can adapt or extend the logic
  • Encourages responsible AI practices

In data protection and AI governance, opacity often does more harm than good, so openness is essential.

Early Learnings

  • Teams are often surprised by how many AI touchpoints exist.
  • Mapping flows forces valuable cross‑team discussion.
  • Even simple models surface non‑obvious risks.
  • Clarity does more to reduce fear than silence does.

The tool doesn’t “solve” compliance, but it helps teams see what they’re already doing.

What’s Next

The project is currently in pilot / exploratory mode. Future focus includes:

  • Gathering feedback from early users
  • Improving clarity and explanations
  • Refining rule logic
  • Keeping the scope intentionally narrow

If you’re interested in exploring how AI features interact with your data flows, or have thoughts on improving visibility, feedback is very welcome.

Repository

The project is available here:
👉

Closing Thought

AI adoption doesn’t fail because teams don’t care about data—it fails when systems become too complex to reason about. Sometimes the most useful thing you can build isn’t another layer of automation, but a clearer picture of what’s already happening.
