How a Small OSINT Team Turned the Epstein Files Dump Into Actionable Intelligence
Source: Dev.to
Overview
In February 2026 I took part in a collective investigation based exclusively on open‑source intelligence (OSINT) to contextualize vague references contained in public court records released by the United States Department of Justice (DOJ) in connection with the Epstein case.
This release – one of the largest ever related to the convicted financier Jeffrey Epstein – made millions of pages publicly available starting in January 2026, under the so‑called Epstein Files Transparency Act.
What began as collaborative analysis within online communities evolved, within a few days, into technical contributions that supported formal institutional actions. The key differentiator was a small investigation team that combined modern tools with rigorous human curation and highly efficient communication, in practice operating more nimbly than much larger structures.
This setup enabled direct collaboration with investigative journalism, which expanded the reach and contextualisation of the public data through in‑depth reporting, and with Brazil’s Federal Prosecution Service (Ministério Público Federal – MPF). These interactions proved essential for accelerating official procedures, including the opening of an administrative inquiry and its subsequent escalation to a national unit specialised in transnational crimes.
Below is a chronological description of the technical workflow adopted.
Technical methodology (step‑by‑step)
1. Initial entity extraction and human curation
(Days 1–3 of February 2026)
Public documents — mainly emails and excerpts from 2011 court records released in DOJ datasets — were reviewed manually and with basic supporting tools.
- Most of the early analytical value came from human curation: careful reading of socio‑economic descriptions, vague geographic references and implicit logistical elements.
- This stage established the foundation for all subsequent cross‑referencing.
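To make the triage concrete, here is a minimal sketch of the kind of "basic supporting tool" that can assist manual review at this stage: a keyword-based flagger that surfaces excerpts containing geographic, logistical, or financial cues for a human reader. The cue lists and the sample excerpt are illustrative assumptions, not the actual terms used in the investigation.

```python
import re

# Hypothetical cue patterns; the real review relied on human judgement,
# with tools like this only prioritising what to read first.
CUE_PATTERNS = {
    "geographic": re.compile(r"\b(island|ranch|estate|villa|property)\b", re.I),
    "logistical": re.compile(r"\b(flight|jet|pilot|itinerary|manifest)\b", re.I),
    "financial": re.compile(r"\b(wire|transfer|trust|account|donation)\b", re.I),
}

def triage(excerpt: str) -> list[str]:
    """Return the cue categories present in an excerpt (empty = low priority)."""
    return [name for name, pat in CUE_PATTERNS.items() if pat.search(excerpt)]

doc = "The itinerary lists a flight to the island estate in early 2011."
print(triage(doc))  # ['geographic', 'logistical']
```

A tool like this never replaces the careful reading described above; it only ranks pages so that curation time is spent where the cues cluster.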
2. Entity resolution and OSINT mapping with specialised tools
(Around February 4)
Using exclusively public sources, we performed multi‑source correlation involving business registries, corporate structures and open archival datasets.
- Maltego was used to map digital networks and associated online connections.
- Entity-resolution techniques prioritised contextual matches, such as:
  - Approximate geographic linkage
  - Migration or relocation history
  - Recurring logistical and temporal patterns
  - Indirect but persistent relationships
- Open social-network analysis and cross-referenced results meant that, as a result, a key intermediary entity was resolved within a matter of hours.
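The contextual-matching idea above can be sketched as a simple scoring function. All attribute names, weights, and sample profiles below are invented for illustration; the actual resolution combined many more signals and manual verification.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    region: str            # approximate geographic linkage
    relocated_from: str    # migration / relocation history
    active_years: range    # recurring temporal pattern

def context_score(cand: Candidate, ref: Candidate) -> float:
    """Score a candidate against a reference profile (0.0 to 1.0).

    Weights are arbitrary illustrative assumptions.
    """
    score = 0.0
    if cand.region == ref.region:
        score += 0.4                        # geographic proximity
    if cand.relocated_from == ref.relocated_from:
        score += 0.3                        # shared migration history
    if set(cand.active_years) & set(ref.active_years):
        score += 0.3                        # overlapping activity window
    return round(score, 2)

ref = Candidate("unknown intermediary", "Sao Paulo", "Florida", range(2005, 2012))
a = Candidate("Person A", "Sao Paulo", "Florida", range(2008, 2015))
b = Candidate("Person B", "Lisbon", "Florida", range(1990, 2000))
print(context_score(a, ref), context_score(b, ref))  # 1.0 0.3
```

The point of weighting context rather than exact name matches is that public records rarely spell an entity identically twice; coherence across region, history, and time carries more signal.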
3. Graph construction and visualisation with Neo4j and Mermaid.js
Resolved entities and relationships were imported into Neo4j, enabling the modelling of complex investigative networks and the execution of graph queries focused on:
- Centrality
- Paths and intermediaries
- Logistical and institutional hubs
This graph‑based representation revealed temporal and geographic patterns that were not apparent through linear document analysis.
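The graph queries run in Neo4j (centrality, paths, hub detection) can be reproduced in miniature on a plain adjacency map, which is useful for sanity-checking results outside the database. Node names here are hypothetical placeholders, not entities from the files.

```python
from collections import deque

# Toy investigative network; in practice this lived in Neo4j.
graph = {
    "Entity A": {"Broker X", "Shipping Hub"},
    "Broker X": {"Entity A", "Entity B", "Shipping Hub"},
    "Entity B": {"Broker X"},
    "Shipping Hub": {"Entity A", "Broker X"},
}

def degree_centrality(g):
    """Rank nodes by number of direct connections (simple hub detection)."""
    return sorted(g, key=lambda n: len(g[n]), reverse=True)

def shortest_path(g, start, goal):
    """BFS shortest path, surfacing intermediaries between two entities."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in g[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(degree_centrality(graph)[0])                      # Broker X
print(shortest_path(graph, "Entity B", "Shipping Hub"))  # via Broker X
```

The same questions phrased as Cypher queries over the real dataset are what exposed the non-obvious intermediaries mentioned above.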
The entire workflow was visually documented using Mermaid.js, adopting a diagrams‑as‑code approach integrated into Markdown. We produced:
- Process flowcharts
- Timelines
- Entity‑relationship graphs
These visual artefacts greatly facilitated collaborative review, traceability and methodological transparency.
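As an illustration of the diagrams-as-code style, a flowchart of the overall pipeline could be expressed in Mermaid.js like this (stage names are a simplified summary, not the team's actual diagram):

```mermaid
graph LR
    DOJ[DOJ document release] --> Curation[Human curation]
    Curation --> Resolution[Entity resolution]
    Resolution --> Graph[(Neo4j graph)]
    Graph --> Docs[Publication-ready documentation]
```

Because the diagram source lives in Markdown next to the analysis, every revision of the picture is versioned and reviewable like any other text change.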
4. AI support (Grok) for chronology and partial analysis
Grok was used as an auxiliary tool to:
- Consolidate event timelines
- Identify dates of mentions and document releases
- Summarise selected text snippets
- Suggest optimised queries and candidate links between entities
AI was used strictly as an operational accelerator. All validation and critical decisions remained a human responsibility, backed by manual source verification.
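The timeline-consolidation task delegated to the assistant can also be sketched deterministically: merge dated mentions from several sources, deduplicate, and sort. The sample events below are invented placeholders.

```python
from datetime import date

# Dated mentions collected from multiple public sources; duplicates occur
# when two sources report the same event.
raw_mentions = [
    (date(2026, 1, 30), "DOJ releases first dataset"),
    (date(2026, 2, 4), "Key intermediary entity resolved"),
    (date(2026, 1, 30), "DOJ releases first dataset"),   # duplicate report
    (date(2026, 2, 2), "Community cross-referencing begins"),
]

def consolidate(mentions):
    """Return a deduplicated, chronologically sorted timeline."""
    return sorted(set(mentions))

for day, event in consolidate(raw_mentions):
    print(day.isoformat(), "-", event)
```

A deterministic pass like this is also how AI-produced chronologies were double-checked: if the model's ordering disagreed with the sorted source dates, the model lost.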
5. Responsible disclosure, collaboration and immediate impact
(February 4–9)
- ~Day 4: Controlled public disclosure of the resolved entities within s
Note: the original text ends abruptly at “within s”; the content has been retained unchanged.
Timeline (Illustrative Example)
| Day | Activity |
|---|---|
| Day 1 | Initial data collection from publicly available sources (court filings, corporate registries, news archives). |
| Day 2‑3 | Community‑driven enrichment – specialized online forums and OSINT groups cross‑checked entities, added missing links, and produced a preliminary graph model. |
| Day 4‑6 | Amplification by independent investigative journalists – using the same public data, they published in‑depth reports that broadened visibility and institutional pressure. |
| Day 7‑8 | Extension of the mapping to additional references found in the released files, including potential international hubs and publicly listed entities. |
| Day 8‑9 | Observed escalation of the administrative procedure to a national unit specialised in transnational crimes, in line with the rapid consolidation and documentation of the OSINT findings. |
Guiding Principles
- Strict reliance on open and publicly available sources only.
- No collection or disclosure of sensitive information beyond what is already public.
- Explicit recognition of the collective and collaborative nature of the work (online communities, investigative journalism, and the Brazilian Federal Prosecution Service – MPF).
- Continuous emphasis on human curation to ensure accuracy, ethical standards, and accountability.
Lessons Learned & Impact
- This case shows how a small, well‑coordinated team—using Neo4j for graph modeling, Maltego for network mapping, Mermaid.js for visual documentation, and Grok for analytical and chronological support—can achieve disproportionate results in open‑source investigations.
- The central factor was not automation, but rigorous cross‑referencing of public data combined with structured, auditable documentation.
- Direct collaboration with investigative journalists and the Brazilian Federal Prosecution Service turned technical analysis into practical institutional input.
- The workflow provides a concrete example of ethical and responsible use of OSINT and AI in a high‑impact social context (the Epstein case).
- Professionals working with OSINT, graph databases, investigative‑process visualization, or AI‑assisted analysis can adapt this workflow to scenarios such as compliance, due diligence, corporate investigations, and independent research.
Team Workflow, Data Curation & Lightweight Frameworks
- Kanban‑inspired workflow – coordinated tasks, controlled data quality, and ensured traceability throughout the OSINT process.
- Human‑centric data‑curation pipeline – raw extractions were reviewed, normalized, and validated before promotion to the shared graph and documentation layers.
- Each card represented a single investigative hypothesis or entity cluster and followed a clear lifecycle:
- Discovery
- Preliminary validation
- Multi‑source corroboration
- Graph integration
- Publication‑ready documentation
- Curation focus – prevented entity conflation, managed ambiguous references, and avoided premature attribution. Particular attention was given to:
- Name disambiguation
- Geographic uncertainty
- Temporal consistency
- Source provenance
- Inclusion criteria – only entities supported by independent public sources and contextual coherence were incorporated into Neo4j and the Mermaid.js documentation.
The combination of a simple team framework (Kanban‑style coordination) with a strict human‑curation layer delivered operational speed without sacrificing methodological rigor, ethical standards, or auditability.
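The card lifecycle described above behaves like a small state machine, which can be sketched as follows. The stage names follow the list in the text; the class, its gating rule, and the sample card are illustrative assumptions.

```python
STAGES = [
    "Discovery",
    "Preliminary validation",
    "Multi-source corroboration",
    "Graph integration",
    "Publication-ready documentation",
]

class Card:
    """One investigative hypothesis or entity cluster on the board."""

    def __init__(self, title: str):
        self.title = title
        self.stage_idx = 0

    @property
    def stage(self) -> str:
        return STAGES[self.stage_idx]

    def advance(self, sources: int = 0) -> str:
        """Move to the next stage; leaving validation is gated on at
        least two independent public sources (an assumed threshold)."""
        if self.stage == "Preliminary validation" and sources < 2:
            raise ValueError("needs at least two independent public sources")
        if self.stage_idx < len(STAGES) - 1:
            self.stage_idx += 1
        return self.stage

card = Card("possible intermediary cluster")
card.advance()           # Discovery -> Preliminary validation
card.advance(sources=2)  # -> Multi-source corroboration
print(card.stage)
```

Encoding the gate in code mirrors the inclusion criteria above: a card physically cannot reach the graph-integration column without corroboration.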
Positive Operational & Institutional Impacts
- Operational benefits:
  - Task visibility and well-defined curation stages reduced duplication of effort.
  - Minimized contradictory hypotheses and accelerated convergence toward high-confidence entities.
- Institutional benefits:
  - Consistent curated datasets, clear provenance of sources, and traceable decision flow enabled faster reuse by investigative journalists and the MPF.
  - Lowered verification costs, increased trust in OSINT outputs, and facilitated direct incorporation into formal analytical and administrative procedures.