[Paper] A TEE-based Approach for Preserving Data Secrecy in Process Mining with Decentralized Sources
Source: arXiv - 2602.04697v1
Overview
Process mining is becoming a go‑to technique for turning raw event logs into actionable process insights. When those logs are scattered across multiple independent companies, however, sharing data can expose sensitive business information. The paper introduces CONFINE, a framework that uses Trusted Execution Environments (TEEs) to let several parties collaboratively mine their logs while keeping each organization’s raw data secret.
Key Contributions
- TEE‑based secrecy preservation – Deploys a trusted application inside a TEE (e.g., Intel SGX) that can ingest and process multi‑party logs without ever exposing the clear‑text data to the host OS or other participants.
- Four‑stage secure protocol – Defines a complete end‑to‑end workflow (provisioning, secure transfer, aggregation, result release) that guarantees confidentiality and integrity of the exchanged logs.
- Segmentation strategy for limited enclave memory – Breaks large logs into small batches that fit inside the enclave, preventing out‑of‑memory crashes while preserving the semantics of the mining algorithm.
- Formal verification & security analysis – Uses model‑checking to prove protocol correctness and evaluates the TEE’s threat model to show that data leakage is infeasible under realistic attacker capabilities.
- Scalable prototype evaluation – Demonstrates, on both synthetic and real‑world datasets, that enclave memory grows logarithmically with log size while runtime grows linearly with the number of participating organizations.
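The segmentation idea can be sketched as a simple batching generator that never materializes the whole log at once. This is a minimal illustration, not the paper's code: the names `segment_log` and `batch_size` are hypothetical, and a real deployment would size batches to the enclave's memory budget.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def segment_log(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Split an event stream into batches small enough to fit in enclave memory."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Example: 10 events in batches of 4 yield batch sizes 4, 4, 2.
log = [{"case": i % 3, "activity": f"A{i}"} for i in range(10)]
print([len(b) for b in segment_log(log, batch_size=4)])  # [4, 4, 2]
```

Because each batch is consumed and discarded before the next is fetched, peak memory is bounded by the batch size regardless of total log length.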
Methodology
- Architecture – A central orchestrator coordinates the mining job, but the actual computation runs inside a TEE on a cloud node. Each organization runs a lightweight client that encrypts its log and streams it to the enclave.
- Secure Data Exchange – The protocol uses mutual attestation (both parties prove they run genuine TEEs) followed by a Diffie‑Hellman key exchange to derive a session key. Logs are then sent in encrypted chunks.
- Batch Processing – Inside the enclave, the log is reconstructed chunk‑by‑chunk. The mining algorithm (e.g., discovery of a process model) works on the incremental view, updating internal data structures without ever storing the full log in memory.
- Result Release – Once processing finishes, the enclave signs the mined model and sends it back. The signature proves that the result was produced inside a verified TEE and that no raw data was leaked.
- Verification – The authors model the protocol in the TLA+ language and automatically check safety properties (no data leakage) and liveness (the protocol eventually terminates).
The whole pipeline is implemented in Python/C++ with SGX SDK bindings, making it relatively easy to integrate into existing process‑mining stacks.
Results & Findings
| Scenario | Log Size | # Orgs | Enclave Memory | Runtime |
|---|---|---|---|---|
| Synthetic (linear) | 10 M events | 3 | 12 MB (≈ log₂ of event count) | 2.3 min |
| Real‑world (order‑to‑cash) | 2.4 M events | 5 | 8 MB | 1.1 min |
| Stress test | 50 M events | 2 | 18 MB | 7.9 min |
- Memory grows logarithmically with the total number of events thanks to the batch‑wise processing.
- Runtime scales linearly with the number of participating organizations, as each adds an extra encrypted transfer step.
- The approach successfully mined standard process‑discovery models (e.g., BPMN, Petri nets) that were identical to those obtained from a non‑secure, centralized run, confirming functional equivalence.
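One concrete way to see why batch-wise processing keeps memory flat is incremental directly-follows counting, a standard building block of discovery algorithms. This sketch is an assumption about the general technique, not the paper's implementation; the names `update_dfg` and `last_seen` are hypothetical.

```python
from collections import Counter

def update_dfg(dfg: Counter, last_seen: dict, batch):
    """Fold one batch of (case_id, activity) events into a directly-follows graph.

    Only the pair counts and the last activity per open case stay in memory,
    so the footprint depends on distinct activities and cases, not on the
    total number of events processed.
    """
    for case, activity in batch:
        prev = last_seen.get(case)
        if prev is not None:
            dfg[(prev, activity)] += 1
        last_seen[case] = activity

# Two batches, e.g. arriving from different organizations' encrypted chunks.
batch1 = [(1, "Create Order"), (2, "Create Order"), (1, "Approve")]
batch2 = [(2, "Approve"), (1, "Ship"), (2, "Ship")]

dfg, last_seen = Counter(), {}
for batch in (batch1, batch2):
    update_dfg(dfg, last_seen, batch)

print(dfg[("Create Order", "Approve")])  # 2
print(dfg[("Approve", "Ship")])          # 2
```

Because the counts are identical whether events arrive in one pass or in many batches, the incremental view yields the same discovered model as a centralized run, matching the functional-equivalence result reported above.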
Practical Implications
- Secure SaaS Process‑Mining – Vendors can now offer cloud‑based mining services without requiring clients to hand over raw logs, opening doors to cross‑company analytics (supply‑chain, finance, healthcare).
- Compliance‑by‑Design – The TEE guarantees that data never leaves the enclave in clear text, helping organizations meet GDPR, CCPA, or industry‑specific confidentiality clauses.
- Plug‑and‑Play Integration – Because the client side is just a thin encryption wrapper, existing log exporters (e.g., from ERP or BPM systems) can be retrofitted with minimal code changes.
- Cost‑Effective Scaling – The logarithmic memory footprint means a single modest‑size cloud VM can handle multi‑gigabyte logs, reducing infrastructure spend compared to heavyweight homomorphic‑encryption alternatives.
Developers can start experimenting by pulling the open‑source CONFINE prototype, swapping the SGX enclave for any TEE that supports remote attestation (e.g., AMD SEV, ARM TrustZone), and plugging in their favorite process‑mining library.
Limitations & Future Work
- TEE Trust Assumptions – Security hinges on the integrity of the underlying hardware and firmware; side‑channel attacks (e.g., cache‑timing) are not fully mitigated.
- Network Overhead – Encrypting and transmitting logs in many small batches adds latency, especially over high‑latency links.
- Algorithm Scope – The current implementation focuses on discovery algorithms; conformance checking, predictive analytics, or deep‑learning‑based mining are not yet supported.
- Future Directions – The authors plan to (1) integrate side‑channel hardened enclaves, (2) explore adaptive batch sizing to reduce round‑trips, and (3) extend the framework to support federated learning‑style process‑model refinement across many more participants.
Authors
- Davide Basile
- Valerio Goretti
- Luca Barbaro
- Hajo A. Reijers
- Claudio Di Ciccio
Paper Information
- arXiv ID: 2602.04697v1
- Categories: cs.DC
- Published: February 4, 2026