[Paper] When High-Performance Computing Meets Software Testing: Distributed Fuzzing using MPI

Published: December 1, 2025 at 07:38 AM EST
4 min read
Source: arXiv - 2512.01617v1

Overview

The paper investigates how the Message Passing Interface (MPI), the de facto standard for communication in high‑performance computing (HPC), can replace the slower, file‑system‑based coordination that most distributed fuzzers rely on. By swapping heavyweight disk I/O for lightweight MPI primitives, the authors show measurable speed‑ups in fuzzing campaigns, especially during the early coverage‑building phase that matters most for continuous‑integration pipelines.

Key Contributions

  • MPI‑driven synchronization for distributed fuzzers, eliminating the need for shared‑filesystem checkpoints.
  • Design of a lightweight corpus‑exchange protocol that uses MPI collective operations (e.g., MPI_Bcast, MPI_Reduce) to keep fuzzing nodes in sync with minimal latency (a sketch follows this list).
  • Comprehensive benchmark suite (AFL‑based workloads, real‑world binaries, and synthetic programs) demonstrating faster coverage growth and reduced stagnation.
  • Analysis of scalability up to dozens of nodes, showing near‑linear speed‑up until the network saturates.
  • Discussion of integration points for CI/CD environments, highlighting how early‑stage coverage gains translate into quicker bug detection in development cycles.
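
To make the collective‑based exchange concrete, here is a minimal C/MPI sketch of one exchange round. It is not the paper's reference implementation: the round‑robin MPI_Bcast pattern, the enqueue_input hook, and the simplification of sharing at most one new input per rank per round are assumptions made for illustration.

```c
/* Sketch of one collective corpus-exchange round. Each rank contributes its
 * newest locally discovered input (or nothing); names are illustrative. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook: hand a received input to the local fuzzer's queue. */
void enqueue_input(const unsigned char *data, int len);

void corpus_exchange_round(const unsigned char *my_input, int my_len)
{
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int root = 0; root < size; root++) {
        /* The root announces how many bytes it wants to share (0 = nothing new). */
        int len = (root == rank) ? my_len : 0;
        MPI_Bcast(&len, 1, MPI_INT, root, MPI_COMM_WORLD);
        if (len == 0)
            continue;

        unsigned char *buf = malloc(len);
        if (root == rank)
            memcpy(buf, my_input, len);
        MPI_Bcast(buf, len, MPI_BYTE, root, MPI_COMM_WORLD);

        if (root != rank)
            enqueue_input(buf, len);  /* feed the shared input to this worker */
        free(buf);
    }
}
```

Broadcasting the length first lets every rank allocate an exact receive buffer before the payload broadcast; with many workers the same idea can be batched, e.g. with MPI_Allgatherv, to cut the number of collectives per round.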

Methodology

  1. Baseline System – The authors start from a conventional distributed fuzzing setup that shares a corpus via a network‑mounted filesystem (NFS).
  2. MPI Layer – They replace file‑based sync with a thin MPI layer (sketched at the end of this section):
    • Each fuzzing worker runs as an MPI rank.
    • Periodic broadcasts push newly discovered inputs to all ranks.
    • Reductions aggregate coverage metrics, allowing a master rank to decide when to trigger a global sync.
  3. Instrumentation – The fuzzers are instrumented with AFL’s coverage map; the map is periodically serialized and sent over MPI.
  4. Evaluation – Experiments run on a cluster of up to 32 nodes (each node = 2‑core VM) using:
    • Synthetic “deep‑path” programs designed to cause coverage stalls.
    • Real‑world open‑source utilities (e.g., libpng, openssl).

Metrics collected: unique edges discovered, time to reach 50 % coverage, and network overhead.
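
The coverage‑aggregation step in item 2 can be pictured as a bit‑wise OR reduction of AFL‑style coverage maps onto the master rank, which then broadcasts its decision to every worker. The MAP_SIZE constant, the edge‑counting loop, and the "sync when global coverage grew" policy below are illustrative assumptions, not details taken from the paper.

```c
/* Sketch of the coverage-aggregation step: reduce all workers' AFL-style
 * coverage maps onto rank 0, let it decide whether a global sync is worth
 * triggering, and broadcast that decision back to every rank. */
#include <mpi.h>
#include <stdint.h>

#define MAP_SIZE (1 << 16)   /* AFL's default coverage map size (assumed here) */

int should_sync(const uint8_t *local_map, int prev_global_edges)
{
    uint8_t global_map[MAP_SIZE];
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Bit-wise OR of all workers' coverage maps onto the master rank. */
    MPI_Reduce(local_map, global_map, MAP_SIZE, MPI_UINT8_T,
               MPI_BOR, 0, MPI_COMM_WORLD);

    int trigger = 0;
    if (rank == 0) {
        int edges = 0;
        for (int i = 0; i < MAP_SIZE; i++)
            edges += (global_map[i] != 0);
        /* Hypothetical policy: sync when global coverage grew since last round. */
        trigger = (edges > prev_global_edges);
    }
    /* The master's decision is made known to every rank. */
    MPI_Bcast(&trigger, 1, MPI_INT, 0, MPI_COMM_WORLD);
    return trigger;
}
```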

Results & Findings

| Scenario | Baseline (NFS) | MPI‑Sync | Speed‑up |
| --- | --- | --- | --- |
| Synthetic deep‑path (8 nodes) | 12 h to 60 % coverage | 4.5 h to 60 % coverage | ≈2.7× |
| libpng (16 nodes) | 3.2 h to 80 % edges | 1.8 h to 80 % edges | ≈1.8× |
| OpenSSL (32 nodes) | 6.5 h to 70 % coverage | 3.9 h to 70 % coverage | ≈1.7× |
  • Early‑stage acceleration: Most of the gain appears in the first few hours, where rapid corpus sharing prevents individual workers from getting stuck in local minima.
  • Network cost: MPI traffic accounted for < 5 % of total runtime, far lower than the I/O overhead of NFS‑based sync (≈ 15 %).
  • Scalability: Up to ~20 nodes the speed‑up remains roughly linear; beyond that, network contention on the Ethernet fabric starts to flatten gains.

Practical Implications

  • CI/CD integration – Faster early coverage means fuzzing can be run as a pre‑merge gate without inflating pipeline latency. Teams can allocate a modest cluster (e.g., 8‑node MPI job) and still see a 2× reduction in time‑to‑detect.
  • Cost efficiency – By avoiding expensive shared storage solutions, organizations can spin up cheap compute‑only nodes (cloud VMs, on‑premise HPC nodes) and still achieve high throughput.
  • Tooling roadmap – Existing fuzzers (AFL, libFuzzer) can be wrapped with an MPI shim; the paper provides a reference implementation that could be turned into a plug‑in for popular fuzzing frameworks (a launcher sketch follows this list).
  • Deep‑path exploration – The coordinated corpus exchange mitigates “coverage stagnation,” a common pain point when fuzzing complex parsers or protocol stacks. This opens the door to more reliable security testing of large codebases (e.g., browsers, network daemons).
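
As a rough illustration of the MPI‑shim idea, the sketch below has each rank fork a stock afl‑fuzz secondary instance with a rank‑derived name and a node‑local output directory, so no shared filesystem is involved. The paths, command‑line flags, and placement of the sync loop are hypothetical, not the paper's actual wrapper.

```c
/* Minimal MPI "shim" around an existing fuzzer: every rank launches its own
 * afl-fuzz instance; the MPI corpus-exchange / coverage loop from the sections
 * above would run in the parent process alongside the child. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char name[32], outdir[64];
    snprintf(name, sizeof name, "worker%d", rank);
    snprintf(outdir, sizeof outdir, "/tmp/fuzz-out-%d", rank);  /* node-local dir */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: run a stock AFL secondary instance ("-S") for this rank. */
        execlp("afl-fuzz", "afl-fuzz", "-i", "seeds", "-o", outdir,
               "-S", name, "--", "./target", "@@", (char *)NULL);
        _exit(127);  /* exec failed */
    }

    /* Parent: this is where the periodic MPI sync loop would live. */
    waitpid(pid, NULL, 0);

    MPI_Finalize();
    return 0;
}
```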

Limitations & Future Work

  • Network dependency – The approach assumes a low‑latency, high‑bandwidth interconnect; on highly contended cloud networks the MPI advantage may shrink.
  • Fault tolerance – MPI’s default runtime aborts the whole job on a single rank failure; the authors note the need for a resilient layer (e.g., ULFM or checkpoint‑restart). A minimal first step is sketched after this list.
  • Scalability ceiling – Experiments stop at 32 nodes; scaling to hundreds of workers will likely require hierarchical synchronization or hybrid MPI‑RDMA techniques.
  • Tool integration – The current prototype is a proof‑of‑concept; tighter integration with mainstream fuzzers and support for Windows environments remain open challenges.
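
On the fault‑tolerance point, the smallest mitigation is to switch off MPI's default abort‑on‑error behaviour so that failures surface as return codes; actual recovery (detecting the failed rank, shrinking the communicator with ULFM's MPIX_Comm_shrink) is exactly the gap the authors flag. A minimal sketch:

```c
#include <mpi.h>

/* By default MPI_ERRORS_ARE_FATAL tears down every rank when one fails.
 * Switching to MPI_ERRORS_RETURN lets calls report errors instead, so the
 * fuzzing loop at least gets a chance to react; it does NOT by itself make
 * further communication on the damaged communicator well defined. */
void enable_error_returns(void)
{
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
}
```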

Bottom line: By borrowing synchronization tricks from HPC, the authors demonstrate a practical path to make distributed fuzzing faster, cheaper, and more suitable for modern DevOps pipelines. For teams already running fuzzers at scale, swapping in an MPI‑based coordination layer could be a low‑effort win.

Authors

  • Pierciro Caliandro
  • Matteo Ciccaglione
  • Alessandro Pellegrini

Paper Information

  • arXiv ID: 2512.01617v1
  • Categories: cs.SE
  • Published: December 1, 2025