[Paper] When High-Performance Computing Meets Software Testing: Distributed Fuzzing using MPI

Published: December 1, 2025 at 07:38 AM EST
4 min read
Source: arXiv - 2512.01617v1

Overview

The paper investigates how the Message Passing Interface (MPI), the de facto standard for communication in high‑performance computing (HPC), can replace the slower, file‑system‑based coordination that most distributed fuzzers rely on. By swapping heavyweight disk I/O for lightweight MPI primitives, the authors show measurable speed‑ups in fuzzing campaigns, especially during the early coverage‑building phase that matters most for continuous‑integration pipelines.

Key Contributions

  • MPI‑driven synchronization for distributed fuzzers, eliminating the need for shared‑filesystem checkpoints.
  • Design of a lightweight corpus‑exchange protocol that uses MPI collective operations (e.g., MPI_Bcast, MPI_Reduce) to keep fuzzing nodes in sync with minimal latency (a sketch follows this list).
  • Comprehensive benchmark suite (AFL‑based workloads, real‑world binaries, and synthetic programs) demonstrating faster coverage growth and reduced stagnation.
  • Analysis of scalability up to dozens of nodes, showing near‑linear speed‑up until the network saturates.
  • Discussion of integration points for CI/CD environments, highlighting how early‑stage coverage gains translate into quicker bug detection in development cycles.
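
To make the collective‑based exchange concrete, here is a minimal C/MPI sketch of one exchange round. It is not the paper's reference implementation: the round‑robin MPI_Bcast pattern, the enqueue_input hook, and the simplification of sharing at most one new input per rank per round are assumptions made for illustration.

```c
/* Sketch of one collective corpus-exchange round. Each rank contributes its
 * newest locally discovered input (or nothing); names are illustrative. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook: hand a received input to the local fuzzer's queue. */
void enqueue_input(const unsigned char *data, int len);

void corpus_exchange_round(const unsigned char *my_input, int my_len)
{
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int root = 0; root < size; root++) {
        /* The root announces how many bytes it wants to share (0 = nothing new). */
        int len = (root == rank) ? my_len : 0;
        MPI_Bcast(&len, 1, MPI_INT, root, MPI_COMM_WORLD);
        if (len == 0)
            continue;

        unsigned char *buf = malloc(len);
        if (root == rank)
            memcpy(buf, my_input, len);
        MPI_Bcast(buf, len, MPI_BYTE, root, MPI_COMM_WORLD);

        if (root != rank)
            enqueue_input(buf, len);  /* feed the shared input to this worker */
        free(buf);
    }
}
```

Broadcasting the length first lets every rank allocate an exact receive buffer before the payload broadcast; with many workers the same idea can be batched, e.g. with MPI_Allgatherv, to cut the number of collectives per round.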

Methodology

  1. Baseline System – The authors start from a conventional distributed fuzzing setup that shares a corpus via a network‑mounted filesystem (NFS).
  2. MPI Layer – They replace file‑based sync with a thin MPI layer (sketched at the end of this section):
    • Each fuzzing worker runs as an MPI rank.
    • Periodic broadcasts push newly discovered inputs to all ranks.
    • Reductions aggregate coverage metrics, allowing a master rank to decide when to trigger a global sync.
  3. Instrumentation – The fuzzers are instrumented with AFL’s coverage map; the map is periodically serialized and sent over MPI.
  4. Evaluation – Experiments run on a cluster of up to 32 nodes (each node = 2‑core VM) using:
    • Synthetic “deep‑path” programs designed to cause coverage stalls.
    • Real‑world open‑source utilities (e.g., libpng, openssl).

Metrics collected: unique edges discovered, time to reach 50 % coverage, and network overhead.
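
The coverage‑aggregation step in item 2 can be pictured as a bit‑wise OR reduction of AFL‑style coverage maps onto the master rank, which then broadcasts its decision to every worker. The MAP_SIZE constant, the edge‑counting loop, and the "sync when global coverage grew" policy below are illustrative assumptions, not details taken from the paper.

```c
/* Sketch of the coverage-aggregation step: reduce all workers' AFL-style
 * coverage maps onto rank 0, let it decide whether a global sync is worth
 * triggering, and broadcast that decision back to every rank. */
#include <mpi.h>
#include <stdint.h>

#define MAP_SIZE (1 << 16)   /* AFL's default coverage map size (assumed here) */

int should_sync(const uint8_t *local_map, int prev_global_edges)
{
    uint8_t global_map[MAP_SIZE];
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Bit-wise OR of all workers' coverage maps onto the master rank. */
    MPI_Reduce(local_map, global_map, MAP_SIZE, MPI_UINT8_T,
               MPI_BOR, 0, MPI_COMM_WORLD);

    int trigger = 0;
    if (rank == 0) {
        int edges = 0;
        for (int i = 0; i < MAP_SIZE; i++)
            edges += (global_map[i] != 0);
        /* Hypothetical policy: sync when global coverage grew since last round. */
        trigger = (edges > prev_global_edges);
    }
    /* The master's decision is made known to every rank. */
    MPI_Bcast(&trigger, 1, MPI_INT, 0, MPI_COMM_WORLD);
    return trigger;
}
```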

Results & Findings

| Scenario | Baseline (NFS) | MPI‑Sync | Speed‑up |
| --- | --- | --- | --- |
| Synthetic deep‑path (8 nodes) | 12 h to 60 % coverage | 4.5 h to 60 % coverage | ≈2.7× |
| libpng (16 nodes) | 3.2 h to 80 % edges | 1.8 h to 80 % edges | ≈1.8× |
| OpenSSL (32 nodes) | 6.5 h to 70 % coverage | 3.9 h to 70 % coverage | ≈1.7× |
  • Early‑stage acceleration: Most of the gain appears in the first few hours, where rapid corpus sharing prevents individual workers from getting stuck in local minima.
  • Network cost: MPI traffic accounted for < 5 % of total runtime, far lower than the I/O overhead of NFS‑based sync (≈ 15 %).
  • Scalability: Up to ~20 nodes the speed‑up remains roughly linear; beyond that, network contention on the Ethernet fabric starts to flatten gains.

Practical Implications

  • CI/CD integration – Faster early coverage means fuzzing can be run as a pre‑merge gate without inflating pipeline latency. Teams can allocate a modest cluster (e.g., 8‑node MPI job) and still see a 2× reduction in time‑to‑detect.
  • Cost efficiency – By avoiding expensive shared storage solutions, organizations can spin up cheap compute‑only nodes (cloud VMs, on‑premise HPC nodes) and still achieve high throughput.
  • Tooling roadmap – Existing fuzzers (AFL, libFuzzer) can be wrapped with an MPI shim; the paper provides a reference implementation that could be turned into a plug‑in for popular fuzzing frameworks (a launcher sketch follows this list).
  • Deep‑path exploration – The coordinated corpus exchange mitigates “coverage stagnation,” a common pain point when fuzzing complex parsers or protocol stacks. This opens the door to more reliable security testing of large codebases (e.g., browsers, network daemons).
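
As a rough illustration of the MPI‑shim idea, the sketch below has each rank fork a stock afl‑fuzz secondary instance with a rank‑derived name and a node‑local output directory, so no shared filesystem is involved. The paths, command‑line flags, and placement of the sync loop are hypothetical, not the paper's actual wrapper.

```c
/* Minimal MPI "shim" around an existing fuzzer: every rank launches its own
 * afl-fuzz instance; the MPI corpus-exchange / coverage loop from the sections
 * above would run in the parent process alongside the child. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char name[32], outdir[64];
    snprintf(name, sizeof name, "worker%d", rank);
    snprintf(outdir, sizeof outdir, "/tmp/fuzz-out-%d", rank);  /* node-local dir */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: run a stock AFL secondary instance ("-S") for this rank. */
        execlp("afl-fuzz", "afl-fuzz", "-i", "seeds", "-o", outdir,
               "-S", name, "--", "./target", "@@", (char *)NULL);
        _exit(127);  /* exec failed */
    }

    /* Parent: this is where the periodic MPI sync loop would live. */
    waitpid(pid, NULL, 0);

    MPI_Finalize();
    return 0;
}
```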

Limitations & Future Work

  • Network dependency – The approach assumes a low‑latency, high‑bandwidth interconnect; on highly contended cloud networks the MPI advantage may shrink.
  • Fault tolerance – MPI’s default runtime aborts the whole job on a single rank failure; the authors note the need for a resilient layer (e.g., ULFM or checkpoint‑restart). A minimal first step is sketched after this list.
  • Scalability ceiling – Experiments stop at 32 nodes; scaling to hundreds of workers will likely require hierarchical synchronization or hybrid MPI‑RDMA techniques.
  • Tool integration – The current prototype is a proof‑of‑concept; tighter integration with mainstream fuzzers and support for Windows environments remain open challenges.
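
On the fault‑tolerance point, the smallest mitigation is to switch off MPI's default abort‑on‑error behaviour so that failures surface as return codes; actual recovery (detecting the failed rank, shrinking the communicator with ULFM's MPIX_Comm_shrink) is exactly the gap the authors flag. A minimal sketch:

```c
#include <mpi.h>

/* By default MPI_ERRORS_ARE_FATAL tears down every rank when one fails.
 * Switching to MPI_ERRORS_RETURN lets calls report errors instead, so the
 * fuzzing loop at least gets a chance to react; it does NOT by itself make
 * further communication on the damaged communicator well defined. */
void enable_error_returns(void)
{
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
}
```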

Bottom line: By borrowing synchronization tricks from HPC, the authors demonstrate a practical path to make distributed fuzzing faster, cheaper, and more suitable for modern DevOps pipelines. For teams already running fuzzers at scale, swapping in an MPI‑based coordination layer could be a low‑effort win.

Authors

  • Pierciro Caliandro
  • Matteo Ciccaglione
  • Alessandro Pellegrini

Paper Information

  • arXiv ID: 2512.01617v1
  • Categories: cs.SE
  • Published: December 1, 2025