[Paper] An LLVM-Based Optimization Pipeline for SPDZ
Source: arXiv - 2512.11112v1
Overview
The paper presents a prototype compiler‑runtime stack that plugs the SPDZ secure‑multiparty computation (MPC) protocol into the LLVM ecosystem. By letting developers write ordinary (annotated) C code, the system automatically extracts parallelism, batches arithmetic operations, and overlaps communication with computation, yielding up to 5.56× speed‑ups on CPU and GPU acceleration that scales with input size.
Key Contributions
- LLVM‑based front‑end that accepts a small, privacy‑annotated subset of C and lowers it to LLVM IR, reusing LLVM’s mature analyses.
- Automatic batching of independent arithmetic operations, removing the need for programmers to manually express parallelism.
- Protocol‑aware scheduler in the back‑end that performs data‑flow and control‑flow analysis to drive a non‑blocking runtime, overlapping network traffic with local computation.
- GPU off‑loading path that maps large batched arithmetic kernels to CUDA kernels when available.
- Empirical evaluation showing up to 5.56× speed‑up over MP‑SPDZ on CPU and strong scaling with thread count; GPU back‑end scales better for larger inputs.
Methodology
- Front‑end parsing – Developers write C code with lightweight annotations (e.g., @secret) to mark private values. The parser translates this into LLVM IR, preserving the annotations as metadata (a sketch of the annotated‑source style follows this list).
- LLVM optimizations – Standard passes (dead‑code elimination, loop unrolling, etc.) run unchanged. Custom passes then detect independent arithmetic statements and group them into batches (see the batching sketch after this list).
- Back‑end analysis – A data‑flow pass builds a dependency graph of the batched operations. A control‑flow pass identifies points where communication (sending/receiving secret shares) can be overlapped with independent local work.
- Runtime scheduler – The scheduler is non‑blocking: it issues network messages early, then continues executing any ready batches while awaiting replies. When a batch is large enough and a GPU is present, it is dispatched to a CUDA kernel (a minimal scheduling sketch follows this list).
- Evaluation – The authors evaluate a suite of micro‑benchmarks (matrix multiplication, polynomial evaluation, etc.) in the online phase of SPDZ, comparing against the state‑of‑the‑art MP‑SPDZ implementation on both CPU‑only and CPU+GPU configurations.
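To make the annotation step concrete, here is a minimal sketch of the privacy‑annotated input style. The paper's marker is @secret; this sketch substitutes Clang's annotate attribute, which carries a string tag from source into LLVM IR, to show the same idea in compilable form. The variables and function are illustrative, not taken from the paper.

```cpp
// Sketch of privacy-annotated source (assumption: the paper's `@secret`
// marker plays the role that Clang's `annotate` attribute plays here,
// i.e., it tags values so the tag survives into the emitted LLVM IR).
// Compile with: clang++ -O2 -emit-llvm -S annotated.cpp
[[clang::annotate("secret")]] int a[4];   // secret-shared inputs
[[clang::annotate("secret")]] int b[4];

int dot4() {
    int acc = 0;
    // The four products are mutually independent, so a batching pass can
    // group them into a single SPDZ multiplication round.
    for (int i = 0; i < 4; ++i)
        acc += a[i] * b[i];
    return acc;
}
```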
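The batching pass is described only at a high level, so the following self‑contained sketch illustrates just the core idea: assign each operation the depth of its longest dependency chain, and treat each depth as one batch of mutually independent operations whose share‑openings can go out in a single communication round. The Op structure and the levelling heuristic are assumptions for illustration, not the authors' pass.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

// One operation in a straight-line secret-shared program; `deps` holds
// indices of earlier ops whose results this op consumes (illustrative
// model, not the paper's IR).
struct Op {
    std::vector<int> deps;
};

// Group ops by longest-dependency-chain depth: ops at the same depth are
// mutually independent, so each depth forms one batch (one network round).
std::map<int, std::vector<int>> batchByDepth(const std::vector<Op>& ops) {
    std::vector<int> depth(ops.size(), 0);
    std::map<int, std::vector<int>> batches;
    for (int i = 0; i < static_cast<int>(ops.size()); ++i) {
        for (int d : ops[i].deps)
            depth[i] = std::max(depth[i], depth[d] + 1);
        batches[depth[i]].push_back(i);
    }
    return batches;
}

int main() {
    // t0 = a*b, t1 = c*d, t2 = t0*t1: t0 and t1 share a round; t2 waits.
    std::vector<Op> ops(3);
    ops[2].deps = {0, 1};
    for (const auto& [d, batch] : batchByDepth(ops)) {
        std::printf("round %d:", d);
        for (int i : batch) std::printf(" t%d", i);
        std::printf("\n");
    }
}
```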
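And here is a minimal sketch of the non‑blocking pattern described above, simulating the network round with a sleep: the runtime issues the share exchange early, performs independent local work in the meantime, and blocks only when a dependent batch actually needs the opened values. Function names and timings are placeholders, not the paper's runtime API.

```cpp
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>

// Stand-in for sending/receiving masked shares for one batch (in the real
// runtime this would be a socket round-trip; simulated here by a sleep).
int exchangeShares(int batchId) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return batchId;
}

// Stand-in for communication-free local work (e.g., share additions) that
// does not depend on the in-flight batch.
void localWork() {
    std::this_thread::sleep_for(std::chrono::milliseconds(40));
}

int main() {
    // Issue the network round early...
    auto inflight = std::async(std::launch::async, exchangeShares, 0);
    // ...and keep executing ready batches instead of blocking on the reply.
    localWork();
    // Only block once a dependent batch needs the opened values.
    int done = inflight.get();
    std::printf("batch %d opened; dependent batch can now run\n", done);
}
```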
Results & Findings
| Configuration | Speed‑up vs. MP‑SPDZ | Scaling behavior |
|---|---|---|
| CPU, 1 thread | 1.8× – 2.3× (light workloads) | Near‑linear up to 8 cores |
| CPU, 8 threads | up to 5.56× (heavy algebra) | Strong scaling, diminishing returns after 16 threads |
| GPU (CUDA) | 2.5× – 4.0× over CPU‑only for large inputs | Improves as batch size grows; overhead negligible for small problems |
Key Takeaways
- Automatic batching eliminates most of the manual parallelism engineering required by existing SPDZ toolchains.
- Non‑blocking scheduling hides network latency, especially beneficial when the underlying network is high‑latency but high‑bandwidth.
- GPU acceleration becomes worthwhile once the batched workload exceeds a few thousand arithmetic ops, matching the typical size of real‑world MPC tasks (e.g., privacy‑preserving ML inference); a dispatch sketch follows this list.
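A tiny sketch of the dispatch decision implied by this takeaway. The threshold constant is an assumption (the summary says only "a few thousand"), and gpuAvailable, launchGpuBatch, and runCpuBatch are hypothetical stubs rather than the paper's API.

```cpp
#include <cstddef>
#include <cstdio>

// Assumed cutoff: below this, kernel-launch and transfer overhead would
// dominate, so the batch stays on the CPU.
constexpr std::size_t kGpuThreshold = 4096;

bool gpuAvailable() { return false; }                  // illustrative stub
void launchGpuBatch(std::size_t n) { std::printf("GPU: %zu ops\n", n); }
void runCpuBatch(std::size_t n)    { std::printf("CPU: %zu ops\n", n); }

void dispatchBatch(std::size_t numOps) {
    if (numOps >= kGpuThreshold && gpuAvailable())
        launchGpuBatch(numOps);
    else
        runCpuBatch(numOps);
}

int main() {
    dispatchBatch(128);    // small batch: CPU
    dispatchBatch(65536);  // large batch: GPU if present
}
```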
Practical Implications
- Lower barrier to entry – Developers can now write familiar C code with simple annotations instead of learning domain‑specific languages or hand‑crafting parallel MPC pipelines.
- Faster production deployments – The observed speed‑ups translate directly into lower compute costs and tighter latency budgets for privacy‑preserving services (e.g., secure auctions, federated analytics).
- Hardware‑agnostic scaling – The same codebase can run efficiently on multi‑core CPUs or be upgraded to GPU‑accelerated clusters without code changes, enabling a smooth migration path as workloads grow.
- Potential for integration – Because the front‑end emits standard LLVM IR, existing toolchains (Clang, Rust‑LLVM back‑ends, etc.) could be extended to target SPDZ, opening the door for broader language support.
Limitations & Future Work
- Subset of C – Only a limited set of language features (straight‑line arithmetic, simple loops) is currently supported; complex data structures and dynamic memory are out of scope.
- Online‑phase focus – The evaluation concentrates on the online phase; offline preprocessing (pre‑computation of multiplication triples) is not accelerated.
- Prototype maturity – The system is a proof‑of‑concept; robustness, error handling, and integration with existing MPC frameworks need further engineering.
- Future directions – Extending the front‑end to full C/C++ (or other languages), adding support for other secure‑computation back‑ends (e.g., the BGV and CKKS homomorphic‑encryption schemes), and exploring heterogeneous scheduling across CPU, GPU, and FPGA accelerators.
Authors
- Tianye Dai
- Hammurabi Mendes
- Heuichan Lim
Paper Information
- arXiv ID: 2512.11112v1
- Categories: cs.CR, cs.DC, cs.SE
- Published: December 11, 2025