[Paper] Same Engine, Multiple Gears: Parallelizing Fixpoint Iteration at Different Granularities (Extended Version)
Source: arXiv - 2602.06680v1
Overview
The paper “Same Engine, Multiple Gears: Parallelizing Fixpoint Iteration at Different Granularities” tackles a long‑standing bottleneck in static analysis: the fixpoint computation that underlies most analyzers. By making the fixpoint engine granularity‑agnostic, the authors show how the same solver can be run in a “low‑gear” mode (few, coarse tasks) or a “high‑gear” mode (many fine‑grained tasks), adapting to the hardware and the size of the code base. Their experiments with the Goblint analyzer on large, real‑world projects demonstrate substantial speed‑ups without sacrificing analysis precision.
Key Contributions
- Granularity‑parametric fixpoint engine – a single solver that can be configured to run with tasks of any size, from coarse tasks spanning entire program threads down to individual statements.
- Two parallelization philosophies:
  - Immediate approach – all workers share a single thread‑safe hash table that holds the global solver state.
  - Independent approach – each worker maintains its own local state and synchronizes via a publish/subscribe data structure.
- Integration with Goblint – the authors implemented both philosophies inside the open‑source static analysis framework, providing a concrete reference implementation.
- Comprehensive empirical evaluation – performance measurements on several large open‑source projects (e.g., the Linux kernel, PostgreSQL) showing up to 5× speed‑up on a 32‑core machine.
- Design guidelines for choosing the appropriate granularity and parallelization strategy based on program characteristics and hardware resources.
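The first contribution is easiest to picture as an ordinary worklist solver whose task set is simply a parameter. The following is a minimal Python sketch, not the paper's actual engine: it assumes set-valued abstract states joined by union and a monotone transfer function, and all names (`solve`, `deps`, `transfer`) are illustrative.

```python
from collections import defaultdict

def solve(deps, transfer, tasks, bottom=frozenset()):
    """Sequential worklist fixpoint solver; `tasks` is the granularity
    knob (e.g. one task per function, or one per basic block).

    deps:     task -> tasks whose value depends on it (re-run on change)
    transfer: (task, state) -> new abstract value; must be monotone
    """
    state = defaultdict(lambda: bottom)
    worklist = list(tasks)
    while worklist:
        t = worklist.pop()
        new = transfer(t, state)
        if new != state[t]:                 # value grew: wake dependents
            state[t] = new
            worklist.extend(deps.get(t, ()))
    return state
```

On a toy instance where each node's abstract value is the set of nodes that can reach it, `solve` converges to the least fixpoint regardless of the order in which tasks are popped, which is the property that makes the task granularity freely configurable.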
Methodology
- Base Solver (TD) – The authors start from Goblint’s top‑down (TD) fixpoint solver, which already supports mixed flow‑sensitivity (i.e., it combines flow‑sensitive information at individual program points with flow‑insensitive global invariants).
- Task Pool Abstraction – A fixed pool of worker threads pulls tasks from a shared queue. The granularity of a task is a configurable parameter (e.g., “analyze one function”, “analyze one basic block”).
- Immediate Parallelism
  - All workers read/write a single concurrent hash map that stores abstract states for program locations.
  - Synchronization is achieved via lock‑free atomic operations; the map guarantees linearizability, so workers always see a consistent view of the global state.
- Independent Parallelism
  - Each worker runs a local copy of the solver state.
  - When a worker discovers a new abstract fact that other workers might need, it publishes the fact to a topic‑based subscription system.
  - Workers subscribe to topics (e.g., “facts about function f”) and pull updates asynchronously.
- Evaluation Setup – The authors compiled a benchmark suite of 12 sizable C projects (total > 30 MLOC). They measured wall‑clock time, CPU utilization, and memory overhead for several configurations: varying core counts (1–32), task granularities (coarse vs. fine), and the two parallelization philosophies.
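The immediate strategy can be sketched with a plain thread pool and one shared state table. This is not the paper's implementation: Goblint uses a lock‑free concurrent hash map, whereas this hypothetical Python sketch substitutes a single lock and the standard `queue.Queue` task‑done protocol for termination. Stale reads outside the lock are harmless here only because the transfer function is monotone, so any missed update triggers a later re‑queue.

```python
import threading
import queue

def parallel_solve(deps, transfer, tasks, n_workers=4):
    """'Immediate' strategy sketch: all workers mutate one shared table."""
    state = {t: frozenset() for t in tasks}
    lock = threading.Lock()
    work = queue.Queue()
    for t in tasks:
        work.put(t)

    def worker():
        while True:
            t = work.get()
            if t is None:                  # shutdown sentinel
                work.task_done()
                return
            new = transfer(t, state)       # may read slightly stale data
            with lock:                     # compare-and-publish atomically
                changed = new != state[t]
                if changed:
                    state[t] = new
            if changed:
                for d in deps.get(t, ()):  # re-queue dependents
                    work.put(d)
            work.task_done()               # only after re-queuing

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    work.join()                            # waits until no task is in flight
    for _ in threads:
        work.put(None)
    for th in threads:
        th.join()
    return state
```

Calling `work.task_done()` only after dependents are re‑queued is what makes `work.join()` a sound termination check: the unfinished‑task count cannot reach zero while any worker still has updates to propagate.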
Results & Findings
| Configuration | Speed‑up (32 cores) | Memory overhead | Observations |
|---|---|---|---|
| Immediate, coarse tasks (per‑thread) | 2.1× | +12 % | Low contention, but limited parallelism for many‑core machines. |
| Immediate, fine tasks (per‑basic‑block) | 4.7× | +18 % | High parallelism, but hash‑map contention spikes for very large programs. |
| Independent, coarse tasks | 2.8× | +22 % | Extra memory for duplicated states, but less contention. |
| Independent, fine tasks | 5.0× | +35 % | Best wall‑clock time; publish/subscribe efficiently spreads updates. |
| Sequential baseline | 1.0× | — | Serves as reference. |
Key takeaways
- Granularity matters: fine‑grained tasks unlock more cores but increase synchronization overhead.
- Independent approach scales better on many‑core machines because it avoids a single shared hash table bottleneck.
- Memory trade‑off: duplicating state per worker is acceptable for typical analysis workloads (the authors report < 2 GB extra for a 30 MLOC code base).
- No loss of precision: both parallelizations produce exactly the same abstract results as the sequential solver.
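The independent approach's avoidance of a shared table can be illustrated with a toy publish/subscribe bus. Everything below is a hypothetical sketch, not the paper's data structure: class and method names (`FactBus`, `Worker`, `pull_updates`) are invented for illustration, and the join operation is assumed to be set union.

```python
import queue
from collections import defaultdict

class FactBus:
    """Toy topic-based pub/sub bus: workers exchange only updated facts."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of queues

    def subscribe(self, topic):
        q = queue.Queue()
        self.subscribers[topic].append(q)
        return q

    def publish(self, topic, fact):
        for q in self.subscribers[topic]:      # fan out to every subscriber
            q.put(fact)

class Worker:
    """Each worker keeps a private state table, so no locks are needed."""
    def __init__(self, bus, topics):
        self.state = {}
        self.inbox = [bus.subscribe(t) for t in topics]

    def pull_updates(self):
        """Merge facts published by other workers (join = set union)."""
        for q in self.inbox:
            while not q.empty():
                key, value = q.get()
                self.state[key] = self.state.get(key, frozenset()) | value
```

Because each worker merges incoming facts with a join rather than overwriting, updates can arrive in any order and any number of times without affecting the final result, which mirrors why the independent approach preserves precision.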
Practical Implications
- Faster CI pipelines – Projects that already run Goblint (or similar analyzers) as part of continuous integration can cut analysis time from hours to minutes on modern multi‑core build agents.
- Scalable cloud analysis services – Cloud‑based static analysis platforms can dynamically adjust task granularity based on the number of allocated CPUs, achieving near‑linear speed‑up without rewriting the analysis logic.
- Developer tooling – IDE plugins that perform on‑the‑fly analysis (e.g., for detecting data races) can now run more responsive background checks on typical developer laptops equipped with 8‑core CPUs.
- Portability – Because the parallelization is built on a generic task‑pool abstraction, the same engine can be reused across different static analysis frameworks (e.g., Infer, CodeQL) with minimal changes.
Limitations & Future Work
- Memory consumption grows with the independent approach; extremely large code bases may hit RAM limits on modest machines.
- Task granularity selection is currently manual; an adaptive scheduler that automatically tunes granularity at runtime would further simplify usage.
- Publish/subscribe overhead could become a bottleneck for analyses that generate a huge number of tiny facts; exploring more compact diff‑based synchronization is a promising direction.
- Evaluation scope is limited to C programs; applying the same ideas to other languages (Java, Rust) and to analyses that rely on heavy symbolic reasoning remains open.
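One way the adaptive scheduler the authors leave as future work might look is a heuristic that splits tasks finer only while the pool cannot keep every worker busy. This is purely speculative: the function name, granularity labels, and oversubscription threshold below are all invented for illustration.

```python
def pick_granularity(n_coarse_tasks, n_workers, oversubscription=4):
    """Hypothetical runtime heuristic: start coarse to keep per-task
    overhead low, and switch to finer tasks only when there are too few
    coarse tasks to saturate the worker pool."""
    if n_coarse_tasks < n_workers * oversubscription:
        return "basic_block"   # too few tasks: go finer to expose parallelism
    return "function"          # enough tasks: stay coarse, less contention
```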
Bottom line: By decoupling the fixpoint engine from a fixed task size and offering two complementary parallelization strategies, the authors give static analysis practitioners a practical way to harness modern many‑core hardware, turning what used to be a long‑running “single‑gear” process into a flexible, high‑performance engine.
Authors
- Ali Rasim Kocal
- Michael Schwarz
- Simmo Saan
- Helmut Seidl
Paper Information
- arXiv ID: 2602.06680v1
- Categories: cs.PL, cs.DC
- Published: February 6, 2026