[Paper] TopoSZp: Lightweight Topology-Aware Error-controlled Compression for Scientific Data
Source: arXiv - 2602.17552v1
Overview
Large-scale HPC simulations generate petabytes of data that must be stored, transferred, and visualized efficiently. Traditional error-bounded lossy compressors (e.g., SZ, ZFP) reduce size while guaranteeing a pointwise numeric error bound, but they often distort the topology of the data: critical points such as minima, maxima, and saddles, which scientists rely on for downstream analysis, can vanish, appear spuriously, or change type. TopoSZp introduces a lightweight, topology-aware compression pipeline that preserves these features without sacrificing speed, making it practical for researchers and developers who handle scientific datasets.
Key Contributions
- Topology‑preserving compression built on SZp: Extends the high‑throughput SZp compressor with inexpensive critical‑point detection and refinement steps.
- Strict error-bound enforcement with relaxed topology constraints: Guarantees the user-specified numeric error bound on the output, while relaxing the topology requirement to keeping critical points intact rather than reconstructing the full topology.
- Local ordering preservation: Ensures that the relative ordering of neighboring values around each critical point is maintained, preventing spurious topology changes.
- Targeted saddle‑point refinement: Refines only the regions around saddles—typically the most fragile structures—avoiding a full‑scale topology reconstruction.
- Massive performance gains: Achieves 100–10 000× faster compression and 10–500× faster decompression than prior topology‑aware compressors, with comparable compression ratios.
Methodology
Base Compressor (SZp)
- SZp is a variant of the popular SZ compressor that uses predictive modeling and quantization to meet a user‑defined absolute or relative error bound. It is already optimized for multi‑core CPUs and GPUs.
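To make the error-bound idea concrete, here is a minimal sketch of uniform scalar quantization with bins of width twice the error bound. It is a generic illustration, not SZp's actual pipeline; the function names `quantize` and `dequantize` are made up for this example.

```python
def quantize(values, error_bound):
    """Map each value to an integer code using bins of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes, error_bound):
    """Reconstruct each value as the centre of its quantization bin."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


data = [0.12, 0.47, 0.51, 0.90]
eb = 0.05

codes = quantize(data, eb)
restored = dequantize(codes, eb)

# Every reconstructed value stays within eb of the original
# (up to floating-point rounding).
max_err = max(abs(a - b) for a, b in zip(data, restored))
print(codes, max_err)
```

Note that 0.47 and 0.51 fall into the same bin and are reconstructed as the same value. The error bound holds, yet their relative ordering is lost, which is how a purely numeric guarantee can still alter critical points.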
Critical‑Point Detection (Lightweight)
- A single‑pass scan identifies minima, maxima, and saddles by comparing each voxel to its immediate neighbors (6‑ or 26‑connectivity).
- The detection is lazy: it flags only points that are likely to change type after compression, reducing unnecessary work.
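The neighbor comparison can be sketched on a small 2D grid. This is a simplified illustration that assumes 4-connectivity in 2D instead of the 6- or 26-connectivity mentioned above, skips boundary points, and treats ties as "not higher"; `classify` and `find_critical_points` are hypothetical names, and the lazy filtering step is not modeled.

```python
def classify(grid, i, j):
    """Classify an interior point of a 2D grid using its 4 neighbours."""
    center = grid[i][j]
    # Neighbours in cyclic order: up, right, down, left.
    ring = [grid[i - 1][j], grid[i][j + 1], grid[i + 1][j], grid[i][j - 1]]
    higher = [v > center for v in ring]
    if all(higher):
        return "minimum"
    if not any(higher):
        return "maximum"
    # Count how often the higher/lower label flips while walking the ring.
    flips = sum(higher[k] != higher[(k + 1) % 4] for k in range(4))
    return "saddle" if flips == 4 else "regular"


def find_critical_points(grid):
    """Single pass over interior points; boundary points are skipped."""
    found = {}
    for i in range(1, len(grid) - 1):
        for j in range(1, len(grid[0]) - 1):
            kind = classify(grid, i, j)
            if kind != "regular":
                found[(i, j)] = kind
    return found


field = [
    [1, 5, 4, 0],
    [2, 3, 2, 9],
    [1, 5, 4, 0],
]
critical = find_critical_points(field)
print(critical)
```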
Local Ordering Preservation
- For each flagged critical point, TopoSZp records a tiny “ordering mask” that captures the relative magnitude of the point and its neighbors.
- During compression, the quantization step is constrained to keep the ordering consistent, ensuring the critical point’s type does not flip.
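One way to picture such an ordering mask is as a small bitfield with one bit per neighbor. The sketch below assumes a 2D grid with 4 neighbors and a 4-bit mask; the actual mask layout used by TopoSZp is not described here, and the function names are illustrative.

```python
def ordering_mask(grid, i, j):
    """Pack 'neighbour > centre' comparisons into a 4-bit integer."""
    center = grid[i][j]
    # Bit 0: up, bit 1: right, bit 2: down, bit 3: left.
    ring = [grid[i - 1][j], grid[i][j + 1], grid[i + 1][j], grid[i][j - 1]]
    mask = 0
    for bit, value in enumerate(ring):
        if value > center:
            mask |= 1 << bit
    return mask


def ordering_preserved(original, decompressed, i, j):
    """True if the local ordering around (i, j) survived compression."""
    return ordering_mask(original, i, j) == ordering_mask(decompressed, i, j)


original = [
    [1.0, 5.0, 1.0],
    [2.0, 3.0, 2.0],
    [1.0, 5.0, 1.0],
]
# A lossy copy in which the right neighbour drifted above the centre.
lossy = [
    [1.0, 5.0, 1.0],
    [2.0, 3.0, 3.2],
    [1.0, 5.0, 1.0],
]

print(bin(ordering_mask(original, 1, 1)), bin(ordering_mask(lossy, 1, 1)))
print(ordering_preserved(original, lossy, 1, 1))
```

Comparing two integers per critical point is cheap, which fits the lightweight design: a mismatch signals that the point may have changed type and needs correction.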
Saddle‑Point Refinement
- Saddles are the most topology‑sensitive structures. TopoSZp performs a focused refinement: it locally re‑compresses the saddle’s neighborhood with a tighter error bound until the original saddle topology is recovered.
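The tighten-until-recovered loop can be sketched as follows. This is an assumption-laden illustration: it reuses plain uniform quantization as the "compressor", halves the local bound each round, and works on a single point with a 4-neighbor ring. The halving schedule, the `max_rounds` cap, and all function names are choices made for this example, not details from the paper.

```python
def reconstruct(value, error_bound):
    """Quantize and immediately dequantize a value (bin width 2 * error_bound)."""
    step = 2.0 * error_bound
    return round(value / step) * step


def ring_signs(center, ring):
    """Sign of (neighbour - centre) for each neighbour: -1, 0, or 1."""
    return [(v > center) - (v < center) for v in ring]


def refine_saddle(center, ring, error_bound, max_rounds=20):
    """Halve the local bound until the neighbour ordering matches the original."""
    target = ring_signs(center, ring)
    local_eb = error_bound
    for _ in range(max_rounds):
        new_center = reconstruct(center, local_eb)
        new_ring = [reconstruct(v, local_eb) for v in ring]
        if ring_signs(new_center, new_ring) == target:
            return local_eb, new_center, new_ring
        local_eb /= 2.0
    raise RuntimeError("ordering not recovered within max_rounds")


# A shallow saddle: up/down neighbours are higher, left/right are lower.
center = 0.50
ring = [0.53, 0.48, 0.54, 0.47]  # up, right, down, left
eb = 0.05

local_eb, new_center, new_ring = refine_saddle(center, ring, eb)
print(local_eb, new_center, new_ring)
```

At the global bound every value collapses into the same bin and the saddle disappears; only the neighborhood of the saddle pays for the tighter bound, so the impact on the overall compression ratio stays small.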
Error‑Bound Enforcement
- The user-specified error bound (e.g., 1e-4) always holds in the final output. Topology-preserving adjustments may temporarily exceed the bound locally during intermediate steps, but a final pass restores it before the data is written.
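A final pass of this kind could be as simple as clamping every adjusted value back into the allowed band around its original. The sketch below assumes exactly that; whether TopoSZp uses clamping or another mechanism is not stated here, and `clamp_to_bound` and `max_abs_error` are hypothetical helpers.

```python
def clamp_to_bound(original, adjusted, error_bound):
    """Pull each adjusted value back into [original - eb, original + eb]."""
    result = []
    for o, a in zip(original, adjusted):
        low, high = o - error_bound, o + error_bound
        result.append(min(max(a, low), high))
    return result


def max_abs_error(original, reconstructed):
    """Largest pointwise deviation between the two sequences."""
    return max(abs(o - r) for o, r in zip(original, reconstructed))


original = [1.0, 2.0, 3.0]
# Values after a topology fix; the last two drifted outside the bound.
adjusted = [1.0625, 2.5, 2.5]
eb = 0.125

final = clamp_to_bound(original, adjusted, eb)
print(final, max_abs_error(original, final))
```

A clamp alone could undo an ordering fix, so a real implementation would have to re-check the local ordering after this pass; that check is omitted here.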
Parallel Execution
- All steps are embarrassingly parallel across blocks of the dataset, allowing the compressor to scale on many‑core CPUs and GPUs without complex synchronization.
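Block-level independence can be illustrated with a thread pool that processes each block separately and concatenates the results in order. This sketch only demonstrates the structure: the per-block work is plain quantization, the block size and worker count are arbitrary, and CPython threads will not deliver the speedups a native multi-core or GPU implementation would.

```python
from concurrent.futures import ThreadPoolExecutor


def split_blocks(values, block_size):
    """Cut a flat sequence into consecutive blocks of at most block_size."""
    return [values[k:k + block_size] for k in range(0, len(values), block_size)]


def compress_block(block, error_bound):
    """Per-block work: here, uniform quantization to integer codes."""
    step = 2.0 * error_bound
    return [round(v / step) for v in block]


def compress_parallel(values, error_bound, block_size=4, workers=4):
    """Compress blocks independently, then concatenate in original order."""
    blocks = split_blocks(values, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so no extra bookkeeping
        # is needed to reassemble the output.
        coded = list(pool.map(lambda b: compress_block(b, error_bound), blocks))
    return [code for block in coded for code in block]


values = [0.12, 0.47, 0.51, 0.90, 1.33, 1.75, 2.02, 2.48, 2.51, 3.10]
eb = 0.05

parallel_codes = compress_parallel(values, eb)
serial_codes = compress_block(values, eb)
print(parallel_codes == serial_codes)
```

Because no block reads another block's output, the parallel and serial results are identical, which is the property that makes the scheme easy to scale.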
Results & Findings
| Dataset | Compression ratio | Non-preserved critical points (baseline in parentheses) | Compression speedup vs. prior topology-aware compressors | Decompression speedup vs. prior topology-aware compressors |
|---|---|---|---|---|
| Turbulent flow (3 TB) | 12.3:1 | 0.02 % (vs. 2 % for SZ) | 1 200× | 150× |
| Climate simulation (1.2 TB) | 10.8:1 | 0.05 % (vs. 1.8 % for ZFP) | 3 500× | 300× |
| Combustion (800 GB) | 13.5:1 | 0.01 % (vs. 0.9 % for SZ‑Topo) | 9 800× | 420× |
- Topology preservation: TopoSZp eliminated false‑positive critical points and never mis‑typed a point (e.g., a minimum reported as a saddle).
- Compression ratio: Within 5 % of the best‑in‑class SZp compression, showing that topology preservation incurs minimal overhead.
- Speed: The lightweight detection and localized refinement keep the runtime orders of magnitude lower than earlier topology‑aware methods that required global Morse‑Smale complex reconstruction.
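For readers who want to compute comparable numbers on their own data, the sketch below shows one plausible way to derive the reported metrics. The exact definitions used in the paper are not given in this summary, so the formulas here are assumptions: a critical point counts as non-preserved if it is missing or has a different type after decompression, and a false positive is a critical point that exists only in the decompressed data.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Ratio of original size to compressed size (e.g., 12.0 means 12:1)."""
    return original_bytes / compressed_bytes


def non_preserved_percent(original_cps, decompressed_cps):
    """Percentage of original critical points that are lost or mistyped.

    Both arguments map a grid position to a type such as 'minimum'.
    """
    if not original_cps:
        return 0.0
    lost = sum(
        1 for pos, kind in original_cps.items()
        if decompressed_cps.get(pos) != kind
    )
    return 100.0 * lost / len(original_cps)


def false_positive_count(original_cps, decompressed_cps):
    """Critical points that appear only after decompression."""
    return len(set(decompressed_cps) - set(original_cps))


original_cps = {
    (1, 1): "saddle", (2, 3): "minimum", (4, 4): "maximum", (5, 2): "saddle",
}
decompressed_cps = {
    (1, 1): "saddle", (2, 3): "minimum", (4, 4): "minimum", (6, 6): "maximum",
}

print(compression_ratio(1200, 100))
print(non_preserved_percent(original_cps, decompressed_cps))
print(false_positive_count(original_cps, decompressed_cps))
```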
Practical Implications
- In‑situ data reduction: HPC applications can embed TopoSZp directly into simulation pipelines, compressing data on the fly while guaranteeing that downstream analysis (e.g., feature tracking, topology‑based segmentation) remains valid.
- Visualization pipelines: Scientific visualizers can load compressed datasets without worrying that critical structures have been lost, enabling accurate isosurface extraction and feature‑aware rendering.
- Storage and I/O cost savings: Because TopoSZp stays within about 5 % of SZp's compression ratio while adding topology safety, organizations can reduce storage footprints without adding post-hoc validation steps.
- Developer friendliness: The API mirrors SZp's existing C/C++ and Python bindings, requiring only a few extra parameters (e.g., preserve_topology=true). This lowers the barrier for integration into existing workflows.
Limitations & Future Work
- Assumption of regular grids: The current implementation works on structured, uniform meshes; extending to unstructured or adaptive grids will require redesigning the critical‑point detection kernel.
- Error‑bound relaxation granularity: While the algorithm guarantees the final bound, the intermediate relaxation may affect downstream algorithms that rely on intermediate compressed values (e.g., iterative solvers). Future work will explore tighter, per‑block error budgeting.
- GPU optimization: Early GPU experiments show promising speedups, but the saddle-refinement kernel still lags behind the CPU version on some architectures. Optimizing memory access patterns for GPUs is an active research direction.
TopoSZp demonstrates that preserving scientific topology need not come at the cost of performance. By marrying a proven lossy compressor with a few clever, locality‑focused topology checks, the authors deliver a tool that can be dropped into existing HPC pipelines, giving developers confidence that their compressed data remains analytically trustworthy.
Authors
- Tripti Agarwal
- Sheng Di
- Xin Liang
- Zhaoyuan Su
- Yuxiao Li
- Ganesh Gopalakrishnan
- Hanqi Guo
- Franck Cappello
Paper Information
- arXiv ID: 2602.17552v1
- Categories: cs.DC
- Published: February 19, 2026