[Paper] On GPU Implementation for Multi-Precision Integer Division

Published: June 4, 2026 at 12:51 PM EDT

Source: arXiv - 2606.06386v1

Overview

This paper examines the issues that arise when implementing a fast integer division algorithm on general-purpose GPUs. The algorithm uses a Newton iteration based on the shifted inverse operation, keeping all arithmetic in the integer domain and relying on data-parallel operators. The principal contribution is an efficient GPU/CUDA implementation for integer precisions from $2^{15}$ to $2^{18}$, sizes that CGBN division does not support. The authors propose algorithmic refinements, define a cost model in terms of multiplications, build on prefix sums and previous work on multi-precision multiplication, and present an evaluation showing near-optimal performance relative to the model for the target precision.
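To make the idea of a Newton iteration on the shifted inverse concrete, here is a minimal sketch in Python. It is not the paper's GPU implementation: Python's built-in big integers stand in for the multi-precision limb arithmetic, the base is fixed at 2, and the function names `shifted_inverse` and `divide` are hypothetical. It only illustrates that a quotient can be obtained from multiplications, shifts, and a small final correction, with no floating-point arithmetic.

```python
def shifted_inverse(v: int, h: int) -> int:
    """Return floor(2**h / v) using only integer operations (illustrative)."""
    if v <= 0 or h < 0:
        raise ValueError("v must be positive and h non-negative")
    n = 1 << h
    if v >= n:
        return 1 if v == n else 0
    # Initial guess: a power of two that does not exceed 2**h / v
    # and is at least half of it.
    x = 1 << (h - v.bit_length())
    while True:
        # Newton step for f(x) = 1/x - v/2**h, rounded down:
        # x <- x + x * (2**h - v*x) / 2**h
        step = (x * (n - v * x)) >> h
        if step == 0:
            break
        x += step
    # Rounding down can leave x slightly short of the exact value.
    while (x + 1) * v <= n:
        x += 1
    return x


def divide(u: int, v: int) -> int:
    """Return floor(u / v) for u >= 0 and v > 0 via the shifted inverse."""
    if u < 0 or v <= 0:
        raise ValueError("u must be non-negative and v positive")
    h = u.bit_length()            # guarantees 2**h > u
    w = shifted_inverse(v, h)     # w = floor(2**h / v)
    q = (u * w) >> h              # never overshoots the true quotient
    while (q + 1) * v <= u:       # small upward correction
        q += 1
    return q
```

For example, `divide(100, 7)` computes the shifted inverse of 7 at shift 7 and then corrects the candidate quotient. The dominant cost in such a scheme is the multiplications inside the iteration, which is consistent with the paper's choice to express its cost model in terms of multiplications.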

Key Contributions

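The contributions stated in the abstract are:

An efficient GPU/CUDA implementation of integer division for precisions from $2^{15}$ to $2^{18}$
Algorithmic refinements to the division algorithm
A cost model defined in terms of multiplications
An evaluation showing near-optimal performance relative to that model for the target precision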

Methodology

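The division algorithm uses a Newton iteration based on the shifted inverse operation, with all arithmetic kept in the integer domain and expressed through data-parallel operators. The implementation builds on prefix sums and on previous work on multi-precision multiplication, and its cost is modeled in terms of multiplications. Refer to the full paper for the detailed methodology.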
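The overview mentions that the work builds on prefix sums. One common use of prefix sums in data-parallel big-integer arithmetic is carry propagation in addition; whether the paper applies them in exactly this way is an assumption here. The sketch below is illustrative only: the names `combine` and `add_limbs` and the 32-bit limb size are hypothetical, and `itertools.accumulate` runs the scan sequentially, whereas a GPU would evaluate the same associative operator in parallel.

```python
from itertools import accumulate

BASE = 1 << 32                       # assumed limb size, for illustration
KILL, PROPAGATE, GENERATE = 0, 1, 2  # per-limb carry behaviour


def combine(left: int, right: int) -> int:
    """Associative scan operator: a PROPAGATE limb forwards the state to its left."""
    return left if right == PROPAGATE else right


def add_limbs(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Add two equal-length little-endian limb lists; return (limbs, carry_out)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of limbs")
    # Step 1 (independent per limb): raw sums without carries.
    sums = [x + y for x, y in zip(a, b)]
    # Step 2 (independent per limb): classify each position.
    states = [
        GENERATE if s >= BASE else PROPAGATE if s == BASE - 1 else KILL
        for s in sums
    ]
    # Step 3: inclusive prefix scan resolves the carry out of every position.
    scanned = list(accumulate(states, combine))
    carry_out = [1 if s == GENERATE else 0 for s in scanned]
    # Step 4 (independent per limb): apply the carry coming from the previous limb.
    carry_in = [0] + carry_out[:-1]
    limbs = [(s + c) % BASE for s, c in zip(sums, carry_in)]
    return limbs, (carry_out[-1] if carry_out else 0)
```

Because `combine` is associative, steps 1, 2, and 4 are elementwise maps and step 3 is a scan, so the whole addition fits the data-parallel style the overview describes.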

Practical Implications

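The implementation covers integer precisions from $2^{15}$ to $2^{18}$, sizes that CGBN division does not support, and the evaluation reports near-optimal performance relative to the cost model for the target precision.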

Authors

Martin B. Marchioro
Aske N. Raahauge
Marc I. Løvenskjold
Cosmin E. Oancea
Stephen M. Watt

Paper Information

arXiv ID: 2606.06386v1
Categories: cs.DC, cs.MS, cs.SC
Published: June 4, 2026
