[Paper] Graph Traversal on Tensor Cores: A BFS Framework for Modern GPUs
Modern GPUs have Tensor Cores (TCs) capable of extremely high-throughput matrix operations, yet graph algorithms remain difficult to accelerate because of their...
In this paper, we consider the problem of locally certifying that the size of a network is even, or more generally, congruent to some fixed number. The parity p...
Pearl, a Layer-1 blockchain with high-profile AI industry endorsements, markets its Proof-of-Useful-Work (PoUW) protocol as simultaneously securing the network ...
Directed Acyclic Graph (DAG) based BFT protocols have demonstrated the capability to achieve significantly high throughput in practice. Recent advancements focu...
We study rectangular matrix multiplication in the low-bandwidth model of distributed computing. There are n computers; initially the input matrices are distribu...
The complexity of biomolecular simulations has substantially increased the demand for High-Performance Computing (HPC) infrastructures, particularly in molecula...
The edge-cloud paradigm improves service delivery by orchestrating resources across edge nodes and cloud data centres. These environments consist of heterogeneo...
Timed-Arc Petri net (TAPN) is a timed extension of the classical Petri net model where tokens have their age and input arcs are associated with time intervals r...
Deployers of online LLM services usually seek to maximize cluster-wide performance given a fixed number of GPUs. Tensor parallelism (TP) is necessary to fit mod...
Exact tensor network contraction underpins quantum circuit simulation, quantum error correction, combinatorial optimization, and many-body dynamics. The dominan...
Hospitals run more machine learning on GPUs while the carbon footprint of grid electricity rises and falls through the day. Using a computer simulation, we comp...
Distributed applications increasingly support local-first collaboration over shared data, allowing multiple users to perform updates concurrently without global...
Forewords & Praise: When I decided to self‑publish Docker and Kubernetes Security in early 2025, I never imagined the incredible support from the community that...
We study the aggregation problem in synchronous multi-hop radio networks with O(log n)-bit messages and no collision detection. Each node initially holds a valu...
This paper investigates scheduling strategies for wireless sensor-actuator networks (WSANs) in Industry 4.0 scenarios. In particular, we address the problem of ...
This work introduces a self-optimizing virtual processor (VP) for numerical array programs that shifts parallelization from a manual developer task to a coopera...
We present RaFI, a CUDA and MPI based software framework that simplifies the task of building GPU-enabled data-parallel software where rays or similar work item...
We present and show how to implement a non-trivial all-to-all communication algorithm for arbitrary d-dimensional tori effectively in MPI. Given a factorization...
In recent years, HPC systems, and CPU architectures as their central components, have become increasingly complex, making application development and optimizatio...
Sparse tensors are the most used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while r...
Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graph...
Modern logistics systems tend to generate continuous streams of data from sources such as GPS, IoT sensors, and logistics management systems. The aggregation, p...
The trend of increasing cluster sizes of supercomputers leads to a growing susceptibility to Silent Data Corruption (SDC) that can invalidate program results. A...
Anthropic's latest model on GitLab is built for precise execution across complex multi-step agent work. Agents fail most often on complex, multi-step work: task...
May 12, 2026 Docker AI Governance: Unlock Agent Autonomy, Safely Introducing Docker AI Governance: centralized control over how agents execute, what they can re...
Asynchronous iterative methods tolerate straggling processors by allowing workers to proceed with stale data, but at a cost: the iterates become inconsistent, p...
State-of-the-art multiple sequence alignment (MSA) algorithms are based on progressive approaches that rely on pairwise sequence alignment (PSA) to generate gui...
Multi-constraint hypergraph partitioning is a generalization of balanced partitioning, where the vertex set of a hypergraph is partitioned such that the inter-b...
As high-performance computing systems scale in size and complexity, efficient resource management is essential to minimize communication overhead. The HyperX is...
The rapid adoption of large language models (LLMs) has shifted a substantial portion of inference workloads into throughput-oriented offline regimes, where full...
Datacenter network design plays a critical role in AI training by supporting scaling to thousands of accelerators. An open problem, designing a near-optimal thr...
Multimodal LLM datasets are inherently heterogeneous, with significant data variability. Although each modality exhibits independent variability, sample-level e...
Digital certificates quietly underpin almost everything that matters in modern IT: public websites, internal systems, APIs, and machine‑to‑machine traffic. For...
In the adrenaline‑filled moments leading up to a Red Hat Certification exam, we know you want to focus and maybe get in a little last‑minute preparation. Our go...
Support for Amazon Linux 2 (AL2) ends on June 30, 2026. For users, that means that the migration to other distributions is a necessary step in o...
Neighbor graphs capture relationships among data points and are widely used in data analytics and AI workloads. Many studies have explored approximate construct...
DORA metrics have been a reliable compass for engineering teams for over a decade. Deployment frequency, lead time for changes, change failure rate, mean time t...
Password‑less Provisioning & Atomic Customization: In the modern cloud landscape, security is more important than ever. AI‑based, GPU‑powered algorithms have tu...
Partitioning and Z‑Ordering have long been fundamental techniques in Delta Lake for optimizing data layout and query performance. However, these methods require...
The fight to maintain security has moved to the engineer’s messy desktop. Last week, AI‑search provider Perplexity open‑sourced an internal tool, Bumblebee, for...
The edge-cloud computing continuum demands self-management mechanisms that scale across autonomous administrative domains while honouring tenant- and operator-s...
Nonlinear reformulations of the spectral clustering method have gained a lot of recent attention due to their increased numerical benefits and their solid mathe...
Extreme-scale data centers are the backbone of next-generation computing, enabling breakthroughs in science, artificial intelligence, and global innovation thro...
All-to-All communication is a key performance bottleneck for distributed machine learning (ML) and high-performance computing (HPC) workloads, where dense traff...
AI‑Driven Operations for Modern Hybrid Enterprises: As AI capabilities continue to evolve, AI is becoming central to managing the growing complexity of distribu...
Reverse k nearest neighbor (RkNN) queries are fundamental in spatial databases, location-based analytics, and recommendation systems. Existing state-of-the-art ...
AI coding agents produce code faster than most teams can validate it. Without a validation step between the agent and CI, every problem gets caught after the pu...