[Paper] Distributed Linearly Separable Computation with Arbitrary Heterogeneous Data Assignment

Published: January 15, 2026 at 03:35 AM EST
4 min read
Source: arXiv - 2601.10177v1

Overview

This paper tackles a core challenge in large‑scale distributed systems: computing linearly separable functions (e.g., weighted sums, linear regressions) when the data stored on each worker node is heterogeneous and pre‑assigned. Unlike prior work that assumes a perfectly balanced, centrally‑controlled data placement, the authors study the realistic scenario where each worker may hold a different number of datasets and the assignment cannot be changed. They derive the fundamental trade‑off between the computable dimension of the target function and the communication cost needed to retrieve the result at a master node.

Key Contributions

  • General heterogeneous model: Formalizes distributed linearly separable computation with arbitrary data assignments, covering any mix of dataset counts per worker.
  • Universal computing scheme: Proposes a single, constructive algorithm that works for any heterogeneous assignment and achieves the optimal trade‑off in several parameter regimes.
  • Tight converse bound: Develops a matching information‑theoretic lower bound on communication cost, proving optimality of the scheme under integer cost constraints.
  • Extension to fractional costs: Shows how to interpolate between integer communication rounds using time‑sharing, broadening the applicability of the results.
  • Structural insight: Introduces a novel way to characterize data‑assignment graphs that reveals when the universal scheme is provably optimal.

Methodology

  1. Problem formulation

    • One master node and N workers.
    • There are K messages, each derived from a distinct dataset.
    • The master wants K_c linear combinations of these messages (the computable dimension).
    • Workers hold subsets of the K datasets; the assignment is given and may be uneven.
  2. Communication model

    • Workers send coded symbols to the master over a shared, error‑free link.
    • The communication cost is the total number of transmitted symbols (integer or fractional).
  3. Universal scheme design

    • Build a bipartite graph between workers and messages.
    • Identify covering sets that allow the master to reconstruct each desired linear combination using linear algebra (essentially solving a system of equations).
    • Use linear network coding at each worker: each transmitted symbol is a linear combination of the messages it stores, chosen to align with the covering sets (the first sketch after this list sets up a toy instance and walks through the decoding step).
  4. Converse (lower bound) proof

    • Apply cut‑set arguments on the bipartite graph to bound the minimum number of symbols needed to recover the target functions.
    • Show that any scheme must transmit at least as many symbols as the size of a minimal covering set, yielding the tight bound (the second sketch after this list illustrates the cut‑set counting).
  5. Fractional extension

    • Allow workers to split their transmission budget across multiple rounds (time‑sharing).
    • Convexify the integer trade‑off region to obtain the optimal fractional communication cost (the third sketch after this list shows the interpolation).
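
To make steps 1 and 3 concrete, here is a minimal decoding sketch in Python. The instance (assignment matrix A, demand matrix F, per‑worker budgets) is a hypothetical toy example, and the coding coefficients are drawn at random rather than aligned with covering sets as in the paper's scheme; the sketch only illustrates how the master recovers its demanded combinations by solving a linear system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance (not from the paper): A[n, k] = 1 iff worker n
# stores dataset k; rows differ, i.e., the assignment is heterogeneous.
A = np.array([[1, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 0, 1]])
F = np.array([[1, 2, 3, 4],          # K_c = 2 demanded linear combinations
              [4, 3, 2, 1]], dtype=float)
W = rng.standard_normal(4)           # the K messages (scalars for simplicity)

# Each worker sends random linear combinations supported on its own datasets.
# The paper's scheme picks these coefficients deterministically to align with
# the covering sets; random coefficients merely illustrate the decoding step.
budget = [2, 1, 1]                   # symbols per worker (total cost = 4)
rows = []
for n, t in enumerate(budget):
    for _ in range(t):
        rows.append(rng.standard_normal(4) * A[n])  # zero outside storage
G = np.vstack(rows)                  # coding matrix seen by the master
Y = G @ W                            # symbols received over the link

# Decoding is feasible iff each row of F lies in the row space of G; then a
# linear combination of the received symbols reproduces each demand.
coeff, _, _, _ = np.linalg.lstsq(G.T, F.T, rcond=None)
print(np.allclose(coeff.T @ Y, F @ W))   # True when decoding succeeds
```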
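
The converse in step 4 can be illustrated with a generic cut‑set count, a simplified stand‑in for the paper's argument (which is sharper and matches the minimal covering set): datasets stored exclusively by a worker subset S can reach the master only through S's transmissions, so S must send at least rank(F restricted to those dataset columns) symbols.

```python
import numpy as np
from itertools import combinations

# Same hypothetical toy instance as in the previous sketch.
A = np.array([[1, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 0, 1]])
F = np.array([[1, 2, 3, 4],
              [4, 3, 2, 1]], dtype=float)
N, K = A.shape

# For every worker subset S: datasets held by no worker outside S are
# "exclusive" to S, and information about them must flow through S.
for r in range(1, N + 1):
    for S in combinations(range(N), r):
        outside = [n for n in range(N) if n not in S]
        exclusive = [k for k in range(K) if not any(A[n, k] for n in outside)]
        if exclusive:
            need = np.linalg.matrix_rank(F[:, exclusive])
            print(f"workers {S} must jointly send >= {need} symbol(s)")
```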
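
Step 5's fractional extension is plain time‑sharing: if two integer‑cost operating points are achievable, any convex combination of them is as well. The operating points below are hypothetical numbers chosen for illustration, not results from the paper.

```python
import numpy as np

# Two achievable integer operating points (K_c, cost) -- hypothetical values.
p1 = np.array([2.0, 3.0])
p2 = np.array([4.0, 6.0])

# Running the first scheme on a fraction t of the symbols and the second on
# the remaining 1 - t traces out the line segment between the two points.
for t in np.linspace(0.0, 1.0, 5):
    kc, cost = t * p1 + (1 - t) * p2
    print(f"t = {t:.2f}: dimension {kc:.1f}, fractional cost {cost:.2f}")
```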

Results & Findings

| Scenario | Achievable Communication Cost | Converse (Lower Bound) | Gap |
| --- | --- | --- | --- |
| Integer costs, arbitrary heterogeneous assignment | Universal scheme → cost = size of minimal covering set | Same expression from converse | Zero (optimal) in regimes where covering‑set size matches task dimension |
| Fractional costs | Time‑shared version of universal scheme | Convex hull of integer lower bounds | Zero (optimal) for all examined parameter ranges |
| Special homogeneous case (all workers hold the same number of datasets) | Reduces to known cyclic‑assignment optimality | Matches prior optimal results | Consistent with literature |

Interpretation:

  • The universal scheme attains the information‑theoretic minimum communication cost whenever the task dimension K_c does not exceed the effective rank of the data‑assignment matrix.
  • Even when the assignment is highly skewed (some workers hold many datasets, others few), the scheme automatically adapts by leveraging the workers with richer data to carry more of the coding burden.
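
As a quick sanity check of the first condition, the snippet below (same hypothetical toy instance as in the methodology sketches) reports the demanded task dimension and verifies that every dataset the demand touches is stored by at least one worker; the effective‑rank criterion above is the finer‑grained version of this test.

```python
import numpy as np

A = np.array([[1, 1, 1, 0],        # hypothetical assignment matrix
              [0, 1, 0, 1],
              [0, 0, 0, 1]])
F = np.array([[1, 2, 3, 4],        # hypothetical demand matrix
              [4, 3, 2, 1]], dtype=float)

covered = A.any(axis=0)            # datasets stored by at least one worker
touched = (F != 0).any(axis=0)     # datasets the demand depends on
feasible = bool(np.all(covered | ~touched))
print(f"K_c = {np.linalg.matrix_rank(F)}, demand coverable: {feasible}")
```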

Practical Implications

  1. Edge‑cloud and IoT deployments – Devices often have uneven storage capacities; this work shows how to orchestrate linear analytics (e.g., federated averaging, distributed PCA) without reshuffling data.
  2. MapReduce‑style pipelines – When reducers receive uneven numbers of map outputs, the proposed coding strategy can minimize shuffle traffic, directly cutting network costs.
  3. Serverless compute platforms – Functions may be invoked with arbitrary subsets of data; the universal scheme provides a recipe for aggregating results with minimal data movement.
  4. Data‑center resource allocation – Operators can evaluate the computable dimension they can support given a fixed network budget, informing placement policies that need not be perfectly balanced.
  5. Framework integration – The linear coding operations are simple matrix multiplications, making it straightforward to embed the algorithm into existing distributed ML libraries (e.g., TensorFlow, PyTorch) or data‑processing engines (Spark, Flink).
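
As a concrete instance of items 1 and 5, federated averaging is a single linear combination of the local updates (K_c = 1), so the aggregation step is literally one matrix‑vector product. The sizes and weights below are made up for illustration.

```python
import numpy as np

K, d = 4, 8                                  # hypothetical: K updates, d dims
rng = np.random.default_rng(1)
updates = rng.standard_normal((K, d))        # one local model delta per worker
weights = np.array([0.4, 0.3, 0.2, 0.1])     # e.g., relative dataset sizes

# Federated averaging as a linearly separable function: F @ W with K_c = 1.
aggregate = weights @ updates                # shape (d,)
print(aggregate.shape)
```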

Limitations & Future Work

  • Assumes error‑free, synchronous communication; real networks may experience packet loss or latency variability, which could affect the feasibility of the tight coding schedule.
  • The model focuses on linear functions; extending the theory to non‑linear aggregations (e.g., max, median) remains open.
  • The converse bound relies on integer communication costs; while the fractional extension bridges this gap, practical systems may need to handle granular packet sizes and protocol overheads.
  • Future research could explore adaptive schemes that dynamically re‑assign data or adjust coding coefficients in response to worker failures or stragglers, further enhancing robustness in heterogeneous environments.

Authors

  • Ziting Zhang
  • Kai Wan
  • Minquan Cheng
  • Shuo Shao
  • Giuseppe Caire

Paper Information

  • arXiv ID: 2601.10177v1
  • Categories: cs.DC, cs.IT
  • Published: January 15, 2026