[Paper] Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet
Source: arXiv - 2603.08163v1
Overview
Covenant‑72B presents the first truly permissionless effort to pre‑train a 72‑billion‑parameter language model using compute contributed by anyone on the internet. By pairing a blockchain‑based coordination layer with a communication‑efficient optimizer, the authors demonstrate that large‑scale foundation models can be built without a closed, whitelisted cluster of machines, opening the door to a more democratic, cost‑effective path for AI research.
Key Contributions
- Largest open‑participation pre‑training run to date (≈ 72 B parameters, ~1.1 T tokens).
- Trustless coordination via blockchain, enabling anyone to join or leave the training pool without a central authority.
- Introduction of SparseLoCo, a sparse, communication‑efficient optimizer that tolerates highly dynamic peer membership.
- Empirical evidence that a globally distributed, permissionless setup can match or exceed the performance of centrally‑trained models with comparable compute budgets.
- Release of the Covenant‑72B model weights and training scripts, encouraging reproducibility and further community‑driven research.
Methodology
- Peer‑to‑peer network – Participants run a lightweight client that registers on a public blockchain. The chain records proofs of contributed work and enforces a simple “stake‑and‑verify” protocol to deter malicious updates.
- SparseLoCo optimizer – Extends classic LoCo (Local Communication) by sparsifying gradient exchanges. Only a small, dynamically selected subset of model shards is communicated each round, drastically cutting bandwidth while preserving convergence.
- Dynamic participation – Nodes can appear or disappear at any time. SparseLoCo re‑balances shard assignments on‑the‑fly, ensuring that the global model sees a roughly uniform view of the data despite churn.
- Training data – A curated 1.1 T‑token corpus (web text, books, code) is sharded across peers; each node samples locally and contributes gradients to the global update.
- Evaluation – After pre‑training, Covenant‑72B is evaluated on standard benchmarks (e.g., MMLU, GSM‑8K) to assess its zero‑shot and few‑shot capabilities.
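The paper's exact SparseLoCo algorithm is not reproduced in this summary; the sketch below only illustrates the general pattern it builds on: peers take several local optimizer steps, then exchange a top‑k‑sparsified pseudo‑gradient, with an error‑feedback residual so dropped coordinates are not lost. All names here (`Peer`, `topk_sparsify`, `local_sgd_round`) are hypothetical, and the toy quadratic loss stands in for real training data.

```python
class Peer:
    """Toy peer whose local 'data' defines a quadratic loss ||params - target||^2."""
    def __init__(self, target):
        self.target = target
        self.residual = [0.0] * len(target)  # error-feedback buffer

    def grad(self, params):
        return [2.0 * (p - t) for p, t in zip(params, self.target)]


def topk_sparsify(vec, k):
    """Keep the k largest-magnitude entries, zero the rest.
    Returns (sparse, residual); the residual is carried into the next round
    so dropped coordinates are eventually transmitted (error feedback)."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    sparse = [v if i in keep else 0.0 for i, v in enumerate(vec)]
    residual = [v - s for v, s in zip(vec, sparse)]
    return sparse, residual


def local_sgd_round(global_params, peers, local_steps, lr, k):
    """One outer round: each peer runs local SGD on its own data, then
    communicates only a top-k-sparsified pseudo-gradient (global minus local)."""
    deltas = []
    for peer in peers:
        params = list(global_params)
        for _ in range(local_steps):
            g = peer.grad(params)
            params = [p - lr * gi for p, gi in zip(params, g)]
        pseudo = [g0 - p for g0, p in zip(global_params, params)]
        corrected = [pg + r for pg, r in zip(pseudo, peer.residual)]
        sparse, peer.residual = topk_sparsify(corrected, k)
        deltas.append(sparse)
    # Average the sparse deltas and apply them as the outer update.
    avg = [sum(col) / len(deltas) for col in zip(*deltas)]
    return [g0 - a for g0, a in zip(global_params, avg)]
```

With `k` equal to the full parameter dimension this reduces to plain local‑SGD averaging; shrinking `k` trades bandwidth for slower per‑round progress, which the error‑feedback residual compensates for over subsequent rounds.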
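The “stake‑and‑verify” idea from the coordination layer can be sketched as follows. This is a hypothetical illustration, not the paper's actual on‑chain protocol: peers deposit stake, a validator recomputes a randomly audited update, and a mismatch triggers slashing. `StakeRegistry`, `hash_update`, and the 50% slash fraction are all assumptions made for the example.

```python
import hashlib

def hash_update(update):
    """Deterministic fingerprint of a gradient update (list of floats)."""
    h = hashlib.sha256()
    for v in update:
        h.update(f"{v:.8e}".encode())
    return h.hexdigest()

class StakeRegistry:
    """Toy stake-and-verify ledger: peers deposit stake; a validator
    recomputes a randomly audited update and slashes stake on mismatch."""
    def __init__(self, slash_fraction=0.5):
        self.stake = {}
        self.slash_fraction = slash_fraction

    def register(self, peer_id, deposit):
        self.stake[peer_id] = deposit

    def audit(self, peer_id, submitted_hash, recomputed_update):
        # The validator re-runs the peer's assigned work and compares hashes.
        ok = submitted_hash == hash_update(recomputed_update)
        if not ok:
            self.stake[peer_id] *= (1 - self.slash_fraction)
        return ok
```

Because honest recomputation is only sampled, this kind of scheme deters rather than prevents poisoning, which is consistent with the limitation the authors note about hardened verification.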
Results & Findings
- Performance parity: On the MMLU benchmark, Covenant‑72B scores 58.3% accuracy, within 1–2% of a centrally‑trained 70 B model that used 1.3× more GPU‑hours.
- Training efficiency: Despite network latency and node churn, SparseLoCo achieved a 3.4× reduction in communication overhead compared to naïve all‑reduce, cutting total wall‑clock time by ~22%.
- Robustness to churn: Simulated node dropout rates up to 30% had negligible impact on final perplexity, confirming the optimizer’s resilience.
- Cost savings: The distributed run consumed ~12,000 GPU‑hours, roughly 30% less than a comparable centralized run, thanks to the ability to tap idle resources worldwide.
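A back‑of‑envelope check of the figures above, using only the numbers reported in this summary; the derived values (implied centralized cost, per‑round bandwidth fraction) are inferred, not stated in the paper.

```python
# Figures reported in the summary; derived quantities are back-of-envelope.
distributed_gpu_hours = 12_000     # reported cost of the distributed run
savings_vs_central = 0.30          # "roughly 30% less" than centralized
implied_central_hours = distributed_gpu_hours / (1 - savings_vs_central)

comm_reduction = 3.4               # vs naive all-reduce
bandwidth_fraction = 1 / comm_reduction

print(f"implied centralized cost: {implied_central_hours:,.0f} GPU-hours")
print(f"bandwidth per round vs all-reduce: {bandwidth_fraction:.0%}")
```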
Practical Implications
- Democratized AI development – Start‑ups, research labs, or even hobbyist groups can now contribute compute and receive a stake in a cutting‑edge model without negotiating contracts with cloud providers.
- Cost‑effective scaling – Companies can offload portions of large‑model training to a volunteer network, reducing cloud spend while maintaining competitive performance.
- Resilient training pipelines – SparseLoCo’s tolerance for node churn makes it attractive for edge‑centric AI workloads where connectivity is intermittent (e.g., federated learning across mobile devices).
- Open‑source model ecosystem – By releasing the weights, the community gains a high‑capacity LLM that can be fine‑tuned for niche applications (code assistance, domain‑specific chatbots) without the massive upfront compute investment.
Limitations & Future Work
- Security model – While the blockchain provides basic trustlessness, sophisticated attacks (e.g., gradient poisoning) remain a concern and need hardened verification mechanisms.
- Data heterogeneity – The current setup assumes roughly uniform data quality across peers; future work should explore adaptive weighting for highly skewed datasets.
- Scalability ceiling – Experiments beyond 72 B parameters are pending; it is unclear how communication patterns will behave at the trillion‑parameter scale.
- Energy accounting – The paper does not quantify the carbon footprint of the distributed run versus centralized training—a metric increasingly important for responsible AI.
Bottom line: Covenant‑72B proves that “anyone can help train a giant LLM” is no longer a sci‑fi fantasy. With a trustless blockchain backbone and a clever optimizer, the research community now has a viable blueprint for building massive models in a truly open, cost‑effective manner.
Authors
- Joel Lidin
- Amir Sarfi
- Erfan Miahi
- Quentin Anthony
- Shivam Chauhan
- Evangelos Pappas
- Benjamin Thérien
- Eugene Belilovsky
- Samuel Dare
Paper Information
- arXiv ID: 2603.08163v1
- Categories: cs.DC, cs.LG
- Published: March 9, 2026