[Paper] Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet

Published: March 9, 2026 at 05:44 AM EDT
4 min read
Source: arXiv - 2603.08163v1

Overview

Covenant‑72B showcases the first truly permissionless effort to pre‑train a 72‑billion‑parameter language model by harnessing compute contributed by anyone on the internet. By marrying a blockchain‑based coordination layer with a communication‑efficient optimizer, the authors demonstrate that large‑scale foundation models can be built without a closed, whitelisted cluster of machines—opening the door to a more democratic, cost‑effective path to AI research.

Key Contributions

  • Largest open‑participation pre‑training run to date (≈ 72 B parameters, ~1.1 T tokens).
  • Trustless coordination via blockchain, enabling anyone to join or leave the training pool without a central authority.
  • Introduction of SparseLoCo, a sparse, communication‑efficient optimizer that tolerates highly dynamic peer membership.
  • Empirical evidence that a globally distributed, permissionless setup can match or exceed the performance of centrally trained models with comparable compute budgets.
  • Release of the Covenant‑72B model weights and training scripts, encouraging reproducibility and further community‑driven research.

Methodology

  1. Peer‑to‑peer network – Participants run a lightweight client that registers on a public blockchain. The chain records proof‑of‑work contributions and enforces a simple “stake‑and‑verify” protocol to prevent malicious updates.
  2. SparseLoCo optimizer – Extends classic LoCo (Local Communication) by sparsifying gradient exchanges. Only a small, dynamically selected subset of model shards is communicated each round, drastically cutting bandwidth while preserving convergence.
  3. Dynamic participation – Nodes can appear or disappear at any time. SparseLoCo re‑balances shard assignments on‑the‑fly, ensuring that the global model sees a roughly uniform view of the data despite churn.
  4. Training data – A curated 1.1 T‑token corpus (web text, books, code) is sharded across peers; each node samples locally and contributes gradients to the global update.
  5. Evaluation – After pre‑training, Covenant‑72B is evaluated on standard benchmarks (e.g., MMLU, GSM‑8K) in zero‑shot and few‑shot settings.
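The summary does not spell out SparseLoCo's exact update rule, but the core idea in step 2 — infrequent, sparsified exchange of local updates — can be illustrated with a toy sketch. Here each simulated peer takes a few local gradient steps on a least‑squares problem, then communicates only a top‑k‑sparsified pseudo‑gradient; the global model averages whatever the (churning) set of peers returns that round. Every function name, constant, and the loss itself are illustrative assumptions, not details from the paper.

```python
import numpy as np

def topk_sparsify(delta, k):
    """Keep only the k largest-magnitude entries; zero out the rest."""
    flat = delta.ravel()
    if k >= flat.size:
        return delta.copy()
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(delta.shape)

def local_round(global_w, X, y, lr=0.1, local_steps=5):
    """One peer: several local gradient steps on a least-squares loss,
    returning the pseudo-gradient (local weights minus global weights)."""
    w = global_w.copy()
    for _ in range(local_steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w - global_w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5, 0.0])   # target the peers' data encodes
global_w = np.zeros(4)

for _ in range(50):                         # 50 synchronization rounds
    deltas = []
    for _ in range(rng.integers(2, 6)):     # churn: 2-5 peers show up per round
        X = rng.normal(size=(32, 4))
        y = X @ true_w + 0.01 * rng.normal(size=32)
        deltas.append(topk_sparsify(local_round(global_w, X, y), k=2))
    global_w += np.mean(deltas, axis=0)     # average the sparse peer updates
```

In this toy setting, each peer sends only half its coordinates per round and the peer set changes every round, yet the averaged model still converges toward the target weights — the intuition behind tolerating bandwidth limits and membership churn simultaneously.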
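Step 3's on‑the‑fly shard re‑balancing could, for instance, be realized with consistent hashing, where a departing peer displaces only the shards it owned. The sketch below is an assumption for illustration, not the paper's actual mechanism; `ring_position`, `assign_shards`, and the peer/shard labels are all hypothetical.

```python
import hashlib

def ring_position(name: str) -> int:
    """Place a name on a hash ring by hashing it to an integer."""
    return int(hashlib.sha256(name.encode()).hexdigest(), 16)

def assign_shards(shards, peers):
    """Map each shard to the first live peer clockwise on the ring, so
    churn only remaps shards that belonged to the departed peer."""
    ring = sorted((ring_position(p), p) for p in peers)
    out = {}
    for s in shards:
        pos = ring_position(s)
        # first peer at or past the shard's position, wrapping around
        out[s] = next((p for h, p in ring if h >= pos), ring[0][1])
    return out

shards = [f"shard-{i}" for i in range(8)]
before = assign_shards(shards, ["peer-a", "peer-b", "peer-c"])
after = assign_shards(shards, ["peer-a", "peer-c"])   # peer-b leaves
moved = [s for s in shards if before[s] != after[s]]
```

Only the shards previously held by the departed peer are reassigned; every other peer's workload is untouched, which keeps the "roughly uniform view of the data" the summary describes even under churn.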

Results & Findings

  • Performance parity: On the MMLU benchmark, Covenant‑72B scores 58.3% accuracy, within 1–2% of a centrally trained 70 B model that used 1.3× more GPU‑hours.
  • Training efficiency: Despite network latency and node churn, SparseLoCo achieved a 3.4× reduction in communication overhead compared to naïve all‑reduce, cutting total wall‑clock time by ~22%.
  • Robustness to churn: Simulated node dropout rates up to 30% had negligible impact on final perplexity, confirming the optimizer’s resilience.
  • Cost savings: The distributed run consumed ~12,000 GPU‑hours, roughly 30% less than a comparable centralized run, thanks to the ability to tap idle resources worldwide.
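To put the reported 3.4× communication reduction in perspective, a back‑of‑the‑envelope calculation helps. The payload assumptions (fp16 gradients, one full parameter exchange) are mine, not the paper's:

```python
# Bandwidth per full synchronization exchange (illustrative; payload
# sizes are assumptions, only the 3.4x factor comes from the summary).
PARAMS = 72e9          # Covenant-72B parameter count
BYTES_PER_VALUE = 2    # fp16 gradients (assumption)

dense_gb = PARAMS * BYTES_PER_VALUE / 1e9   # naive all-reduce payload
sparse_gb = dense_gb / 3.4                  # with the reported 3.4x reduction

print(f"dense: {dense_gb:.0f} GB, sparse: {sparse_gb:.0f} GB per exchange")
# → dense: 144 GB, sparse: 42 GB per exchange
```

Shaving each exchange from ~144 GB to ~42 GB is what makes synchronizing over consumer internet links plausible at all, and it is consistent with the ~22% wall‑clock saving once compute time is factored in.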

Practical Implications

  • Democratized AI development – Start‑ups, research labs, or even hobbyist groups can now contribute compute and receive a stake in a cutting‑edge model without negotiating contracts with cloud providers.
  • Cost‑effective scaling – Companies can offload portions of large‑model training to a volunteer network, reducing cloud spend while maintaining competitive performance.
  • Resilient training pipelines – SparseLoCo’s tolerance for node churn makes it attractive for edge‑centric AI workloads where connectivity is intermittent (e.g., federated learning across mobile devices).
  • Open‑source model ecosystem – By releasing the weights, the community gains a high‑capacity LLM that can be fine‑tuned for niche applications (code assistance, domain‑specific chatbots) without the massive upfront compute investment.

Limitations & Future Work

  • Security model – While the blockchain provides basic trustlessness, sophisticated attacks (e.g., gradient poisoning) remain a concern and will require hardened verification mechanisms.
  • Data heterogeneity – The current setup assumes roughly uniform data quality across peers; future work should explore adaptive weighting for highly skewed datasets.
  • Scalability ceiling – Experiments beyond 72 B parameters are pending; it is unclear how communication patterns will behave at the trillion‑parameter scale.
  • Energy accounting – The paper does not quantify the carbon footprint of the distributed run versus centralized training—a metric increasingly important for responsible AI.

Bottom line: Covenant‑72B proves that “anyone can help train a giant LLM” is no longer a sci‑fi fantasy. With a trustless blockchain backbone and a clever optimizer, the research community now has a viable blueprint for building massive models in a truly open, cost‑effective manner.

Authors

  • Joel Lidin
  • Amir Sarfi
  • Erfan Miahi
  • Quentin Anthony
  • Shivam Chauhan
  • Evangelos Pappas
  • Benjamin Thérien
  • Eugene Belilovsky
  • Samuel Dare

Paper Information

  • arXiv ID: 2603.08163v1
  • Categories: cs.DC, cs.LG
  • Published: March 9, 2026