Z.ai's open-source GLM-5 achieves record-low hallucination rate and leverages new 'slime' RL technique

Published: February 11, 2026, at 07:09 PM EST

Source: VentureBeat

Chinese AI Startup Zhipu AI (Z.ai) Announces GLM‑5

GLM‑5 is the newest large language model (LLM) in Zhipu AI's GLM series. It is released under the permissive MIT open‑source license, making it suitable for enterprise deployment. Notable achievements include:

  • Record‑low hallucination rate on independent evaluations by Artificial Analysis (AA‑Omniscience Index score of ‑1, a 35‑point improvement over GLM‑4.5).
  • Industry‑leading knowledge reliability – the model prefers to abstain rather than fabricate, outperforming models from U.S. competitors such as Google, OpenAI, and Anthropic.
  • Native “Agent Mode” that converts raw prompts or source material directly into professional office documents (.docx, .pdf, .xlsx).

Pricing

  • Input tokens: ~ $0.80 / 1 M tokens
  • Output tokens: ~ $2.56 / 1 M tokens

On input tokens, this is roughly 6× cheaper than proprietary rivals such as Claude Opus 4.6.

Technology: Scaling for Agentic Efficiency

| Feature | Details |
|---|---|
| Parameters | 744 B total (up from 355 B in GLM‑4.5), with 40 B active per token via a Mixture‑of‑Experts (MoE) architecture |
| Pre‑training data | 28.5 T tokens |
| Context length | 200 K tokens (enabled by DeepSeek Sparse Attention) |
| Training infrastructure | "slime" – an asynchronous reinforcement‑learning (RL) system that breaks lock‑step bottlenecks; includes Active Partial Rollouts (APRIL) to cut RL training time |
| System architecture | (1) Training module, powered by Megatron‑LM; (2) Rollout module, built on SGLang plus custom routers for high‑throughput data generation; (3) Data Buffer, which manages prompt initialization and rollout storage |
| Agentic capabilities | Adaptive verifiable environments, multi‑turn compilation feedback loops, and high‑throughput generation for long‑horizon tasks |
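
The "40 B active of 744 B total" figure reflects Mixture‑of‑Experts routing: a small router picks a few experts per token, so only a fraction of the model's parameters (for GLM‑5, roughly 5%) does work on any given token. The toy NumPy sketch below illustrates top‑k routing; the expert count, layer sizes, and top‑k value are illustrative assumptions, not GLM‑5's actual configuration.

```python
# Toy sketch of Mixture-of-Experts top-k routing (illustrative only,
# not Z.ai's implementation). Shows why "active" params << total params.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 16   # hypothetical expert count
TOP_K = 2        # experts activated per token
D_MODEL = 8      # toy hidden size

# Router scores each expert for a given token representation.
router_w = rng.standard_normal((D_MODEL, N_EXPERTS))
# Each expert is a toy linear layer.
experts = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route token x to its top-k experts and mix their outputs."""
    logits = x @ router_w                  # (N_EXPERTS,) routing scores
    top = np.argsort(logits)[-TOP_K:]      # indices of the k best experts
    # Softmax over the selected experts only.
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    # Weighted sum of the chosen experts; the other 14 experts do no
    # work for this token -- that is the "active parameters" saving.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
print(out.shape)            # (8,)
print(TOP_K / N_EXPERTS)    # 0.125 -> only 2 of 16 experts active per token
```

In a real MoE model each expert is a full feed‑forward block and the router is trained jointly with the experts; the fraction of active parameters is what keeps per‑token compute far below what a dense 744 B model would need.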

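The slime architecture described above (Megatron‑LM trainer, SGLang rollout module, data buffer) amounts to an asynchronous producer/consumer pipeline: rollout workers stream trajectories into a buffer while the trainer consumes batches without waiting in lock step for every rollout to finish. A minimal stand‑in, with all names and details assumed rather than taken from slime's actual API:

```python
# Minimal sketch of an asynchronous rollout/training split in the style
# described for "slime". Purely illustrative -- not slime's real API.
import queue
import threading

data_buffer: "queue.Queue[dict]" = queue.Queue()

def rollout_worker(worker_id: int, n_episodes: int) -> None:
    """Stand-in for the SGLang-based rollout module: streams trajectories."""
    for ep in range(n_episodes):
        # In the real system: model generation + environment feedback here.
        data_buffer.put({"worker": worker_id, "episode": ep, "reward": 1.0})

def trainer(n_updates: int, batch_size: int) -> int:
    """Stand-in for the Megatron-LM training module: consumes buffered rollouts."""
    steps = 0
    for _ in range(n_updates):
        batch = [data_buffer.get() for _ in range(batch_size)]
        # A real trainer would compute a policy-gradient update on `batch`.
        steps += len(batch)
    return steps

workers = [threading.Thread(target=rollout_worker, args=(i, 4)) for i in range(2)]
for w in workers:
    w.start()
consumed = trainer(n_updates=2, batch_size=4)  # runs while rollouts stream in
for w in workers:
    w.join()
print(consumed)  # 8
```

The point of the decoupling is that slow rollouts no longer stall the GPU‑heavy training step; techniques like Active Partial Rollouts (APRIL) push this further by letting the trainer consume trajectories that are still in progress.
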
End‑to‑End Knowledge Work

GLM‑5 is positioned as an “office” tool for the AGI era:

  • Document generation: Turns prompts into ready‑to‑use .docx, .pdf, and .xlsx files (e.g., financial reports, sponsorship proposals, complex spreadsheets).
  • Agentic Engineering: Humans define quality gates; the model handles execution, decomposing high‑level goals into actionable subtasks.

Benchmark Performance

| Benchmark | GLM‑5 Score | Comparison |
|---|---|---|
| SWE‑bench Verified | 77.8 | Gemini 3 Pro: 76.2 |
| Vending Bench 2 (business simulation) | $4,432.12 final balance | #1 among open‑source models |
| AA‑Omniscience Index | ‑1 | 35‑point improvement over GLM‑4.5 |

According to Artificial Analysis, GLM‑5 is now the most powerful open‑source model globally, surpassing Moonshot's Kimi K2.5, released two weeks earlier.

Cost Comparison

| Model | Input (per 1 M tokens) | Output (per 1 M tokens) | Total (1 M in + 1 M out) | Source |
|---|---|---|---|---|
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non‑reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| deepseek‑chat (V3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| deepseek‑reasoner (V3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 | Google |
| Kimi‑k2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| GLM‑5 | $1.00 | $3.20 | $4.20 | Z.ai |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 | Qianfan |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3‑Max (2026‑01‑23) | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| GPT‑5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT‑5.2 Pro | $21.00 | $168.00 | $189.00 | OpenAI |

At the API prices quoted earlier (~$0.80 input / ~$2.56 output per 1 M tokens), GLM‑5's input cost is roughly 6× lower and its output cost roughly 10× lower than Claude Opus 4.6's ($5.00 / $25.00).
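As a quick arithmetic check of those ratios, using the ~$0.80/$2.56 GLM‑5 prices quoted earlier in this article and Claude Opus 4.6's table prices:

```python
# Reproducing the cost ratios claimed in the article (prices in $ per 1M tokens).
glm5_in, glm5_out = 0.80, 2.56
opus_in, opus_out = 5.00, 25.00

print(round(opus_in / glm5_in, 2))    # 6.25 -> "roughly 6x cheaper" on input
print(round(opus_out / glm5_out, 2))  # 9.77 -> "roughly 10x cheaper" on output
```

Note that against the $1.00/$3.20 figures in the table above, the multiples come out lower (5× and ~7.8×); the article's 6×/10× claim tracks the ~$0.80/$2.56 prices.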

Additional Notes

  • The release on OpenRouter (Feb 11, 2026) confirms rumors that Zhipu AI was behind the stealth model “Pony Alpha,” which previously dominated coding benchmarks on the platform.
  • Despite its aggressive pricing, GLM‑5 delivers top‑tier benchmark performance, positioning it as a “steal” for enterprises seeking high‑quality, cost‑effective LLM capabilities.

Beyond Benchmarks and Low Cost

Not all early users are enthusiastic about the model, however; some note that its high benchmark performance doesn't tell the whole story.

Lukas Petersson, co‑founder of the safety‑focused autonomous AI protocol startup Andon Labs, remarked on X:
“After hours of reading GLM‑5 traces: an incredibly effective model, but far less situationally aware. Achieves goals via aggressive tactics but doesn’t reason about its situation or leverage experience. This is scary. This is how you get a paperclip maximizer.”

The paperclip maximizer refers to a hypothetical scenario described by Oxford philosopher Nick Bostrom (2003), in which an AI pursues a seemingly benign objective—such as maximizing paperclip production—to an extreme degree, potentially leading to catastrophic outcomes.

Should Your Enterprise Adopt GLM‑5?

Strategic Advantages

  • Open‑source licensing – MIT License with open weights, allowing organizations to host their own frontier‑level intelligence.
  • Vendor lock‑in mitigation – Full control over deployment and customization, unlike closed‑source competitors.

Practical Constraints

  • Hardware requirements – 744 B parameters demand substantial GPU resources, which may be out of reach for smaller firms.
  • Geopolitical considerations – Enterprises in regulated industries must assess data residency and provenance risks when adopting a China‑based model.
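
To make the hardware constraint concrete, here is a back‑of‑the‑envelope estimate of the memory needed just to hold 744 B parameters at common precisions (weights only; KV cache, activations, and serving overhead come on top):

```python
# Rough VRAM estimate for hosting a 744B-parameter model: weights only.
PARAMS = 744e9  # total parameters

for label, bytes_per_param in [("FP16/BF16", 2), ("FP8/INT8", 1), ("INT4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{label}: ~{gb:,.0f} GB")
# FP16/BF16: ~1,488 GB; FP8/INT8: ~744 GB; INT4: ~372 GB
```

Even at aggressive 4‑bit quantization, self‑hosting implies multiple high‑memory accelerators, which is why smaller firms are likely to consume GLM‑5 through a hosted API instead.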

Governance Risks

  • Autonomous AI agents introduce new governance challenges.
  • As models shift from “chat” to “work,” they operate across apps and files autonomously.
  • Without robust agent‑specific permissions and human‑in‑the‑loop quality gates, the risk of autonomous error rises sharply.
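One way to implement such a gate is to route every agent action through a policy check, plus a human‑approval callback for risky operations. The sketch below is purely illustrative; the directory allowlist and action names are assumptions, not a GLM‑5 feature:

```python
# Illustrative agent permission gate (hypothetical, not a GLM-5 feature):
# actions pass an allowlist check, and risky ones also need human approval.
from pathlib import Path
from typing import Callable

ALLOWED_DIRS = (Path("reports"),)        # hypothetical policy scope
RISKY_ACTIONS = {"delete", "overwrite"}  # actions that need a human in the loop

def gated_execute(action: str, target: Path,
                  approve: Callable[[str], bool]) -> str:
    """Run an agent action only if policy and (when needed) a human allow it."""
    if not any(target.is_relative_to(d) for d in ALLOWED_DIRS):
        return "blocked: outside allowed directories"
    if action in RISKY_ACTIONS and not approve(f"{action} {target}?"):
        return "blocked: human denied"
    return f"executed: {action} {target}"

# Demo: the approval callback auto-denies, standing in for a human reviewer.
print(gated_execute("write", Path("reports/q4.xlsx"), approve=lambda _: False))
# executed: write reports/q4.xlsx   (safe action, inside allowed dirs)
print(gated_execute("delete", Path("reports/q4.xlsx"), approve=lambda _: False))
# blocked: human denied             (risky action, no approval)
print(gated_execute("write", Path("/etc/passwd"), approve=lambda _: True))
# blocked: outside allowed directories
```

The design choice here is that the gate sits outside the model: no matter how the agent reasons, file‑level side effects only happen through the checked path.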

Ideal Use Cases

  • Organizations outgrowing simple copilots and ready to build a truly autonomous office.
  • Engineers needing to refactor legacy backends or create “self‑healing” pipelines that run continuously.

While Western labs continue to optimize for “thinking” and reasoning depth, Z.ai is optimizing for execution and scale.

Bottom Line

Enterprises that adopt GLM‑5 today are not just buying a cheaper model; they are betting on a future where the most valuable AI is the one that can finish the project without being asked twice.
