z.ai's open source GLM-5 achieves record low hallucination rate and leverages new RL 'slime' technique
Source: VentureBeat
Chinese AI Startup Zhipu AI (z.ai) Announces GLM‑5
GLM‑5 is the newest large‑language model (LLM) in Zhipu AI's GLM series. It is released under the permissive MIT open‑source license, making it suitable for enterprise deployment. Notable achievements include:
- Record‑low hallucination rate on the independent Artificial Analysis Intelligence Index v4.0 (AA‑Omniscience Index score: ‑1, a 35‑point improvement over GLM‑4.5).
- Industry‑leading knowledge reliability – the model prefers to abstain rather than fabricate, outperforming U.S. competitors such as Google, OpenAI, and Anthropic.
- Native “Agent Mode” that converts raw prompts or source material directly into professional office documents (.docx, .pdf, .xlsx).
Pricing
- Input tokens: ~ $0.80 / 1 M tokens
- Output tokens: ~ $2.56 / 1 M tokens
On input tokens, this is roughly 6× cheaper than proprietary rivals such as Claude Opus 4.6.
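As a sanity check, the price ratios follow directly from the per‑million‑token rates quoted in this article for GLM‑5 and Claude Opus 4.6 (a sketch using reported list prices; the workload figures are illustrative):

```python
# Per-million-token API prices as reported in this article (USD).
GLM5_INPUT, GLM5_OUTPUT = 0.80, 2.56
OPUS_INPUT, OPUS_OUTPUT = 5.00, 25.00

def cost(n_input, n_output, in_rate, out_rate):
    """Total cost in USD for a workload of n_input/n_output tokens."""
    return (n_input / 1e6) * in_rate + (n_output / 1e6) * out_rate

input_ratio = OPUS_INPUT / GLM5_INPUT     # 6.25  -> the "~6x" input claim
output_ratio = OPUS_OUTPUT / GLM5_OUTPUT  # ~9.77 -> the "~10x" output claim

# Example workload: 1M tokens in, 1M tokens out.
print(cost(1e6, 1e6, GLM5_INPUT, GLM5_OUTPUT))  # 3.36
print(cost(1e6, 1e6, OPUS_INPUT, OPUS_OUTPUT))  # 30.0
```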
Technology: Scaling for Agentic Efficiency
| Feature | Details |
|---|---|
| Parameters | 744 B total (up from 355 B in GLM‑4.5) with 40 B active per token via a Mixture‑of‑Experts (MoE) architecture |
| Pre‑training data | 28.5 T tokens |
| Context length | 200 K tokens (enabled by DeepSeek Sparse Attention) |
| Training infrastructure | “Slime” – an asynchronous reinforcement‑learning (RL) system that breaks lock‑step bottlenecks. Includes Active Partial Rollouts (APRIL) to cut RL training time. |
| System architecture | Training module powered by Megatron‑LM; rollout module built on SGLang plus custom routers for high‑throughput data generation; data buffer that manages prompt initialization and rollout storage |
| Agentic capabilities | Adaptive verifiable environments, multi‑turn compilation feedback loops, and high‑throughput generation for long‑horizon tasks. |
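The table's description of slime can be sketched at a very high level: rollout workers stream trajectories into a shared data buffer while the trainer consumes whatever is ready, so neither side waits in lock‑step. The sketch below is illustrative only; the function names, toy reward, and queue‑based buffer are my own stand‑ins, not slime's actual API:

```python
import threading
import queue

def rollout_worker(prompts, buffer):
    """Generate trajectories independently of the trainer (no lock-step barrier).

    A real system would run model inference here (slime uses SGLang for this);
    a toy stand-in keeps the focus on the control flow.
    """
    for prompt in prompts:
        trajectory = {"prompt": prompt, "reward": len(prompt) % 3}  # toy rollout
        buffer.put(trajectory)
    buffer.put(None)  # sentinel: no more rollouts

def trainer(buffer):
    """Consume trajectories as they arrive and run (mock) gradient updates."""
    steps = 0
    while True:
        trajectory = buffer.get()
        if trajectory is None:
            break
        steps += 1  # stand-in for one RL update (Megatron-LM in the real system)
    return steps

buffer = queue.Queue(maxsize=64)  # the "data buffer" decoupling both sides
prompts = [f"task-{i}" for i in range(100)]
worker = threading.Thread(target=rollout_worker, args=(prompts, buffer))
worker.start()
steps = trainer(buffer)
worker.join()
print(steps)  # 100
```

Active Partial Rollouts (APRIL) would go further, letting the trainer consume trajectories before they finish generating; the sketch above only captures the basic asynchronous decoupling.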
End‑to‑End Knowledge Work
GLM‑5 is positioned as an “office” tool for the AGI era:
- Document generation: Turns prompts into ready‑to‑use .docx, .pdf, and .xlsx files (e.g., financial reports, sponsorship proposals, complex spreadsheets).
- Agentic engineering: Humans define quality gates; the model handles execution, decomposing high‑level goals into actionable subtasks.
Benchmark Performance
| Benchmark | GLM‑5 Score | Comparison |
|---|---|---|
| SWE‑bench Verified | 77.8 | Gemini 3 Pro (76.2) |
| Vending Bench 2 (business simulation) | $4,432.12 (final balance) | #1 among open‑source models |
| AA‑Omniscience Index | ‑1 | 35‑point improvement over GLM‑4.5 |
According to Artificial Analysis, GLM‑5 is now the most powerful open‑source model globally, surpassing Moonshot’s Kimi K2.5 released two weeks earlier.
Cost Comparison
| Model | Input (per 1 M tokens) | Output (per 1 M tokens) | Total (1 M in + 1 M out) | Source |
|---|---|---|---|---|
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non‑reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| deepseek‑chat (V3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| deepseek‑reasoner (V3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 | Google |
| Kimi‑k2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| GLM‑5 | $1.00 | $3.20 | $4.20 | Z.ai |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 | Qianfan |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3‑Max (2026‑01‑23) | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| GPT‑5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT‑5.2 Pro | $21.00 | $168.00 | $189.00 | OpenAI |
At the listed $0.80 / $2.56 rates, GLM‑5’s input cost is ~6× lower and its output cost ~10× lower than Claude Opus 4.6’s.
Additional Notes
- The release on OpenRouter (Feb 11, 2026) confirms rumors that Zhipu AI (the company behind z.ai) was behind the stealth model “Pony Alpha,” which previously dominated coding benchmarks on the platform.
- Despite its aggressive pricing, GLM‑5 delivers top‑tier benchmark performance, positioning it as a “steal” for enterprises seeking high‑quality, cost‑effective LLM capabilities.
Benchmarks and Low Cost Aren’t the Whole Story
Not all early users are enthusiastic about the model; some note that its strong benchmark performance doesn’t tell the whole story.
Lukas Petersson, co‑founder of the safety‑focused autonomous AI protocol startup Andon Labs, remarked on X:
“After hours of reading GLM‑5 traces: an incredibly effective model, but far less situationally aware. Achieves goals via aggressive tactics but doesn’t reason about its situation or leverage experience. This is scary. This is how you get a paperclip maximizer.”
The paperclip maximizer refers to a hypothetical scenario described by Oxford philosopher Nick Bostrom (2003), in which an AI pursues a seemingly benign objective—such as maximizing paperclip production—to an extreme degree, potentially leading to catastrophic outcomes.
Should Your Enterprise Adopt GLM‑5?
Strategic Advantages
- Open‑source licensing – MIT License with open weights, allowing organizations to host their own frontier‑level intelligence.
- Vendor lock‑in mitigation – Full control over deployment and customization, unlike closed‑source competitors.
Practical Constraints
- Hardware requirements – 744 B parameters demand substantial GPU resources, which may be out of reach for smaller firms.
- Geopolitical considerations – Enterprises in regulated industries must assess data residency and provenance risks when adopting a China‑based model.
Governance Risks
- Autonomous AI agents introduce new governance challenges.
- As models shift from “chat” to “work,” they operate across apps and files autonomously.
- Without robust agent‑specific permissions and human‑in‑the‑loop quality gates, the risk of autonomous error rises sharply.
Ideal Use Cases
- Organizations outgrowing simple copilots and ready to build a truly autonomous office.
- Engineers needing to refactor legacy backends or create “self‑healing” pipelines that run continuously.
While Western labs continue to optimize for “thinking” and reasoning depth, Z.ai is optimizing for execution and scale.
Bottom Line
Enterprises that adopt GLM‑5 today are not just buying a cheaper model; they are betting on a future where the most valuable AI is the one that can finish the project without being asked twice.