Z.ai's open-source GLM-5 achieves record-low hallucination rate and leverages new 'slime' RL technique

Published: February 11, 2026, at 07:09 PM EST

Source: VentureBeat

Chinese AI Startup Zhipu AI (Z.ai) Announces GLM‑5

GLM‑5 is the newest large language model (LLM) in Zhipu AI's GLM series. It is released under the permissive MIT open‑source license, making it suitable for enterprise deployment. Notable achievements include:

  • Record‑low hallucination rate on independent evaluations by Artificial Analysis (AA‑Omniscience Index score of ‑1, a 35‑point improvement over GLM‑4.5).
  • Industry‑leading knowledge reliability – the model prefers to abstain rather than fabricate, outperforming models from U.S. competitors such as Google, OpenAI, and Anthropic.
  • Native “Agent Mode” that converts raw prompts or source material directly into professional office documents (.docx, .pdf, .xlsx).

Pricing

  • Input tokens: ~ $0.80 / 1 M tokens
  • Output tokens: ~ $2.56 / 1 M tokens

On input tokens, this is roughly 6× cheaper than proprietary rivals such as Claude Opus 4.6.

Technology: Scaling for Agentic Efficiency

| Feature | Details |
|---|---|
| Parameters | 744 B total (up from 355 B in GLM‑4.5), with 40 B active per token via a Mixture‑of‑Experts (MoE) architecture |
| Pre‑training data | 28.5 T tokens |
| Context length | 200 K tokens (enabled by DeepSeek Sparse Attention) |
| Training infrastructure | "slime" – an asynchronous reinforcement‑learning (RL) system that breaks lock‑step bottlenecks; includes Active Partial Rollouts (APRIL) to cut RL training time |
| System architecture | (1) Training module, powered by Megatron‑LM; (2) Rollout module, built on SGLang plus custom routers for high‑throughput data generation; (3) Data Buffer, which manages prompt initialization and rollout storage |
| Agentic capabilities | Adaptive verifiable environments, multi‑turn compilation feedback loops, and high‑throughput generation for long‑horizon tasks |
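
The "40 B active of 744 B total" figure reflects Mixture‑of‑Experts routing: a small router picks a few experts per token, so only a fraction of the model's parameters (for GLM‑5, roughly 5%) does work on any given token. The toy NumPy sketch below illustrates top‑k routing; the expert count, layer sizes, and top‑k value are illustrative assumptions, not GLM‑5's actual configuration.

```python
# Toy sketch of Mixture-of-Experts top-k routing (illustrative only,
# not Z.ai's implementation). Shows why "active" params << total params.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 16   # hypothetical expert count
TOP_K = 2        # experts activated per token
D_MODEL = 8      # toy hidden size

# Router scores each expert for a given token representation.
router_w = rng.standard_normal((D_MODEL, N_EXPERTS))
# Each expert is a toy linear layer.
experts = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route token x to its top-k experts and mix their outputs."""
    logits = x @ router_w                  # (N_EXPERTS,) routing scores
    top = np.argsort(logits)[-TOP_K:]      # indices of the k best experts
    # Softmax over the selected experts only.
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    # Weighted sum of the chosen experts; the other 14 experts do no
    # work for this token -- that is the "active parameters" saving.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
print(out.shape)            # (8,)
print(TOP_K / N_EXPERTS)    # 0.125 -> only 2 of 16 experts active per token
```

In a real MoE model each expert is a full feed‑forward block and the router is trained jointly with the experts; the fraction of active parameters is what keeps per‑token compute far below what a dense 744 B model would need.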

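The slime architecture described above (Megatron‑LM trainer, SGLang rollout module, data buffer) amounts to an asynchronous producer/consumer pipeline: rollout workers stream trajectories into a buffer while the trainer consumes batches without waiting in lock step for every rollout to finish. A minimal stand‑in, with all names and details assumed rather than taken from slime's actual API:

```python
# Minimal sketch of an asynchronous rollout/training split in the style
# described for "slime". Purely illustrative -- not slime's real API.
import queue
import threading

data_buffer: "queue.Queue[dict]" = queue.Queue()

def rollout_worker(worker_id: int, n_episodes: int) -> None:
    """Stand-in for the SGLang-based rollout module: streams trajectories."""
    for ep in range(n_episodes):
        # In the real system: model generation + environment feedback here.
        data_buffer.put({"worker": worker_id, "episode": ep, "reward": 1.0})

def trainer(n_updates: int, batch_size: int) -> int:
    """Stand-in for the Megatron-LM training module: consumes buffered rollouts."""
    steps = 0
    for _ in range(n_updates):
        batch = [data_buffer.get() for _ in range(batch_size)]
        # A real trainer would compute a policy-gradient update on `batch`.
        steps += len(batch)
    return steps

workers = [threading.Thread(target=rollout_worker, args=(i, 4)) for i in range(2)]
for w in workers:
    w.start()
consumed = trainer(n_updates=2, batch_size=4)  # runs while rollouts stream in
for w in workers:
    w.join()
print(consumed)  # 8
```

The point of the decoupling is that slow rollouts no longer stall the GPU‑heavy training step; techniques like Active Partial Rollouts (APRIL) push this further by letting the trainer consume trajectories that are still in progress.
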
End‑to‑End Knowledge Work

GLM‑5 is positioned as an “office” tool for the AGI era:

  • Document generation: Turns prompts into ready‑to‑use .docx, .pdf, and .xlsx files (e.g., financial reports, sponsorship proposals, complex spreadsheets).
  • Agentic Engineering: Humans define quality gates; the model handles execution, decomposing high‑level goals into actionable subtasks.

Benchmark Performance

| Benchmark | GLM‑5 Score | Comparison |
|---|---|---|
| SWE‑bench Verified | 77.8 | Gemini 3 Pro: 76.2 |
| Vending Bench 2 (business simulation) | $4,432.12 final balance | #1 among open‑source models |
| AA‑Omniscience Index | ‑1 | 35‑point improvement over GLM‑4.5 |

According to Artificial Analysis, GLM‑5 is now the most powerful open‑source model globally, surpassing Moonshot's Kimi K2.5, released two weeks earlier.

Cost Comparison

| Model | Input (per 1 M tokens) | Output (per 1 M tokens) | Total (1 M in + 1 M out) | Source |
|---|---|---|---|---|
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non‑reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| deepseek‑chat (V3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| deepseek‑reasoner (V3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 | Google |
| Kimi‑k2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| GLM‑5 | $1.00 | $3.20 | $4.20 | Z.ai |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 | Qianfan |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3‑Max (2026‑01‑23) | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| GPT‑5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT‑5.2 Pro | $21.00 | $168.00 | $189.00 | OpenAI |

At the API prices quoted earlier (~$0.80 input / ~$2.56 output per 1 M tokens), GLM‑5's input cost is roughly 6× lower and its output cost roughly 10× lower than Claude Opus 4.6's ($5.00 / $25.00).
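As a quick arithmetic check of those ratios, using the ~$0.80/$2.56 GLM‑5 prices quoted earlier in this article and Claude Opus 4.6's table prices:

```python
# Reproducing the cost ratios claimed in the article (prices in $ per 1M tokens).
glm5_in, glm5_out = 0.80, 2.56
opus_in, opus_out = 5.00, 25.00

print(round(opus_in / glm5_in, 2))    # 6.25 -> "roughly 6x cheaper" on input
print(round(opus_out / glm5_out, 2))  # 9.77 -> "roughly 10x cheaper" on output
```

Note that against the $1.00/$3.20 figures in the table above, the multiples come out lower (5× and ~7.8×); the article's 6×/10× claim tracks the ~$0.80/$2.56 prices.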

Additional Notes

  • The release on OpenRouter (Feb 11, 2026) confirms rumors that Zhipu AI was behind the stealth model “Pony Alpha,” which previously dominated coding benchmarks on the platform.
  • Despite its aggressive pricing, GLM‑5 delivers top‑tier benchmark performance, positioning it as a “steal” for enterprises seeking high‑quality, cost‑effective LLM capabilities.

Beyond Benchmarks and Low Cost

Not all early users are enthusiastic about the model, however; some note that its high benchmark performance doesn't tell the whole story.

Lukas Petersson, co‑founder of the safety‑focused autonomous AI protocol startup Andon Labs, remarked on X:
“After hours of reading GLM‑5 traces: an incredibly effective model, but far less situationally aware. Achieves goals via aggressive tactics but doesn’t reason about its situation or leverage experience. This is scary. This is how you get a paperclip maximizer.”

The paperclip maximizer refers to a hypothetical scenario described by Oxford philosopher Nick Bostrom (2003), in which an AI pursues a seemingly benign objective—such as maximizing paperclip production—to an extreme degree, potentially leading to catastrophic outcomes.

Should Your Enterprise Adopt GLM‑5?

Strategic Advantages

  • Open‑source licensing – MIT License with open weights, allowing organizations to host their own frontier‑level intelligence.
  • Vendor lock‑in mitigation – Full control over deployment and customization, unlike closed‑source competitors.

Practical Constraints

  • Hardware requirements – 744 B parameters demand substantial GPU resources, which may be out of reach for smaller firms.
  • Geopolitical considerations – Enterprises in regulated industries must assess data residency and provenance risks when adopting a China‑based model.
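
To make the hardware constraint concrete, here is a back‑of‑the‑envelope estimate of the memory needed just to hold 744 B parameters at common precisions (weights only; KV cache, activations, and serving overhead come on top):

```python
# Rough VRAM estimate for hosting a 744B-parameter model: weights only.
PARAMS = 744e9  # total parameters

for label, bytes_per_param in [("FP16/BF16", 2), ("FP8/INT8", 1), ("INT4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{label}: ~{gb:,.0f} GB")
# FP16/BF16: ~1,488 GB; FP8/INT8: ~744 GB; INT4: ~372 GB
```

Even at aggressive 4‑bit quantization, self‑hosting implies multiple high‑memory accelerators, which is why smaller firms are likely to consume GLM‑5 through a hosted API instead.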

Governance Risks

  • Autonomous AI agents introduce new governance challenges.
  • As models shift from “chat” to “work,” they operate across apps and files autonomously.
  • Without robust agent‑specific permissions and human‑in‑the‑loop quality gates, the risk of autonomous error rises sharply.
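One way to implement such a gate is to route every agent action through a policy check, plus a human‑approval callback for risky operations. The sketch below is purely illustrative; the directory allowlist and action names are assumptions, not a GLM‑5 feature:

```python
# Illustrative agent permission gate (hypothetical, not a GLM-5 feature):
# actions pass an allowlist check, and risky ones also need human approval.
from pathlib import Path
from typing import Callable

ALLOWED_DIRS = (Path("reports"),)        # hypothetical policy scope
RISKY_ACTIONS = {"delete", "overwrite"}  # actions that need a human in the loop

def gated_execute(action: str, target: Path,
                  approve: Callable[[str], bool]) -> str:
    """Run an agent action only if policy and (when needed) a human allow it."""
    if not any(target.is_relative_to(d) for d in ALLOWED_DIRS):
        return "blocked: outside allowed directories"
    if action in RISKY_ACTIONS and not approve(f"{action} {target}?"):
        return "blocked: human denied"
    return f"executed: {action} {target}"

# Demo: the approval callback auto-denies, standing in for a human reviewer.
print(gated_execute("write", Path("reports/q4.xlsx"), approve=lambda _: False))
# executed: write reports/q4.xlsx   (safe action, inside allowed dirs)
print(gated_execute("delete", Path("reports/q4.xlsx"), approve=lambda _: False))
# blocked: human denied             (risky action, no approval)
print(gated_execute("write", Path("/etc/passwd"), approve=lambda _: True))
# blocked: outside allowed directories
```

The design choice here is that the gate sits outside the model: no matter how the agent reasons, file‑level side effects only happen through the checked path.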

Ideal Use Cases

  • Organizations outgrowing simple copilots and ready to build a truly autonomous office.
  • Engineers needing to refactor legacy backends or create “self‑healing” pipelines that run continuously.

While Western labs continue to optimize for “thinking” and reasoning depth, Z.ai is optimizing for execution and scale.

Bottom Line

Enterprises that adopt GLM‑5 today are not just buying a cheaper model; they are betting on a future where the most valuable AI is the one that can finish the project without being asked twice.
