Qwen 3.5 122B and 35B models offer Sonnet 4.5 performance on local computers

Published: February 28, 2026 at 03:20 PM EST
5 min read

Source: Hacker News

Alibaba’s Qwen 3.5 Medium Model Series

A little more than a day ago, the Qwen AI team released the Qwen 3.5 Medium model series, a family of four new large language models (LLMs) that support agentic tool calling. Three of the models are available for commercial use under the Apache 2.0 license:

  • Qwen 3.5‑35B‑A3B
  • Qwen 3.5‑122B‑A10B
  • Qwen 3.5‑27B

The models can be downloaded from Hugging Face and ModelScope.

A fourth model, Qwen 3.5‑Flash, is proprietary and only accessible via the Alibaba Cloud Model Studio API, but it offers a strong cost advantage compared with Western alternatives (see the pricing table below).

Why the Open‑Source Models Matter

  • Benchmark performance – On third‑party tests, the open‑source Qwen 3.5 models match or beat similarly sized proprietary models from OpenAI and Anthropic, including OpenAI's GPT‑5‑mini and Anthropic's Claude Sonnet 4.5 (released only five months ago).
  • Quantization‑friendly – The team reports that the models stay highly accurate even when quantized, i.e., when the numeric precision of weights and KV‑cache values is reduced.
  • Frontier‑level context windows on the desktop – The flagship Qwen 3.5‑35B‑A3B can handle a context of more than 1 million tokens on consumer‑grade GPUs with 32 GB of VRAM, requiring far less compute than many competing solutions.
  • Near‑lossless 4‑bit quantization – Enables massive datasets to be processed on modest hardware.
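
To see why 4‑bit weights matter for local deployment, a back‑of‑the‑envelope memory estimate helps. The helper below is a hypothetical sketch that counts weight storage only (KV cache and activations need additional memory on top of this):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

full_precision = weight_memory_gb(35, 16)  # BF16: 70 GB, beyond consumer GPUs
quantized = weight_memory_gb(35, 4)        # 4-bit: 17.5 GB, fits in 32 GB VRAM
```

Under this rough arithmetic, 4‑bit quantization shrinks the 35 B‑parameter model's weights from 70 GB to 17.5 GB, which is what puts it within reach of a 32 GB consumer GPU with headroom left for the KV cache.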

Technology: Delta Force

Qwen 3.5’s performance stems from a hybrid architecture that blends Gated Delta Networks with a sparse Mixture‑of‑Experts (MoE) system. Highlights from the Qwen 3.5‑35B‑A3B specifications:

| Feature | Detail |
| --- | --- |
| Parameter efficiency | 35 B total parameters, but only 3 B are active for any given token. |
| Expert diversity | The MoE layer contains 256 experts; 8 are routed per token plus 1 shared expert, reducing inference latency. |
| Near‑lossless quantization | Maintains high accuracy with 4‑bit weights, shrinking the memory footprint for local deployment. |
| Base model release | Alibaba open‑sourced the Qwen 3.5‑35B‑A3B‑Base model alongside the instruction‑tuned variants. |
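
A minimal sketch of the top‑k expert routing described above (256 experts, 8 routed per token). The softmax‑over‑selected‑logits scheme shown here is a common MoE convention, assumed for illustration rather than taken from Qwen's actual implementation:

```python
import math
import random

NUM_EXPERTS, TOP_K = 256, 8  # per the Qwen 3.5-35B-A3B spec table

def route(logits):
    """Select the top-k experts by router logit and weight them with a
    softmax computed over just the selected logits."""
    topk = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:TOP_K]
    exps = [math.exp(logits[i]) for i in topk]
    total = sum(exps)
    return {i: e / total for i, e in zip(topk, exps)}

random.seed(0)
weights = route([random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)])
```

Only the 8 selected experts (plus the shared expert) execute for each token, which is how 35 B total parameters translate into roughly 3 B active ones per forward pass.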

Product: Intelligence That “Thinks” First

Qwen 3.5 introduces a native "Thinking Mode." Before emitting a final answer, the model generates an internal reasoning chain wrapped in <think>…</think> tags, allowing it to work through complex logic.
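
As a rough illustration, the reasoning chain can be separated from the final answer before display. The <think> tag convention follows the description above; the helper itself is a hypothetical sketch, not an official utility:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer) using <think> tags."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()  # no thinking block emitted
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

reasoning, answer = split_thinking("<think>2+2 is 4</think>The answer is 4.")
```

Agent frameworks typically log the reasoning for debugging while showing only the answer to end users.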

| Model | Target Hardware | Context Length | Notable Traits |
| --- | --- | --- | --- |
| Qwen 3.5‑27B | High‑efficiency GPUs | >800 K tokens | Optimized for low‑resource environments. |
| Qwen 3.5‑Flash | Hosted on Alibaba Cloud | 1 M+ tokens (default) | Production‑grade, includes official tools. |
| Qwen 3.5‑122B‑A10B | Server‑grade GPUs (80 GB VRAM) | 1 M+ tokens | Bridges the gap to the world's largest frontier models. |

Benchmark results show the 35B‑A3B model surpasses larger predecessors (e.g., Qwen‑3‑235B) and the proprietary GPT‑5‑mini and Claude Sonnet 4.5 in knowledge (MMMLU) and visual reasoning (MMMU‑Pro).

Alibaba Qwen 3.5 Medium models benchmark comparison chart. Credit: Alibaba

Pricing & API Integration

For users who prefer not to host the weights themselves, Alibaba Cloud Model Studio offers an API for Qwen 3.5‑Flash with the following rates:

| Operation | Price (per 1 M tokens) |
| --- | --- |
| Input | $0.10 |
| Output | $0.40 |
| Cache Creation | $0.125 |
| Cache Read | $0.01 |
| Tool Calling – Web Search | $10 per 1,000 calls |
| Tool Calling – Code Interpreter | Free (limited‑time offer) |
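
Using the rates above, the cost of a single request is straightforward arithmetic. A minimal sketch, with the rate table hard‑coded from this article and illustrative token counts:

```python
# USD per 1M tokens, taken from the Qwen 3.5-Flash pricing table above.
RATES = {"input": 0.10, "output": 0.40, "cache_creation": 0.125, "cache_read": 0.01}

def request_cost(input_toks: int, output_toks: int,
                 cache_read_toks: int = 0, cache_create_toks: int = 0) -> float:
    """Estimate the USD cost of one API request from its token counts."""
    per_m = lambda n, key: n / 1_000_000 * RATES[key]
    return (per_m(input_toks, "input") + per_m(output_toks, "output")
            + per_m(cache_read_toks, "cache_read")
            + per_m(cache_create_toks, "cache_creation"))

# e.g. a long-context request: 800K input, 50K output, 200K served from cache
cost = request_cost(800_000, 50_000, cache_read_toks=200_000)  # ≈ $0.102
```

Even a near‑megatoken request lands at roughly a tenth of a cent per thousand input tokens, which is what makes the long‑context use cases discussed above economical.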

Cost Comparison with Other Major LLM APIs

| Model | Input | Output | Total Cost* | Source |
| --- | --- | --- | --- | --- |
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| Qwen 3.5‑Flash | $0.10 | $0.40 | $0.50 | Alibaba Cloud |
| DeepSeek‑Chat (v3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| DeepSeek‑Reasoner (v3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non‑reasoning) | $0.20 | $0.50 | $0.70 | xAI |

*Total cost = Input + Output (per 1 M tokens).

Qwen 3.5‑Flash is therefore among the most affordable LLM APIs worldwide.

All information is current as of 28 Feb 2026.

Model Pricing Overview

| Model | Input $ / 1 M tokens | Output $ / 1 M tokens | Total $ / 1 M tokens* | Provider |
| --- | --- | --- | --- | --- |
| MiniMax M2.5 | 0.15 | 1.20 | 1.35 | MiniMax |
| MiniMax M2.5‑Lightning | 0.30 | 2.40 | 2.70 | MiniMax |
| Gemini 3 Flash Preview | 0.50 | 3.00 | 3.50 | Google |
| Kimi‑k2.5 | 0.60 | 3.00 | 3.60 | Moonshot |
| GLM‑5 | 1.00 | 3.20 | 4.20 | Z.ai |
| ERNIE 5.0 | 0.85 | 3.40 | 4.25 | Baidu |
| Claude Haiku 4.5 | 1.00 | 5.00 | 6.00 | Anthropic |
| Qwen3‑Max (2026‑01‑23) | 1.20 | 6.00 | 7.20 | Alibaba Cloud |
| Gemini 3 Pro (≤200K) | 2.00 | 12.00 | 14.00 | Google |
| GPT‑5.2 | 1.75 | 14.00 | 15.75 | OpenAI |
| Claude Sonnet 4.5 | 3.00 | 15.00 | 18.00 | Anthropic |
| Gemini 3 Pro (>200K) | 4.00 | 18.00 | 22.00 | Google |
| Claude Opus 4.6 | 5.00 | 25.00 | 30.00 | Anthropic |
| GPT‑5.2 Pro | 21.00 | 168.00 | 189.00 | OpenAI |

*Total = Input + Output cost per 1 M tokens (rounded to two decimals).

What It Means for Enterprise Technical Leaders and Decision‑Makers

With the launch of the Qwen 3.5 Medium models, rapid iteration and fine‑tuning, once the exclusive domain of well‑funded labs, are now accessible for on‑premises development at many non‑technical firms. This effectively decouples sophisticated AI from massive capital expenditure.

Across the organization, this architecture transforms how data is handled and secured. The ability to ingest massive document repositories or hour‑scale videos locally enables deep institutional analysis without the privacy risks of third‑party APIs.

By running these specialized Mixture‑of‑Experts models within a private firewall, organizations can maintain sovereign control over their data while leveraging native “thinking” modes and official tool‑calling capabilities to build more reliable, autonomous agents.

Early adopters on Hugging Face have specifically lauded the model’s ability to “narrow the gap” in agentic scenarios where previously only the largest closed models could compete.

This shift toward architectural efficiency over raw scale ensures that AI integration remains cost‑conscious, secure, and agile enough to keep pace with evolving operational needs.
