Alibaba's new open source Qwen3.5-Medium models offer Sonnet 4.5 performance on local computers

Published: February 25, 2026 at 10:39 PM EST
6 min read

Source: VentureBeat

Qwen 3.5 Medium Model Series – Overview

Alibaba’s now‑famed Qwen AI development team has done it again. A little more than a day ago, it released the Qwen 3.5 Medium model series, consisting of four new large language models (LLMs) with support for agentic tool calling. Three of the models are available for commercial use by enterprises and indie developers under the standard open‑source Apache 2.0 license:

  • Qwen3.5‑35B‑A3B
  • Qwen3.5‑122B‑A10B
  • Qwen3.5‑27B

Developers can download them now on Hugging Face and ModelScope.

A fourth model, Qwen3.5‑Flash, appears to be proprietary and is only available through the Alibaba Cloud Model Studio API, but it still offers a strong cost advantage compared with other Western models (see the pricing comparison table below).

Key point: On third‑party benchmarks, the open‑source models perform comparably to similarly sized proprietary models from major U.S. AI labs such as OpenAI and Anthropic, and in several categories they beat OpenAI’s GPT‑5‑mini and Anthropic’s Claude Sonnet 4.5 (released five months ago).

The Qwen team also says the models remain highly accurate when quantized—a process that shrinks a model’s memory footprint by storing its numeric values at lower precision.

Frontier‑Level Context Windows on Desktop PCs

  • The flagship Qwen3.5‑35B‑A3B can exceed a 1 million‑token context length on consumer‑grade GPUs with 32 GB of VRAM.
  • This requires far less hardware than many other comparably performant options.
  • Near‑lossless accuracy under 4‑bit weight and KV‑cache quantization makes massive dataset processing possible without server‑grade infrastructure.
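To see why a 1 million‑token context can fit on a 32 GB card, a back‑of‑the‑envelope KV‑cache calculation helps. The layer count, KV‑head count, and head dimension below are illustrative placeholders, not Qwen3.5’s published configuration; only the 4‑bit figure (0.5 bytes per value) comes from the article.

```python
def kv_cache_gib(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Approximate KV-cache size: keys + values for every layer and token."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value
    return total_bytes / 1024**3

# Hypothetical config (NOT Qwen3.5's real numbers): 48 layers, 4 KV heads, head_dim 128.
fp16 = kv_cache_gib(1_000_000, 48, 4, 128, 2.0)   # 16-bit cache
q4   = kv_cache_gib(1_000_000, 48, 4, 128, 0.5)   # 4-bit quantized cache
print(f"fp16: {fp16:.1f} GiB, 4-bit: {q4:.1f} GiB")
```

Under these assumed dimensions, the fp16 cache alone would overflow a 32 GB card, while the 4‑bit version fits with room to spare for the quantized weights—which is the claim the bullet list is making.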

Technology: Delta Force

At the heart of Qwen 3.5’s performance is a sophisticated hybrid architecture. While many models rely solely on standard Transformer blocks, Qwen 3.5 integrates Gated Delta Networks combined with a sparse Mixture‑of‑Experts (MoE) system.

Technical specifications for Qwen3.5‑35B‑A3B

| Feature | Detail |
| --- | --- |
| Parameter efficiency | 35 B total parameters, but only 3 B are activated for any given token. |
| Expert diversity | The MoE layer uses 256 experts; 8 routed experts + 1 shared expert maintain performance while slashing inference latency. |
| Near‑lossless quantization | High accuracy retained when compressed to 4‑bit weights, dramatically reducing the memory footprint for local deployment. |
| Base model release | Alibaba has open‑sourced the Qwen3.5‑35B‑A3B‑Base model alongside the instruct‑tuned versions. |
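The expert‑routing idea behind those numbers can be sketched in a few lines. This toy router is a generic top‑k MoE gate (plus one always‑on shared expert), not Alibaba’s actual implementation; only the 256/8/1 counts come from the spec table.

```python
import numpy as np

N_EXPERTS, TOP_K = 256, 8  # per the spec table; one shared expert is always active

def route(gate_logits: np.ndarray, k: int = TOP_K):
    """Pick the top-k routed experts per token and softmax-normalize their gates."""
    topk_idx = np.argsort(gate_logits, axis=-1)[:, -k:]             # (tokens, k)
    topk_logits = np.take_along_axis(gate_logits, topk_idx, -1)
    weights = np.exp(topk_logits - topk_logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return topk_idx, weights

rng = np.random.default_rng(0)
idx, w = route(rng.standard_normal((4, N_EXPERTS)))  # 4 tokens
# Each token runs 8 routed experts + 1 shared expert instead of all 256.
```

This is why only ~3 B of the 35 B parameters are active per token: the gate selects a small slice of the expert pool, and the rest of the parameters sit idle for that token.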

Product: Intelligence That “Thinks” First

Qwen 3.5 introduces a native “Thinking Mode” as its default state. Before providing a final answer, the model generates an internal reasoning chain, delimited by <think> tags, to work through complex logic.
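Anyone hosting the model locally will want to strip that reasoning chain before showing output to users. The sketch below assumes Qwen‑style `<think>...</think>` delimiters; verify the exact tag against the model’s chat template before relying on it.

```python
import re

def split_thinking(text: str):
    """Separate a <think>...</think> reasoning chain from the final answer."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()  # no reasoning chain present
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer

r, a = split_thinking("<think>2 + 2 is 4.</think>The answer is 4.")
# r holds the hidden reasoning; a holds only the user-facing answer.
```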

Model lineup (tailored for varying hardware environments)

| Model | Highlights |
| --- | --- |
| Qwen3.5‑27B | Optimized for high efficiency; supports a context length of over 800 K tokens. |
| Qwen3.5‑Flash | Production‑grade hosted version; default 1 M‑token context length; built‑in official tools. |
| Qwen3.5‑122B‑A10B | Designed for server‑grade GPUs (80 GB VRAM); supports 1 M+ token contexts; narrows the gap with the world’s largest frontier models. |

Benchmark results validate this architectural shift. The 35B‑A3B model notably surpasses much larger predecessors (e.g., Qwen3‑235B) as well as the proprietary GPT‑5 mini and Claude Sonnet 4.5 in categories such as knowledge (MMMLU) and visual reasoning (MMMU‑Pro).


Pricing & API Integration

For those not hosting their own weights, Alibaba Cloud Model Studio provides a competitive API for Qwen3.5‑Flash.

| Operation | Cost (per 1 M tokens) |
| --- | --- |
| Input | $0.10 |
| Output | $0.40 |
| Cache creation | $0.125 |
| Cache read | $0.01 |
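At these rates, estimating a request’s cost is simple arithmetic. The helper below hard‑codes the listed Qwen3.5‑Flash input and output prices and ignores the cache‑creation/read discounts for brevity.

```python
FLASH_INPUT_PER_M = 0.10   # USD per 1M input tokens (from the table above)
FLASH_OUTPUT_PER_M = 0.40  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, ignoring cache pricing."""
    return (input_tokens / 1_000_000) * FLASH_INPUT_PER_M \
         + (output_tokens / 1_000_000) * FLASH_OUTPUT_PER_M

# A full 1M-token context plus 1M tokens of output comes to $0.50 total.
print(f"${request_cost(1_000_000, 1_000_000):.2f}")
```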

Tool‑Calling pricing (selected)

| Tool | Cost |
| --- | --- |
| Web Search | $10 per 1,000 calls |
| Code Interpreter | Free (limited‑time offer) |

These rates make Qwen3.5‑Flash among the most affordable LLM APIs worldwide. Below is a comparison table (all values are per 1 M tokens unless otherwise noted).

| Model | Input | Output | Total Cost | Source |
| --- | --- | --- | --- | --- |
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| Qwen3.5‑Flash | $0.10 | $0.40 | $0.50 | Alibaba Cloud |
| deepseek‑chat (V3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| deepseek‑reasoner (V3.2‑Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non‑reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| MiniMax M2.5 | $0.15 | $1.20 | $1.35 | MiniMax |
| MiniMax M2.5‑Lightning | $0.30 | $2.40 | $2.70 | MiniMax |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 | Google |
| Kimi‑k2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| GLM‑5 | $1.00 | $3.20 | $4.20 | Z.ai |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 | Baidu |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3‑Max (2026‑01‑23) | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| GPT‑5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT‑5.2 Pro | $21.00 | $168.00 | $189.00 | OpenAI |

What It Means for Enterprise Technical Leaders & Decision‑Makers

With the launch of the Qwen 3.5 Medium Models, rapid iteration and fine‑tuning—once reserved for well‑funded labs—are now accessible for on‑premise development at many non‑technical firms. This effectively decouples sophisticated AI from massive capital expenditure.

  • Data security & privacy: Organizations can ingest massive document repositories or hour‑scale videos locally, enabling deep institutional analysis without exposing sensitive data to third‑party clouds.
  • Cost efficiency: Near‑lossless 4‑bit quantization and the ability to run 1 M‑token contexts on consumer‑grade GPUs dramatically lower compute spend.
  • Flexibility: The open‑source Apache 2.0 licensing lets enterprises customize, extend, and integrate the models into existing pipelines without vendor lock‑in.

In short, Qwen 3.5 brings frontier‑level AI capabilities to the desktop and on‑premise environments, opening the door for a new wave of enterprise AI innovation.


By running these specialized "Mixture-of-Experts" models within a private firewall, organizations can maintain sovereign control over their data while utilizing native "thinking" modes and official tool‑calling capabilities to build more reliable, autonomous agents.

Early adopters on Hugging Face have specifically lauded the model’s ability to "narrow the gap" in agentic scenarios where previously only the largest closed models could compete.

This shift toward architectural efficiency over raw scale ensures that AI integration remains cost‑conscious, secure, and agile enough to keep pace with evolving operational needs.