Alibaba's small, open source Qwen3.5-9B beats OpenAI's gpt-oss-120B and can run on standard laptops
Source: VentureBeat
Overview
Despite political turmoil in the U.S. AI sector, China's AI advances continue apace.
Earlier today, e‑commerce giant Alibaba’s Qwen Team of AI researchers—focused primarily on developing and releasing a growing family of powerful, open‑source language and multimodal models—unveiled its newest batch: the Qwen 3.5 Small Model Series.
Model lineup
| Model | Size | Primary focus |
|---|---|---|
| Qwen 3.5‑0.8B | 0.8 B parameters | Ultra‑fast inference for prototyping and edge deployment where battery life is paramount |
| Qwen 3.5‑2B | 2 B parameters | Slightly larger sibling with the same on‑device focus, optimized for phones and other "tiny" devices |
| Qwen 3.5‑4B | 4 B parameters | Strong multimodal base for lightweight agents; native 262,144‑token context window |
| Qwen 3.5‑9B | 9 B parameters | Compact reasoning model that outperforms OpenAI's roughly 13× larger open‑source gpt‑oss‑120B on key third‑party benchmarks (multilingual knowledge, graduate‑level reasoning) |
Context: These models sit at the low end of today's general‑purpose model spectrum, comparable to MIT spin‑off Liquid AI's LFM2 series, while flagship models from OpenAI, Anthropic, and Google's Gemini series run in the trillion‑parameter regime.
The weights are available globally now under the Apache 2.0 license (perfect for enterprise and commercial use, including customization) on Hugging Face and ModelScope.
Technology: Hybrid efficiency & native multimodality
- Efficient hybrid architecture – combines Gated Delta Networks (a form of linear attention) with sparse Mixture‑of‑Experts (MoE).
  - Tackles the "memory wall" that typically limits small models.
  - Delivers higher throughput and significantly lower inference latency.
- Native multimodality – unlike earlier generations that "bolted on" a vision encoder, Qwen 3.5 was trained with early‑fusion multimodal tokens.
  - The 4B and 9B variants can read UI elements, count objects in video, and perform visual reasoning that previously required models 10× larger.
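The sparse‑MoE half of this hybrid design can be illustrated with a toy top‑k router: only a few experts run per token, which is what keeps activated compute low. The router, gating, and expert count below are generic illustrations, not Qwen's published internals.

```python
import numpy as np

def topk_moe(x, expert_weights, k=2, seed=0):
    """Route a token vector to the top-k experts and mix their outputs.

    x              : (d,) token representation
    expert_weights : list of (d, d) matrices, one per expert
    k              : number of experts activated per token
    """
    num_experts = len(expert_weights)
    # Router: one score per expert (a fixed random projection, for brevity).
    rng = np.random.default_rng(seed)
    router = rng.standard_normal((num_experts, x.shape[0]))
    scores = router @ x
    top = np.argsort(scores)[-k:]                 # indices of the k best experts
    gates = np.exp(scores[top])
    gates /= gates.sum()                          # softmax over the winners only
    # Only the selected experts run; the rest of the network stays idle.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))
```

With, say, 8 experts and k=2, only a quarter of the expert parameters are touched per token, which is the intuition behind "more intelligence, less compute."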
Benchmark performance: “Small” models that defy scale
| Benchmark | Model | Score | Competitor (Score) |
|---|---|---|---|
| MMMU‑Pro (visual reasoning) | Qwen 3.5‑9B | 70.1 | Gemini 2.5 Flash‑Lite (59.7), Qwen 3‑VL‑30B‑A3B (63.0) |
| GPQA Diamond (graduate‑level reasoning) | Qwen 3.5‑9B | 81.7 | gpt‑oss‑120B (80.1) |
| Video‑MME (with subtitles) | Qwen 3.5‑9B | 84.5 | Gemini 2.5 Flash‑Lite (74.6) |
| Video‑MME (with subtitles) | Qwen 3.5‑4B | 83.5 | — |
| HMMT Feb 2025 (STEM math) | Qwen 3.5‑9B | 83.2 | — |
| HMMT Feb 2025 (STEM math) | Qwen 3.5‑4B | 74.0 | — |
| OmniDocBench v1.5 (document recognition) | Qwen 3.5‑9B | 87.7 | — |
| MMMLU (multilingual knowledge) | Qwen 3.5‑9B | 81.2 | gpt‑oss‑120B (78.2) |
The 9B model consistently matches or exceeds much larger competitors across multimodal, reasoning, video, mathematical, and multilingual tasks.
Community reactions: “More intelligence, less compute”
The announcement followed last week's release of the single‑GPU‑runnable Qwen 3.5‑Medium and sparked immediate excitement among "local‑first" AI developers.
Paul Couvert, AI & tech educator (Blueshell AI) – X:
“How is this even possible?! Qwen has released 4 new models and the 4B version is almost as capable as the previous 80B A3B one. And the 9B is as good as GPT‑OSS 120B while being 13× smaller!”
Couvert’s take‑aways:
- “They can run on any laptop.”
- “0.8B and 2B for your phone.”
- “Offline and open source.”
Karan Kendre, Kargul Studio:
“These models [can run] locally on my M1 MacBook Air for free.”
Xenova, Hugging Face developer:
“The new Qwen 3.5 Small Model series can even run directly in a user’s web browser, handling sophisticated tasks like video analysis.”
Why the Base models matter
- Provide a blank‑slate foundation without RLHF or SFT bias, which can otherwise cause “refusals” or overly specific conversational styles.
- Enable real‑world industrial innovation for enterprises and research teams that need full control over downstream fine‑tuning.
Quick links
- Model weights & code:
  - Hugging Face – Qwen 3.5 Small Series
  - ModelScope – Qwen 3.5 Small Series
- License: Apache 2.0 (commercial‑friendly)
- Documentation & tutorials: (to be added by the Qwen Team)
The Qwen 3.5 Small Model Series demonstrates that high‑quality AI can be both compact and accessible, opening the door for broader deployment on edge devices, browsers, and offline environments.
Qwen 3.5 Series – Open‑Source, Small‑Scale, Agentic Models
A More Accessible Starting Point
For developers who want to customize a model for specific tasks, the Qwen 3.5 Small series offers an easier entry point: you can apply your own instruction tuning and post‑training directly, without first having to strip away Alibaba's own chat tuning and alignment layers.
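As a sketch of what that entry point looks like in practice, the helper below packages instruction/response pairs into a chat‑style JSONL dataset for downstream fine‑tuning. The message schema is illustrative, not Qwen's documented template; match it to whatever chat format your training framework expects for the base checkpoint.

```python
import json

def to_chat_example(instruction, response, system="You are a helpful assistant."):
    """Wrap one (instruction, response) pair in a simple chat schema."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response},
        ]
    }

def write_sft_dataset(pairs, path):
    """Write instruction/response pairs as JSONL, one training example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for instruction, response in pairs:
            f.write(json.dumps(to_chat_example(instruction, response)) + "\n")
```

JSONL in this shape is a common input format for open fine‑tuning stacks, which is what makes a permissively licensed base model practical to specialize.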
Licensing – A Win for the Open Ecosystem
Alibaba has released the weights and configuration files for the Qwen 3.5 series under the Apache 2.0 license. This permissive license enables:
| Permission | What It Means |
|---|---|
| Commercial use | Integrate the models into commercial products royalty‑free. |
| Modification | Fine‑tune (SFT) or apply RLHF to create specialized versions. |
| Distribution | Redistribute the models in local‑first AI applications (e.g., Ollama). |
Contextualising the News – Why “Small” Matters Right Now
The release arrives during a period of “Agentic Realignment.” We have moved beyond simple chatbots; the goal now is autonomous agents that can:
- Think – perform reasoning.
- See – handle multimodal inputs.
- Act – use tools and APIs.
Running these loops on trillion‑parameter models is prohibitively expensive, but a local Qwen 3.5‑9B can achieve comparable functionality for a fraction of the cost.
By scaling reinforcement learning (RL) across million‑agent environments, Alibaba has endowed these small models with human‑aligned judgment, enabling multi‑step objectives such as:
- Organising a desktop.
- Reverse‑engineering gameplay footage into code.
Whether it’s a 0.8 B model on a smartphone or a 9 B model powering a coding terminal, the Qwen 3.5 series is democratising the agentic era.
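The think/see/act loop described above can be sketched as a minimal local agent harness. The planner and tool names here are stand‑ins, not a Qwen API: in a real deployment, `plan_step` would be a call to the local model.

```python
def run_agent(goal, plan_step, tools, max_steps=8):
    """Minimal think->act loop for a local agent.

    plan_step(goal, history) -> (tool_name, tool_args) or ("done", result)
    tools                    -> dict mapping tool names to callables
    """
    history = []
    for _ in range(max_steps):
        action, args = plan_step(goal, history)   # "think": pick the next action
        if action == "done":
            return args                            # final answer
        observation = tools[action](**args)        # "act": run the chosen tool
        history.append((action, args, observation))
    raise RuntimeError("agent exceeded step budget")
```

The fixed `max_steps` budget is one simple guard against the runaway loops that multi‑step agents are prone to.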
The shift from chatbots to native multimodal agents lets enterprises move sophisticated reasoning to the edge (individual devices and local servers), automating tasks that previously required costly cloud APIs or high‑latency processing.
Strategic Enterprise Applications & Considerations
The 0.8 B – 9 B models are re‑engineered for efficiency, using a hybrid architecture that activates only the necessary parts of the network for each task.
| Application Area | Capability | Example Use‑Case |
|---|---|---|
| Visual Workflow Automation | Pixel‑level grounding | Navigate desktop/mobile UIs, fill out forms, organise files via natural‑language instructions. |
| Complex Document Parsing | High document‑understanding benchmark scores (e.g., 87.7 on OmniDocBench v1.5) | Replace separate OCR + layout pipelines; extract structured data from diverse forms and charts. |
| Autonomous Coding & Refactoring | 1 M‑token context window | Feed entire repositories (up to 400 k lines of code) for production‑ready refactors or automated debugging. |
| Real‑Time Edge Analysis | Optimised for mobile | Offline video summarisation (≤ 60 s at 8 FPS) and spatial reasoning without draining battery. |
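The repository‑scale coding row above relies on packing many source files into one long prompt. A token‑budgeted packer might look like the sketch below; the chars‑per‑token heuristic and file filters are illustrative, and a real pipeline would count with the model's own tokenizer.

```python
import os

def pack_repository(root, budget_tokens=1_000_000, chars_per_token=4):
    """Concatenate source files into one prompt, stopping at a token budget."""
    budget_chars = budget_tokens * chars_per_token
    parts, used = [], 0
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith((".py", ".md", ".toml")):
                continue                           # skip non-source files
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            header = f"\n### FILE: {os.path.relpath(path, root)}\n"
            if used + len(header) + len(text) > budget_chars:
                return "".join(parts)              # budget reached: stop packing
            parts.append(header + text)
            used += len(header) + len(text)
    return "".join(parts)
```

The per‑file headers let the model attribute code to paths, which matters for repository‑wide refactoring suggestions.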
Enterprise Functions That Benefit Most from Local, Small‑Model Deployment
| Function | Primary Benefit | Key Use Case |
|---|---|---|
| Software Engineering | Local code intelligence | Repository‑wide refactoring and terminal‑based agentic coding. |
| Operations & IT | Secure automation | Automating multi‑step system settings and file‑management tasks locally. |
| Product & UX | Edge interaction | Integrating native multimodal reasoning directly into mobile/desktop apps. |
| Data & Analytics | Efficient extraction | High‑fidelity OCR and structured data extraction from complex visual reports. |
Operational “Flags” to Monitor
| Issue | Description |
|---|---|
| Hallucination Cascade | In multi‑step agentic workflows, a small early error can trigger a cascade of failures, leading the agent down an incorrect or nonsensical plan. |
| Debugging vs. Greenfield Coding | Models excel at writing new (“greenfield”) code but may struggle with debugging or modifying existing, complex legacy systems. |
| Memory & VRAM Demands | Even “small” models (e.g., 9 B) need significant VRAM for high‑throughput inference; the total parameter count still occupies considerable GPU space. |
| Regulatory & Data Residency | Using models from a China‑based provider may raise data‑residency concerns in certain jurisdictions, though the Apache 2.0 open‑weight version can be hosted on sovereign local clouds. |
Recommendations
- Prioritise verifiable tasks—coding, math, or instruction following—where outputs can be automatically checked against predefined rules.
- Implement monitoring to catch early‑step errors and prevent reward‑hacking or silent failures.
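The "verifiable tasks" recommendation can be made concrete with a small checker that validates an arithmetic claim against the expression itself, plus a gate that aborts a workflow as soon as one step fails. Both helpers are illustrative sketches, not part of any Qwen tooling.

```python
import ast

def verify_math_answer(expression, claimed):
    """Check a model's arithmetic claim by safely evaluating the expression."""
    node = ast.parse(expression, mode="eval")
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.Pow)
    if not all(isinstance(n, allowed) for n in ast.walk(node)):
        return False                               # reject anything non-arithmetic
    return abs(eval(compile(node, "<expr>", "eval")) - claimed) < 1e-9

def gate_step(check, value):
    """Abort an agentic workflow as soon as one step fails its check."""
    if not check(value):
        raise ValueError(f"verification failed for step output: {value!r}")
    return value
```

Gating each step this way catches early errors before they can cascade through the rest of an agent's plan.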
The Qwen 3.5 Small series offers a powerful, open‑source foundation for building the next generation of edge‑centric, autonomous AI agents.