Dec 19, 2025 | The Tongyi Weekly: Your weekly dose of cutting-edge AI from Tongyi Lab
Hello, creators and builders,
This week was a harvest of breakthroughs in voice and video AI. From Wan2.6 — our cinematic multimodal generation model that brings characters to life with consistent appearance, voice, and storytelling — to Fun‑ASR and Fun‑CosyVoice 3, our speech models now available with open‑source versions, the future of expressive AI has never felt closer.
Let’s dive in.
👉 Subscribe to The Tongyi Weekly and never miss a release
Subscribe Now →
📣 Model Release & Updates
Introducing Wan2.6 – The Cinematic Multimodal Generation Model
- Starring – Cast characters from reference videos into new scenes. Supports human or human‑like figures, enabling complex multi‑person and human‑object interactions with appearance and voice consistency.
- Intelligent Multi‑shot Narrative – Turn simple prompts into auto‑storyboarded, multi‑shot videos. Maintains visual consistency and upgrades storytelling from single shots to rich narratives.
- Native A/V Sync – Generate multi‑speaker dialogue with natural lip‑sync and studio‑quality audio. It doesn’t just look real – it sounds real.
- Cinematic Quality – 15 s 1080p HD generation with comprehensive upgrades to instruction adherence, motion physics, and aesthetic control.
- Advanced Image Synthesis & Editing – Deliver cinematic photorealism with precise control over lens and lighting. Supports multi‑image referencing for commercial‑grade consistency and faithful aesthetic transfer.
- Storytelling with Structure – Generate interleaved text and images powered by real‑world knowledge and reasoning capabilities, enabling hierarchical and structured visual narratives.
🔗 Try Wan2.6 yourself – 150 free credits every day!
🔗 API Documentation
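If you want to script against the API, video generation is typically asynchronous: you submit a task, then poll until it completes. Below is a minimal Python sketch of that submit‑then‑poll pattern; the endpoint paths, model id, and field names are illustrative placeholders, so take the real values from the API documentation linked above.

```python
import os
import time
import requests

# Placeholder base URL and headers; see the official API docs for the
# real endpoint and authentication details.
API_BASE = "https://dashscope.aliyuncs.com/api/v1"  # assumed
HEADERS = {
    "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
    "Content-Type": "application/json",
}

# Step 1: submit an asynchronous text-to-video task.
task = requests.post(
    f"{API_BASE}/services/aigc/video-generation/video-synthesis",  # assumed path
    headers={**HEADERS, "X-DashScope-Async": "enable"},
    json={
        "model": "wan2.6-t2v",  # hypothetical model id
        "input": {"prompt": "A chef plating dessert, two shots, cinematic lighting"},
    },
).json()
task_id = task["output"]["task_id"]

# Step 2: poll until the task succeeds or fails. A 15 s 1080p clip can
# take minutes to render, so poll at a relaxed interval.
while True:
    status = requests.get(f"{API_BASE}/tasks/{task_id}", headers=HEADERS).json()
    if status["output"]["task_status"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)

print(status["output"].get("video_url", status["output"]))
```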
Fun‑ASR Upgrade – Noise‑Robust, Multilingual, Customizable ASR
We’re thrilled to unveil the newest evolution of Fun‑ASR, our enterprise‑grade end‑to‑end Automatic Speech Recognition model – now more noise‑robust, more multilingual, and more customizable than ever. We’re also releasing the lightweight Fun‑ASR‑Nano (0.8 B) model as open source.
Major Upgrades in Fun‑ASR
- 93 % accuracy in real‑world noisy environments such as conferences, metro stations, and in‑car audio.
- Lyric recognition breakthrough – accurately transcribes vocals even with strong background music or rap‑style delivery.
- 31 languages supported, with enhanced performance for East Asian & Southeast Asian languages (e.g., Japanese, Vietnamese).
- 7 major Chinese dialect groups and 26 regional accents covered with high precision.
- RAG‑based customization – hotword limit raised from 1,000 to 10,000 without compromising accuracy.
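To make the hotword feature concrete, here is a minimal Python sketch written against the open‑source FunASR toolkit's AutoModel interface, which supports biasing recognition toward a supplied hotword list. The model id and the exact options for the new Fun‑ASR release are assumptions; check the official model card for the supported parameters.

```python
# Minimal sketch using the open-source FunASR toolkit.
from funasr import AutoModel

# Hypothetical model id for the open-source Fun-ASR-Nano release;
# verify the actual id on the model card.
model = AutoModel(model="Fun-ASR-Nano")

# Hotwords bias decoding toward domain terms (product names, people,
# jargon). Fun-ASR's RAG-based customization raises the practical
# limit from 1,000 to 10,000 entries.
result = model.generate(
    input="meeting_recording.wav",
    hotword="Tongyi Wan2.6 CosyVoice",  # space-separated hotword list
)
print(result[0]["text"])
```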
Fun‑ASR‑Nano (0.8 B) – Open Source
Lightweight yet highly noise‑resistant, optimized for compute‑sensitive scenarios, edge devices, and low‑latency real‑time recognition.
🔗 Now available for download
Fun‑CosyVoice 3 – The Next‑Generation Text‑to‑Speech Model
Fun‑CosyVoice 3 is now faster, more expressive, and officially open‑sourced.
What’s New
- 50 % lower first‑token latency with full bidirectional streaming TTS, enabling true real‑time “type‑to‑speech” experiences.
- Improved Chinese–English code‑switching – word error rate (WER) reduced by 56.4 %.
- Enhanced zero‑shot voice cloning – replicate a voice using only 3 s of audio, with better consistency and emotion control.
- 30+ timbres, 9 languages, 18 Chinese dialect accents, and 9 emotion styles, plus cross‑lingual voice cloning capability.
- Benchmark gains – 26 % relative reduction in character error rate (CER) on the challenging test‑hard set; several metrics approach human‑recorded speech quality.
Fun‑CosyVoice 3 (0.5 B) – Open Source
A lightweight 0.5 B‑parameter version with zero‑shot voice cloning and local deployment support, outperforming popular open‑source TTS models across evaluated metrics.
🔗 Explore & Download
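To see what 3‑second zero‑shot cloning looks like in code, here is a minimal Python sketch modeled on the open‑source CosyVoice 2 interface; whether Fun‑CosyVoice 3 keeps the same class and method names is an assumption, so verify against the released repository before relying on it.

```python
# Sketch modeled on the open-source CosyVoice 2 API; class and method
# names for Fun-CosyVoice 3 are assumptions to verify in the repo.
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

tts = CosyVoice2("pretrained_models/Fun-CosyVoice3-0.5B")  # assumed path

# Zero-shot cloning: a ~3 s reference clip plus its transcript is
# enough to reproduce the speaker's timbre on new text.
prompt = load_wav("reference_3s.wav", 16000)

# stream=True yields audio chunks as they are generated, matching the
# lower first-token latency highlighted above.
for i, out in enumerate(
    tts.inference_zero_shot(
        "Hello from The Tongyi Weekly!",    # text to synthesize
        "This is my three second sample.",  # transcript of the clip
        prompt,
        stream=True,
    )
):
    torchaudio.save(f"chunk_{i}.wav", out["tts_speech"], tts.sample_rate)
```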
Qwen Code v0.5.0 – Smarter AI Coding Assistant
What’s New
- VSCode Integration – Bundled CLI in the VSCode release package with improved cross‑platform compatibility.
- Native TypeScript SDK – Seamlessly integrate with Node/TS projects.
- Smart Session Management – Auto‑save and continue conversations.
- Support for OpenAI‑compatible reasoning models (e.g., DeepSeek V3.2, Kimi‑K2, and more).
- Custom tool control via SDK‑hosted servers.
- Russian language support – Internationalization with a Russian UI option.
- Enhanced UX – Terminal bell for audio notifications and a session‑resume command display.
- Testing & Stability – Numerous bug fixes and stability improvements.
(The release notes continue beyond this excerpt.)
🚀 New Release Highlights
- Ubuntu shell support
- Faster SDK timeouts
- Rock‑solid test stability
Get started in the terminal:
```bash
npm install -g @qwen-code/qwen-code
```
✨ Community Spotlights
Children’s Storytelling: COOLKIDS LoRA – by Clumsy_Trainer
This Z‑Image‑Turbo LoRA captures the whimsy, warmth, and visual charm of children’s illustration — perfect for picture books, educational content, or animated shorts. The generations feel like pages from a beloved storybook.
Portrait Polisher: AWPortrait‑Z – by Shakker‑Labs
AWPortrait‑Z is a native noise‑reduction LoRA that polishes Z‑Image’s portrait capabilities. From “relit” lighting to authentic skin texture, it is a massive quality‑of‑life upgrade for character generation.
Z‑Image Workflow Masterpiece – by luneva
This workflow generates pixel‑level realistic details for both foregrounds and backgrounds at incredible speeds. No brute force, no upscaling needed—just pure, high‑density realism. A must‑try for the community.
🔥 Upcoming Events
WAN MUSE+ Season 3 “IN CHARACTER” – Now Live
We’re thrilled to launch WAN MUSE+ Season 3: “IN CHARACTER” — a global creative challenge inviting you to explore identity, narrative, and AI expression.
- Prize pool: Up to $14,000
- Award categories:
  - Best Narrative
  - Best Animated Short
  - Best Visual
  - Best PSA
  - Nomination & Special Inspiration Awards
How to enter
- Post on TikTok, Instagram, X, or YouTube.
- Use the hashtags #incharacter #wanmuse #wan.
- AIGC platforms: SeaArt.Ai, WaveSpeedAI, Tensor.Art
📬 Want More? Stay Updated
Every week we bring you:
- New model releases & upgrades
- AI research breakthroughs
- Open‑source tools you can use today
- Community highlights that inspire
👉 Subscribe to The Tongyi Weekly and never miss a release.
Subscribe now →
About Tongyi Lab
Tongyi Lab is a research institution under Alibaba Group dedicated to artificial intelligence and foundation models. We focus on the research, development, and innovative applications of AI across diverse domains, including large language models (LLMs), multimodal understanding and generation, visual AIGC, speech technologies, and more.