Dec 19, 2025 | The Tongyi Weekly: Your weekly dose of cutting-edge AI from Tongyi Lab
Hello, creators and builders,
This week was a harvest of breakthroughs in voice and video AI. From Wan2.6 — our cinematic multimodal generation model that brings characters to life with consistent appearance, voice, and storytelling — to Fun‑ASR and Fun‑CosyVoice 3, our speech models now available with open‑source versions, the future of expressive AI has never felt closer.
Let’s dive in.
👉 Subscribe to The Tongyi Weekly and never miss a release
Subscribe Now →
📣 Model Release & Updates
Introducing Wan2.6 – The Cinematic Multimodal Generation Model
- Starring – Cast characters from reference videos into new scenes. Supports human or human‑like figures, enabling complex multi‑person and human‑object interactions with appearance and voice consistency.
- Intelligent Multi‑shot Narrative – Turn simple prompts into auto‑storyboarded, multi‑shot videos. Maintains visual consistency and upgrades storytelling from single shots to rich narratives.
- Native A/V Sync – Generate multi‑speaker dialogue with natural lip‑sync and studio‑quality audio. It doesn’t just look real – it sounds real.
- Cinematic Quality – 15 s 1080p HD generation with comprehensive upgrades to instruction adherence, motion physics, and aesthetic control.
- Advanced Image Synthesis & Editing – Deliver cinematic photorealism with precise control over lens and lighting. Supports multi‑image referencing for commercial‑grade consistency and faithful aesthetic transfer.
- Storytelling with Structure – Generate interleaved text and images powered by real‑world knowledge and reasoning capabilities, enabling hierarchical and structured visual narratives.
🔗 Try Wan2.6 yourself – 150 free credits every day!
🔗 API Documentation
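If you want to script against the API, video generation is typically asynchronous: you submit a task, then poll until it completes. Below is a minimal Python sketch of that submit‑then‑poll pattern; the endpoint paths, model id, and field names are illustrative placeholders, so take the real values from the API documentation linked above.

```python
import os
import time
import requests

# Placeholder base URL and headers; see the official API docs for the
# real endpoint and authentication details.
API_BASE = "https://dashscope.aliyuncs.com/api/v1"  # assumed
HEADERS = {
    "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
    "Content-Type": "application/json",
}

# Step 1: submit an asynchronous text-to-video task.
task = requests.post(
    f"{API_BASE}/services/aigc/video-generation/video-synthesis",  # assumed path
    headers={**HEADERS, "X-DashScope-Async": "enable"},
    json={
        "model": "wan2.6-t2v",  # hypothetical model id
        "input": {"prompt": "A chef plating dessert, two shots, cinematic lighting"},
    },
).json()
task_id = task["output"]["task_id"]

# Step 2: poll until the task succeeds or fails. A 15 s 1080p clip can
# take minutes to render, so poll at a relaxed interval.
while True:
    status = requests.get(f"{API_BASE}/tasks/{task_id}", headers=HEADERS).json()
    if status["output"]["task_status"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)

print(status["output"].get("video_url", status["output"]))
```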
Fun‑ASR Upgrade – Noise‑Robust, Multilingual, Customizable ASR
We’re thrilled to unveil the newest evolution of Fun‑ASR, our enterprise‑grade end‑to‑end Automatic Speech Recognition model – now more noise‑robust, more multilingual, and more customizable than ever. We’re also releasing the lightweight Fun‑ASR‑Nano (0.8 B) model as open source.
Major Upgrades in Fun‑ASR
- 93 % accuracy in real‑world noisy environments such as conferences, metro stations, and in‑car audio.
- Lyric recognition breakthrough – accurately transcribes vocals even with strong background music or rap‑style delivery.
- 31 languages supported, with enhanced performance for East Asian & Southeast Asian languages (e.g., Japanese, Vietnamese).
- 7 major Chinese dialect groups and 26 regional accents covered with high precision.
- RAG‑based customization – hotword limit raised from 1,000 to 10,000 without compromising accuracy.
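To make the hotword feature concrete, here is a minimal Python sketch written against the open‑source FunASR toolkit's AutoModel interface, which supports biasing recognition toward a supplied hotword list. The model id and the exact options for the new Fun‑ASR release are assumptions; check the official model card for the supported parameters.

```python
# Minimal sketch using the open-source FunASR toolkit.
from funasr import AutoModel

# Hypothetical model id for the open-source Fun-ASR-Nano release;
# verify the actual id on the model card.
model = AutoModel(model="Fun-ASR-Nano")

# Hotwords bias decoding toward domain terms (product names, people,
# jargon). Fun-ASR's RAG-based customization raises the practical
# limit from 1,000 to 10,000 entries.
result = model.generate(
    input="meeting_recording.wav",
    hotword="Tongyi Wan2.6 CosyVoice",  # space-separated hotword list
)
print(result[0]["text"])
```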
Fun‑ASR‑Nano (0.8 B) – Open Source
Lightweight yet highly noise‑resistant, optimized for compute‑sensitive scenarios, edge devices, and low‑latency real‑time recognition.
🔗 Now available for download
Fun‑CosyVoice 3 – The Next‑Generation Text‑to‑Speech Model
Fun‑CosyVoice 3 is now faster, more expressive, and officially open‑sourced.
What’s New
- 50 % lower first‑token latency with full bidirectional streaming TTS, enabling true real‑time “type‑to‑speech” experiences.
- Improved Chinese–English code‑switching – word error rate (WER) reduced by 56.4 %.
- Enhanced zero‑shot voice cloning – replicate a voice using only 3 s of audio, with better consistency and emotion control.
- 30+ timbres, 9 languages, 18 Chinese dialect accents, and 9 emotion styles, plus cross‑lingual voice cloning capability.
- Benchmark gains – 26 % relative reduction in character error rate (CER) on the challenging test‑hard set; several metrics approach human‑recorded speech quality.
Fun‑CosyVoice 3 (0.5 B) – Open Source
A lightweight 0.5 B‑parameter version with zero‑shot voice cloning and local deployment support, outperforming popular open‑source TTS models across evaluated metrics.
🔗 Explore & Download
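To see what 3‑second zero‑shot cloning looks like in code, here is a minimal Python sketch modeled on the open‑source CosyVoice 2 interface; whether Fun‑CosyVoice 3 keeps the same class and method names is an assumption, so verify against the released repository before relying on it.

```python
# Sketch modeled on the open-source CosyVoice 2 API; class and method
# names for Fun-CosyVoice 3 are assumptions to verify in the repo.
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

tts = CosyVoice2("pretrained_models/Fun-CosyVoice3-0.5B")  # assumed path

# Zero-shot cloning: a ~3 s reference clip plus its transcript is
# enough to reproduce the speaker's timbre on new text.
prompt = load_wav("reference_3s.wav", 16000)

# stream=True yields audio chunks as they are generated, matching the
# lower first-token latency highlighted above.
for i, out in enumerate(
    tts.inference_zero_shot(
        "Hello from The Tongyi Weekly!",    # text to synthesize
        "This is my three second sample.",  # transcript of the clip
        prompt,
        stream=True,
    )
):
    torchaudio.save(f"chunk_{i}.wav", out["tts_speech"], tts.sample_rate)
```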
Qwen Code v0.5.0 – Smarter AI Coding Assistant
What’s New
- VSCode Integration – Bundled CLI in the VSCode release package with improved cross‑platform compatibility.
- Native TypeScript SDK – Seamlessly integrate with Node/TS projects.
- Smart Session Management – Auto‑save and continue conversations.
- Support for OpenAI‑compatible reasoning models (e.g., DeepSeek V3.2, Kimi‑K2, and more).
- Custom tool control via SDK‑hosted servers.
- Russian language support – Internationalization with a Russian UI option.
- Enhanced UX – Terminal bell for audio notifications and a session‑resume command display.
- Testing & Stability – Numerous bug fixes and stability improvements.
(The release notes continue beyond this excerpt.)
🚀 New Release Highlights
- Ubuntu shell support
- Faster SDK timeouts
- Rock‑solid test stability
Get started in the terminal:
```bash
npm install -g @qwen-code/qwen-code
```
✨ Community Spotlights
Children’s Storytelling: COOLKIDS LoRA – by Clumsy_Trainer
This Z‑Image‑Turbo LoRA captures the whimsy, warmth, and visual charm of children’s illustration — perfect for picture books, educational content, or animated shorts. The generations feel like pages from a beloved storybook.
Portrait Polisher: AWPortrait‑Z – by Shakker‑Labs
AWPortrait‑Z is a native noise‑reduction LoRA that polishes Z‑Image’s portrait capabilities. From “relit” lighting to authentic skin texture, it is a massive quality‑of‑life upgrade for character generation.
Z‑Image Workflow Masterpiece – by luneva
This workflow generates pixel‑level realistic details for both foregrounds and backgrounds at incredible speeds. No brute force, no upscaling needed—just pure, high‑density realism. A must‑try for the community.
🔥 Upcoming Events
WAN MUSE+ Season 3 “IN CHARACTER” – Now Live
We’re thrilled to launch WAN MUSE+ Season 3: “IN CHARACTER” — a global creative challenge inviting you to explore identity, narrative, and AI expression.
- Prize pool: Up to $14,000
- Award categories:
  - Best Narrative
  - Best Animated Short
  - Best Visual
  - Best PSA
  - Nomination & Special Inspiration Awards
How to enter
- Post on TikTok, Instagram, X, or YouTube.
- Use the hashtags #incharacter #wanmuse #wan.
- AIGC platforms: SeaArt.Ai, WaveSpeedAI, Tensor.Art
📬 Want More? Stay Updated
Every week we bring you:
- New model releases & upgrades
- AI research breakthroughs
- Open‑source tools you can use today
- Community highlights that inspire
👉 Subscribe to The Tongyi Weekly and never miss a release.
Subscribe now →
About Tongyi Lab
Tongyi Lab is a research institution under Alibaba Group dedicated to artificial intelligence and foundation models. We focus on the research, development, and innovative applications of AI across diverse domains, including large language models (LLMs), multimodal understanding and generation, visual AIGC, speech technologies, and more.