[Paper] DreamReasoner-8B: Block-Size Curriculum Learning for Diffusion Reasoning Models

Published: June 17, 2026 at 12:34 PM EDT

Source: arXiv - 2606.19257v1

Overview

Block diffusion language models accelerate decoding through parallel block-wise denoising, yet whether they can be reliably scaled for long chain-of-thought (CoT) reasoning remains unresolved. To this end, we develop DreamReasoner-8B, an open-source block diffusion reasoning model, and conduct a systematic study of how training and inference block sizes affect long-CoT reasoning. Our analysis reveals a stark performance disparity: training with large block sizes yields remarkably poor reasoning, whereas small block sizes preserve effective reasoning. To bridge this granularity gap, we propose block-size curriculum learning, which gradually transitions training from fine-grained to coarse-grained block sizes, thereby overcoming this limitation and enabling strong reasoning performance that generalizes across diverse inference block sizes. On mathematical and code reasoning benchmarks, DreamReasoner-8B achieves results competitive with leading open autoregressive models such as Qwen3-8B. This work establishes a practical foundation for efficient, reasoning-capable diffusion language models. We release our model at https://github.com/DreamLM/DreamReasoner.
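
The curriculum described above, which moves training from fine-grained to coarse-grained block sizes, can be illustrated with a minimal sketch. The abstract does not give the actual schedule, so everything here is an assumption made for illustration: the equal-length stages, the example block sizes (4, 8, 16, 32), and the hypothetical helper names `block_size_at_step` and `split_into_blocks`. It is not the authors' implementation.

```python
from typing import List, Sequence


def block_size_at_step(step: int, total_steps: int, block_sizes: Sequence[int]) -> int:
    """Return the training block size for `step` under a staged curriculum.

    Assumption (not from the paper): the run is split into equal-length
    stages, one per entry in `block_sizes`, ordered from small to large.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not block_sizes:
        raise ValueError("block_sizes must be non-empty")
    if list(block_sizes) != sorted(block_sizes):
        raise ValueError("block_sizes must go from fine to coarse")
    # Clamp so out-of-range steps map to the first or last stage.
    step = min(max(step, 0), total_steps - 1)
    stage = step * len(block_sizes) // total_steps
    return block_sizes[stage]


def split_into_blocks(num_tokens: int, block_size: int) -> List[range]:
    """Partition token positions [0, num_tokens) into contiguous blocks.

    Each block is the unit that a block diffusion model would denoise
    in parallel; the last block may be shorter than `block_size`.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        range(start, min(start + block_size, num_tokens))
        for start in range(0, num_tokens, block_size)
    ]


if __name__ == "__main__":
    sizes = (4, 8, 16, 32)  # hypothetical fine-to-coarse curriculum
    schedule = [block_size_at_step(s, 8, sizes) for s in range(8)]
    print(schedule)
    print(split_into_blocks(10, 4))
```

With eight training steps and four stages, each block size is used for two consecutive steps before the schedule moves to the next coarser one. A real schedule could use unequal stage lengths or mix block sizes within a stage; the paper should be consulted for the actual design.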

Key Contributions

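DreamReasoner-8B, an open-source block diffusion reasoning model.
A systematic study of how training and inference block sizes affect long-CoT reasoning, showing that training with large block sizes yields poor reasoning while small block sizes preserve it.
Block-size curriculum learning, which gradually transitions training from fine-grained to coarse-grained block sizes so that reasoning performance generalizes across inference block sizes.
Results on mathematical and code reasoning benchmarks that are competitive with leading open autoregressive models such as Qwen3-8B.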

Methodology

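The authors vary training and inference block sizes to measure their effect on long-CoT reasoning, then train with a block-size curriculum that starts from fine-grained blocks and moves to coarse-grained ones. Please refer to the full paper for the detailed methodology.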

Practical Implications

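According to the authors, this work establishes a practical foundation for efficient, reasoning-capable diffusion language models: block-wise parallel denoising accelerates decoding, and the curriculum keeps reasoning strong across diverse inference block sizes.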

Authors

Zirui Wu
Lin Zheng
Jiacheng Ye
Shansan Gong
Xueliang Zhao
Yansong Feng
Wei Bi
Lingpeng Kong

Paper Information

arXiv ID: 2606.19257v1
Categories: cs.CL
Published: June 17, 2026
