[Paper] The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving
Source: arXiv - 2601.00747v1
Overview
The paper *The Reasoning‑Creativity Trade‑off: Toward Creativity‑Driven Problem Solving* examines why modern large‑language‑model (LLM) pipelines that repeatedly “sample‑think‑refine” tend to lose creative spark as they chase correctness. By framing reasoning as a probability distribution over solution traces, the authors propose a unified variational objective, Distributional Creative Reasoning (DCR), that simultaneously preserves answer quality and semantic diversity.
Key Contributions
- Unified theoretical framework (DCR): Shows that popular methods (STaR, GRPO, DPO, entropy bonuses, etc.) are special cases of a single variational loss over reasoning‑path distributions.
- Diversity Decay Theorem: Formal proof that correctness‑centric objectives inevitably contract the entropy of reasoning paths, with distinct decay patterns for STaR, GRPO, and DPO (an illustrative caricature follows this list).
- Stability‑Diversity design recipe: Practical algorithmic tweaks (e.g., entropy‑regularized gradient flow, adaptive temperature scaling) that guarantee convergence to a policy that is both accurate and diverse.
- Empirical validation: Benchmarks on creative reasoning tasks (puzzle solving, open‑ended code generation, story continuation) demonstrate that DCR‑enhanced models retain higher semantic entropy while matching or exceeding baseline accuracy.
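To build intuition for the Diversity Decay Theorem, here is a hedged caricature in our own notation, not the paper's actual proof: a purely correctness‑driven update that exponentially reweights traces by reward eventually piles all probability mass onto the top‑scoring traces.

```latex
% Caricature of diversity decay (our notation, not the paper's theorem).
% A reward-only multiplicative update over a finite trace set,
%   \pi_{t+1}(\tau) \propto \pi_t(\tau)\, e^{\eta\, r(\tau)},
% composes to \pi_t(\tau) \propto \pi_0(\tau)\, e^{t \eta\, r(\tau)},
% so mass concentrates on the reward maximizers and
\[
  \lim_{t \to \infty} H(\pi_t) \;\le\; \log \bigl|\arg\max_{\tau} r(\tau)\bigr| ,
\]
% i.e. entropy contracts toward its floor; the paper's theorem supplies
% method-specific decay rates for STaR, GRPO, and DPO.
```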
Methodology
- Trace‑level modeling: Each reasoning episode is represented as a trace: the ordered sequence of intermediate tokens or “thought steps.” The model’s policy defines a probability measure over all possible traces (a scoring sketch follows this list).
- Variational objective: DCR minimizes a KL‑type divergence between the model’s trace distribution and a target distribution that balances two forces (one concrete instantiation is written out after this list):
  - Correctness pressure: rewarding high‑scoring traces.
  - Creativity pressure: an entropy bonus encouraging spread across diverse traces.
- Gradient flow on measures: By treating the trace distribution as a continuous object, the authors derive a gradient‑flow update that can be implemented with standard back‑propagation plus a few extra terms (entropy gradient, adaptive temperature); a minimal training‑step sketch follows this list.
- Special‑case mapping: They mathematically show that setting the creativity weight to zero recovers STaR/GRPO/DPO, while adding a constant entropy term reproduces existing entropy‑bonus tricks.
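First, trace‑level scoring. A minimal sketch assuming an autoregressive policy whose per‑step logits are available; `trace_log_prob` is an illustrative helper, not code from the paper:

```python
import torch
import torch.nn.functional as F

def trace_log_prob(logits: torch.Tensor, trace: torch.Tensor) -> torch.Tensor:
    """Log-probability of one reasoning trace under the policy.

    logits: [T, V] next-token logits the model produced at each step.
    trace:  [T]    token ids of the sampled thought steps.

    The trace probability is the product of per-step probabilities,
    so its log is the sum of per-step log-probs.
    """
    log_probs = F.log_softmax(logits, dim=-1)          # [T, V]
    step_lp = log_probs.gather(1, trace.unsqueeze(1))  # [T, 1]
    return step_lp.sum()
```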
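Next, the variational objective. One plausible instantiation under assumed notation, where r(τ) is the trace score and β the creativity weight (the paper's exact parameterization may differ): the policy is pulled toward a reward‑tilted target, and the KL decomposes exactly into the two pressures above. Note the special‑case mapping falls out directly: β → 0 collapses the target onto the highest‑reward traces (reward‑only training), while a fixed β > 0 acts as a constant entropy bonus.

```latex
\[
  \pi^{\star}(\tau) \;\propto\; \exp\!\bigl(r(\tau)/\beta\bigr),
  \qquad
  \mathcal{L}(\theta)
  \;=\; \mathrm{KL}\!\bigl(\pi_\theta \,\|\, \pi^{\star}\bigr)
  \;=\; -\tfrac{1}{\beta}\,\mathbb{E}_{\tau\sim\pi_\theta}\!\bigl[r(\tau)\bigr]
        \;-\; H(\pi_\theta) \;+\; \mathrm{const}.
\]
```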
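Finally, the update itself. A hedged sketch of one entropy‑regularized policy‑gradient step consistent with the description above, not the authors' exact algorithm; `policy.log_prob` is an assumed interface returning the log‑probability of a whole trace:

```python
import torch

def dcr_step(policy, traces, rewards, beta, optimizer):
    """One entropy-regularized policy-gradient step (illustrative).

    Maximizing E[r(tau)] + beta * H(pi) has the same policy gradient as
    REINFORCE with the shaped reward r(tau) - beta * log pi(tau), since
    the extra '+1' term in the entropy gradient has zero expectation.
    """
    log_ps = torch.stack([policy.log_prob(t) for t in traces])  # [N]
    shaped = rewards - beta * log_ps.detach()  # fold in creativity pressure
    shaped = shaped - shaped.mean()            # variance-reducing baseline
    loss = -(shaped * log_ps).mean()           # score-function surrogate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The paper's adaptive temperature scaling could plausibly be layered on by scheduling `beta` during training, though the exact schedule is not specified in this summary.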
Results & Findings
| Setting | Accuracy (↑) | Semantic Entropy (↑) | Diversity Score* |
|---|---|---|---|
| Baseline STaR | 84.2 % | 1.31 bits | 0.42 |
| GRPO (no entropy) | 85.0 % | 1.08 bits | 0.35 |
| DPO (reward‑only) | 84.7 % | 0.97 bits | 0.31 |
| DCR (proposed) | 85.3 % | 2.04 bits | 0.58 |
*Diversity Score = normalized pairwise trace‑distance.
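A plausible reading of this metric, assuming traces are embedded into a common vector space (the paper's exact distance may differ); `diversity_score` is an illustrative helper:

```python
import torch
import torch.nn.functional as F

def diversity_score(embeddings: torch.Tensor) -> float:
    """Mean pairwise cosine distance between trace embeddings, in [0, 1].

    embeddings: [N, D], one embedding per sampled reasoning trace (N >= 2).
    Cosine distance lies in [0, 2]; dividing by 2 normalizes the score.
    """
    e = F.normalize(embeddings, dim=-1)
    sims = e @ e.T                                 # [N, N] cosine similarities
    n = e.shape[0]
    off_diag = sims.sum() - sims.diagonal().sum()  # drop self-similarity
    mean_sim = off_diag / (n * (n - 1))
    return ((1.0 - mean_sim) / 2.0).item()
```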
Key Takeaways
- Correctness is preserved – DCR matches or slightly exceeds the best baseline accuracy.
- Semantic entropy rises sharply, from 0.97–1.31 bits across the baselines to 2.04 bits, indicating a richer set of reasoning paths.
- Human evaluation on open‑ended code generation shows a 23 % increase in “novel yet functional” solutions.
Practical Implications
- Developer‑centric toolchains: Integrating DCR into existing “self‑refine” pipelines (e.g., OpenAI’s `function_call` loops, LangChain agents) can yield assistants that propose multiple viable strategies instead of converging on a single “safe” answer; a selection sketch follows this list.
- Creative coding & debugging: For code‑generation models, higher trace diversity translates into alternative algorithmic approaches, aiding developers who need to explore trade‑offs (performance vs. readability).
- Product design & ideation: LLM‑powered brainstorming bots can maintain a steady flow of unconventional suggestions without sacrificing factual correctness, improving user engagement.
- Safety & alignment: By preventing mode collapse, DCR reduces the risk of over‑optimizing toward narrow reward proxies, a known source of unintended behavior.
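As a hedged sketch of the toolchain idea above: sample several completions, then keep only the ones that are semantically far apart. `generate` and `embed` stand in for whatever LLM sampling and embedding calls a pipeline already has (hypothetical helpers, not a real API):

```python
import torch
import torch.nn.functional as F

def diverse_candidates(generate, embed, prompt, n=8, k=3, min_dist=0.3):
    """Sample n completions, then greedily keep up to k that are at least
    `min_dist` apart (cosine distance) from every candidate kept so far."""
    texts = [generate(prompt, temperature=1.0) for _ in range(n)]
    embs = F.normalize(embed(texts), dim=-1)  # [n, D] unit-norm embeddings
    kept = []
    for i, text in enumerate(texts):
        # Keep this candidate only if it is far from everything kept so far.
        if all(1.0 - float(embs[i] @ embs[j]) >= min_dist for j, _ in kept):
            kept.append((i, text))
        if len(kept) == k:
            break
    return [t for _, t in kept]
```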
Limitations & Future Work
- Computational overhead: Estimating entropy gradients adds roughly 15 % to runtime compared with vanilla STaR; scaling to multi‑billion‑parameter models may require approximation tricks.
- Task scope: Experiments focus on reasoning‑heavy benchmarks; the benefits for short‑answer QA or pure classification tasks remain unclear.
- Hyper‑parameter sensitivity: The trade‑off weight between correctness and creativity needs careful tuning per domain; automated scheduling is an open problem.
- Future directions: The authors suggest (i) hierarchical trace representations to further reduce cost, (ii) curriculum‑style annealing of the creativity term, and (iii) extending DCR to multimodal reasoning (e.g., vision‑language agents).
Authors
- Max Ruiz Luyten
- Mihaela van der Schaar
Paper Information
- arXiv ID: 2601.00747v1
- Categories: cs.LG
- Published: January 2, 2026