[Paper] Divergent-Convergent Thinking in Large Language Models for Creative Problem Generation
Source: arXiv - 2512.23601v1
Overview
The paper introduces CreativeDC, a novel prompting technique that steers large language models (LLMs) through a divergent‑then‑convergent thinking cycle when they generate educational problems. By explicitly separating free‑form idea exploration from the final constraint‑checking step, the authors show that LLMs can produce a richer, more varied set of questions without sacrificing quality—addressing the “Artificial Hivemind” tendency of models to churn out repetitive content.
Key Contributions
- CreativeDC prompting framework: a two‑phase recipe (divergent exploration → convergent refinement) that can be applied to any off‑the‑shelf LLM.
- Quantitative diversity & novelty metrics: a comprehensive evaluation suite that measures how different, unexpected, and useful the generated problems are.
- Empirical validation: experiments on multiple LLMs demonstrate that CreativeDC boosts diversity and novelty by large margins while keeping utility on par with standard baselines.
- Scaling analysis: shows that as more samples are drawn, CreativeDC’s “effective number of distinct problems” grows faster than baseline methods, indicating better coverage of the problem space.
Methodology
Prompt Design – Divergent Phase
- The model receives a prompt that encourages free‑thinking: “List as many distinct ways to ask about X, without worrying about correctness.”
- No hard constraints are imposed, so the LLM can wander into unconventional angles, analogies, or contexts.
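A minimal sketch of what such a divergent-phase prompt might look like is shown below; the wording and the `topic`/`num_ideas` slots are illustrative assumptions, not the paper's verbatim template.

```python
# Hypothetical divergent-phase prompt template (wording is illustrative,
# not the paper's exact prompt). The goal here is breadth, not correctness.
DIVERGENT_PROMPT = """You are brainstorming practice problems about {topic}.
List {num_ideas} distinct ways to ask a question about this topic.
Explore unusual angles, real-world contexts, and analogies.
Do not worry about correctness, difficulty, or solvability yet.
Return one idea per line."""

prompt = DIVERGENT_PROMPT.format(topic="projectile motion", num_ideas=15)
```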
Prompt Design – Convergent Phase
- The raw ideas from the first phase are fed back into a second prompt that asks the model to select and polish the most promising candidates while satisfying explicit problem‑generation constraints (e.g., solvable, appropriate difficulty).
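A corresponding convergent-phase prompt might look like the sketch below; again, the wording and the specific constraint list are assumptions used for illustration rather than the paper's own template.

```python
# Hypothetical convergent-phase prompt template (illustrative wording).
# It selects and polishes the raw ideas produced by the divergent phase.
CONVERGENT_PROMPT = """Below are raw ideas for practice problems about {topic}:

{raw_ideas}

Select the {k} most promising ideas and rewrite each as a complete,
well-posed problem that (a) is solvable with the intended concepts,
(b) matches the target difficulty level, and (c) has a clear,
unambiguous answer. Return the final problems as a numbered list."""
```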
Implementation
- The two prompts are chained programmatically; the output of the divergent stage becomes the input for the convergent stage.
- The approach works with any decoder‑only LLM (GPT‑3.5, LLaMA, etc.) and does not require fine‑tuning.
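The chaining itself can be a few lines of glue code. The sketch below is a minimal assumption-laden version: `call_llm` is a placeholder for whatever chat-completion client is available, the prompt constants are abbreviated stand-ins for the fuller templates sketched above, and the high-then-low temperature split is a design guess rather than something the paper specifies.

```python
# Minimal two-pass pipeline sketch. `call_llm` is a placeholder for any
# chat-completion call (hosted API, local LLaMA server, etc.).
def call_llm(prompt: str, temperature: float = 1.0) -> str:
    raise NotImplementedError("wire up your LLM client here")

# Abbreviated stand-ins for the templates sketched above.
DIVERGENT_PROMPT = "List {n} distinct ways to ask about {topic}. Ignore correctness."
CONVERGENT_PROMPT = (
    "From these ideas:\n{ideas}\n"
    "Pick {k} and rewrite them as solvable, well-posed problems about {topic}."
)

def generate_problems(topic: str, n_ideas: int = 15, k_final: int = 5) -> str:
    # Pass 1: divergent exploration (higher temperature for breadth).
    ideas = call_llm(DIVERGENT_PROMPT.format(n=n_ideas, topic=topic), temperature=1.0)
    # Pass 2: convergent refinement (lower temperature for precision).
    return call_llm(
        CONVERGENT_PROMPT.format(ideas=ideas, k=k_final, topic=topic),
        temperature=0.3,
    )
```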
Evaluation Suite
- Diversity: pairwise semantic distance and lexical variety across generated problems (a metric sketch follows after this list).
- Novelty: comparison against a large corpus of existing textbook questions.
- Utility: human expert rating of pedagogical soundness and answerability.
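As a concrete reading of the diversity metric, average pairwise cosine distance over sentence embeddings can be computed as below. This is an assumed formulation, not the paper's released code; the embeddings can come from any sentence-embedding model.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def avg_pairwise_cosine_distance(embeddings: np.ndarray) -> float:
    """Mean cosine distance over all unordered pairs of generated problems.

    `embeddings` has shape (n_problems, dim), e.g., from any sentence-embedding
    model; higher values indicate a more diverse problem set.
    """
    sims = cosine_similarity(embeddings)   # (n, n) similarity matrix
    n = sims.shape[0]
    iu = np.triu_indices(n, k=1)           # upper triangle, i < j
    return float(np.mean(1.0 - sims[iu]))  # distance = 1 - similarity
```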
Results & Findings
| Metric | Baseline (single‑prompt) | CreativeDC |
|---|---|---|
| Diversity (avg. pairwise cosine distance) | 0.42 | 0.68 |
| Novelty (unique concepts %) | 31 % | 57 % |
| Utility (expert rating /5) | 4.2 | 4.1 |
| Effective distinct problems (1000 samples) | 210 | 398 |
- Higher diversity & novelty: CreativeDC’s divergent stage injects a broader set of concepts, which survive the convergent filter.
- Utility stays high: The convergent stage successfully weeds out incoherent or unsolvable ideas, keeping pedagogical quality comparable to the baseline.
- Scalability: When sampling more problems, the growth curve for distinct items under CreativeDC outpaces the baseline, suggesting better “coverage” of the creative space.
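The summary does not spell out how the "effective number of distinct problems" is counted. One plausible implementation, offered purely as an assumption, is a greedy semantic de-duplication at a fixed similarity threshold:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def count_effective_distinct(embeddings: np.ndarray, threshold: float = 0.85) -> int:
    """Greedy de-duplication: a problem counts as new only if its cosine
    similarity to every previously kept problem is below `threshold`.

    This is one plausible reading of "effective distinct problems";
    the paper may define the quantity differently.
    """
    kept: list[np.ndarray] = []
    for emb in embeddings:
        if not kept or cosine_similarity(emb[None, :], np.stack(kept)).max() < threshold:
            kept.append(emb)
    return len(kept)
```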
Practical Implications
- Curriculum designers can auto‑generate large banks of varied practice questions, reducing manual authoring effort while ensuring students see multiple perspectives on a topic.
- Adaptive learning platforms can pull from a more diverse pool to personalize problem sets, mitigating the risk of students encountering the same pattern repeatedly.
- Assessment creation tools can integrate CreativeDC to propose novel distractors or alternative problem statements, enriching multiple‑choice items and open‑ended tasks.
- Beyond education: any domain that needs creative content—e.g., brainstorming product ideas, generating test cases for software, or drafting interview questions—can adopt the divergent‑convergent prompting recipe to break out of the “hivemind” mode of LLMs.
Limitations & Future Work
- Prompt sensitivity: The quality of the divergent ideas depends heavily on how the first prompt is phrased; poorly worded prompts can still lead to low‑quality noise.
- Computational overhead: Running two inference passes per problem roughly doubles latency, which may be a bottleneck for real‑time applications.
- Domain specificity: The study focuses on math/physics educational problems; extending to highly specialized fields (e.g., law, medicine) may require domain‑specific constraint engineering.
Future Directions
- Automating prompt optimization via meta‑learning.
- Exploring multi‑stage (more than two) pipelines that interleave divergent and convergent loops.
- Integrating human‑in‑the‑loop feedback to further refine the convergent filtering step.
Authors
- Manh Hung Nguyen
- Adish Singla
Paper Information
- arXiv ID: 2512.23601v1
- Categories: cs.AI
- Published: December 29, 2025