[Paper] A Comprehensive Anatomy of Human and DeepSeek-R1 LLM Mathematical Reasoning

Published: (June 5, 2026 at 11:57 AM EDT)
2 min read
Source: arXiv

Source: arXiv - 2606.07410v1

Overview

The emergence of “Aha moments” in large language models, particularly DeepSeek-R1-0120, has raised the question of whether these systems genuinely reason or merely imitate the appearance of reasoning. We conduct a comprehensive empirical comparison between model and human reasoning across all 30 problems from AIME 2025, exhaustively annotating 10,247 reasoning steps into five functional categories: Analysis, Inference, Branch, Backtrace, and Reflection. We find a clear structural difference. Human solutions maintain a compact alternation between analysis and deduction, whereas DeepSeek-R1 frequently revisits intermediate results, performs shallow and often unnecessary verification, and loops through local checks without meaningful logical progress. We describe this as topological mimicry: reproducing the surface form of reasoning without its functional role. Despite this, we identify two signals of genuine reasoning. First, successful traces exhibit stable use of branching and backtracking, while failed traces either underuse or overuse exploratory actions. Second, reflection is only effective when placed within deductive inference; reflections trapped in analysis loops focus on local numerical details while missing global logical errors. These findings suggest that current long-CoT models may be rewarded more for the appearance of reasoning than for genuine deductive progress. We discuss directions for improving evaluation and training, including measuring cross-trace stability, penalising “spinning-wheel” traces, encouraging deeper logical correction, and reallocating inference-time compute toward deduction and backtracking. Overall, reasoning quality depends not simply on how much reflection occurs, but on whether reflection appears consistently and at the appropriate logical scale.

Key Contributions

This paper presents research in the following areas:

  • cs.LG
  • cs.AI

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

This research contributes to the advancement of cs.LG.

Authors

  • Yuxiang Chen
  • Jun Wang

Paper Information

  • arXiv ID: 2606.07410v1
  • Categories: cs.LG, cs.AI
  • Published: June 5, 2026
  • PDF: Download PDF
0 views
Back to Blog

Related posts

Read more »