[Paper] How reliable are LLMs when it comes to playing dice?

Published: June 5, 2026 at 01:59 PM EDT

Source: arXiv - 2606.07515v1

Overview

We investigate the probabilistic reasoning capabilities of large language models through a controlled benchmarking study on discrete probability problems. We constructed two datasets, one of standard exercises and one of counterintuitive exercises designed to trigger heuristic reasoning, and evaluated 8 state-of-the-art models, each tested with and without Chain-of-Thought prompting. Models achieve an average accuracy of 0.96 on standard problems but only 0.59 on counterintuitive ones. We further provide empirical evidence of token bias: performance drops by over 20% when canonical formulations are replaced by disguised variants. Embedding misleading suggestions in the prompt reduces performance by up to 34%, and no model proves immune. Taken together, these findings suggest that current LLMs are not yet genuine probabilistic reasoners, despite their success on advanced mathematical problems.
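The abstract does not list its problems, so here is a purely illustrative example (not drawn from the paper's datasets) of a dice question where a heuristic misleads: the classic Chevalier de Méré problem. Is at least one six in 4 rolls of one die more likely than at least one double six in 24 rolls of two dice? Scaling the single-roll probability by the number of rolls gives 2/3 for both, while the complement rule puts the two answers on opposite sides of 1/2. A minimal sketch with exact fractions (the helper name `p_at_least_one` is our own):

```python
from fractions import Fraction


def p_at_least_one(p_single: Fraction, trials: int) -> Fraction:
    """Exact probability of at least one success in `trials` independent trials."""
    # Complement rule: 1 - P(no success in any trial).
    return 1 - (1 - p_single) ** trials


# Naive "proportional" heuristic: multiply the per-roll probability by the
# number of rolls (this is the expected count, not a probability).
naive_one_six = 4 * Fraction(1, 6)        # 2/3
naive_double_six = 24 * Fraction(1, 36)   # 2/3

# Correct answers via the complement rule.
exact_one_six = p_at_least_one(Fraction(1, 6), 4)
exact_double_six = p_at_least_one(Fraction(1, 36), 24)

print(f"heuristic: {float(naive_one_six):.4f} vs {float(naive_double_six):.4f}")
print(f"exact:     {float(exact_one_six):.4f} vs {float(exact_double_six):.4f}")
```

Running it prints 0.6667 for both heuristic estimates and 0.5177 versus 0.4914 for the exact probabilities, so the first bet is favorable and the second is not.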

Key Contributions

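According to the abstract, the paper contributes:

Two datasets of discrete probability problems: standard exercises and counterintuitive exercises designed to trigger heuristic reasoning.
A benchmark of 8 state-of-the-art models, each tested with and without Chain-of-Thought prompting.
Evidence of token bias: performance drops by over 20% when canonical formulations are replaced by disguised variants.
A robustness test showing that misleading suggestions in the prompt reduce performance by up to 34%, with no model immune.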

Methodology

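The abstract outlines the setup: two datasets of discrete probability problems (standard and counterintuitive), 8 state-of-the-art models each evaluated with and without Chain-of-Thought prompting, and two robustness probes, namely disguised rewordings of canonical problems and prompts containing misleading suggestions. Refer to the full paper for the detailed methodology.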
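The paper's evaluation code is not described here. The following is a hypothetical sketch of how results from such a design (models by datasets by with/without Chain-of-Thought) could be tabulated; the `Result` record and the function names are assumptions. Because the abstract does not say whether its reported drops ("over 20%", "up to 34%") are absolute or relative, the `drop` helper returns both.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str      # hypothetical model identifier
    dataset: str    # e.g. "standard" or "counterintuitive"
    cot: bool       # Chain-of-Thought prompting on/off
    correct: bool   # whether the final answer matched the reference


def accuracy_table(results):
    """Mean accuracy for each (model, dataset, cot) cell."""
    totals = defaultdict(lambda: [0, 0])  # key -> [num_correct, num_total]
    for r in results:
        cell = totals[(r.model, r.dataset, r.cot)]
        cell[0] += int(r.correct)
        cell[1] += 1
    return {key: c / n for key, (c, n) in totals.items()}


def drop(baseline: float, variant: float) -> tuple[float, float]:
    """Return (absolute drop, relative drop as a fraction of the baseline)."""
    absolute = baseline - variant
    relative = absolute / baseline if baseline else float("nan")
    return absolute, relative
```

For example, applying `drop(0.96, 0.59)` to the reported average accuracies gives an absolute drop of 0.37 (37 points) and a relative drop of about 0.385, which shows why the distinction matters when reading percentage figures.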

Practical Implications

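The results caution against treating strong performance on standard or advanced mathematical problems as evidence of robust probabilistic reasoning: accuracy falls sharply on counterintuitive problems, on disguised rewordings of familiar ones, and when the prompt contains misleading suggestions.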

Authors

Luca Avena
Gianmarco Bet
Bernardo Busoni

Paper Information

arXiv ID: 2606.07515v1
Categories: cs.CL, cs.AI, cs.HC, math.PR
Published: June 5, 2026