[Paper] Causally Evaluating the Learnability of Formal Language Tasks

Published: June 8, 2026 at 01:58 PM EDT

Source: arXiv - 2606.09822v1

Overview

Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to learn a given task. Answering this for natural language is difficult: tasks are hard to delineate and can confound one another. To rigorously investigate the relationship between data frequency and learnability, we turn to a controlled setting using formal languages induced from probabilistic finite automata. These serve as a methodological testbed to demonstrate that standard correlational evaluation practices are inherently flawed. To enable causal analysis, we introduce the binning semiring, an algebraic object that lets us control how often a targeted property occurs in a sampled corpus. We formulate the experimental pipeline as a causal graphical model and derive decomposed Kullback-Leibler divergence metrics to measure the learnability of specific sub-tasks. Our experiments show that evaluating learnability without causal intervention leads to incorrect conclusions due to confounders in correlational analysis, and serve as a warning about correlational pitfalls in natural-language settings.
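To make the setting concrete, here is a minimal sketch of sampling a corpus from a probabilistic finite automaton and measuring how often a targeted property occurs. It does not implement the binning semiring or any procedure from the paper. The toy automaton `PFA`, the helper names `sample_string` and `property_count`, and the choice of "occurrences of the symbol b" as the targeted property are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy PFA (not from the paper).
# state -> list of (probability, symbol, next_state); symbol None means "stop".
PFA = {
    0: [(0.5, "a", 0), (0.3, "b", 1), (0.2, None, None)],
    1: [(0.6, "b", 1), (0.4, None, None)],
}

def sample_string(pfa, rng, start=0):
    """Sample one string by walking the automaton until a stop arc is drawn."""
    state, out = start, []
    while state is not None:
        probs, symbols, next_states = zip(*pfa[state])
        i = rng.choices(range(len(probs)), weights=probs)[0]
        if symbols[i] is not None:
            out.append(symbols[i])
        state = next_states[i]
    return "".join(out)

def property_count(corpus, symbol):
    """Count occurrences of the targeted symbol across the whole corpus."""
    return sum(s.count(symbol) for s in corpus)

rng = random.Random(0)
corpus = [sample_string(PFA, rng) for _ in range(1000)]

# Corpus-level frequency of the targeted property (here: the symbol "b").
count_b = property_count(corpus, "b")

# Histogram of strings by how many times the property occurs in each.
per_string_hist = Counter(s.count("b") for s in corpus)
```

With plain sampling like this, the frequency of the targeted property is whatever the automaton happens to produce, and it moves together with other corpus statistics such as string length. That coupling is the kind of confounding the abstract warns about, and it is why the paper introduces a mechanism for controlling the frequency directly.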

Key Contributions

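The abstract describes the following contributions:

The binning semiring, an algebraic object that controls how often a targeted property occurs in a sampled corpus
A formulation of the experimental pipeline as a causal graphical model
Decomposed Kullback-Leibler divergence metrics that measure the learnability of specific sub-tasks
Experiments showing that evaluating learnability without causal intervention leads to incorrect conclusions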

Methodology

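The study uses formal languages induced from probabilistic finite automata as a controlled testbed, sampling corpora in which the frequency of a targeted property is set by intervention rather than left to vary. Please refer to the full paper for the detailed methodology.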

Practical Implications

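For practitioners who study how data frequency relates to what a language model learns, the results suggest that correlational evaluations can be misleading because of confounders, and that the same caution applies to natural-language settings.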

Authors

Vésteinn Snæbjarnarson
Anej Svete
Josef Valvoda
Reda Boumasmoud
Brian DuSell
Ryan Cotterell

Paper Information

arXiv ID: 2606.09822v1
Categories: cs.CL, cs.FL
Published: June 8, 2026