[Paper] Causally Evaluating the Learnability of Formal Language Tasks
Source: arXiv - 2606.09822v1
Overview
Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to learn a given task. Answering this for natural language is difficult: tasks are hard to delineate and can confound one another. To rigorously investigate the relationship between data frequency and learnability, we turn to a controlled setting using formal languages induced from probabilistic finite automata. These serve as a methodological testbed to demonstrate that standard correlational evaluation practices are inherently flawed. To enable causal analysis, we introduce the binning semiring, an algebraic object that lets us control how often a targeted property occurs in a sampled corpus. We formulate the experimental pipeline as a causal graphical model and derive decomposed Kullback-Leibler divergence metrics to measure the learnability of specific sub-tasks. Our experiments show that evaluating learnability without causal intervention leads to incorrect conclusions due to confounders in correlational analysis, and serve as a warning about correlational pitfalls in natural-language settings.
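The controlled setting described above can be made concrete with a toy sketch. The automaton, the function names (`sample_string`, `property_counts`), and the choice of "targeted property" (how often one particular transition is used) are all assumptions for illustration and are not taken from the paper. The sketch only measures how often the property occurs in an observational sample; it does not implement the binning semiring, which is the paper's tool for controlling that frequency.

```python
import random
from collections import Counter

# Hypothetical toy PFA (not from the paper): state -> list of (symbol, next_state, prob).
# The symbol None marks termination; probabilities out of each state sum to 1.
PFA = {
    0: [("a", 0, 0.5), ("b", 1, 0.3), (None, None, 0.2)],
    1: [("a", 0, 0.4), ("b", 1, 0.2), (None, None, 0.4)],
}

def sample_string(pfa, rng, start=0):
    """Sample one string and the list of transitions used to generate it."""
    state, symbols, used = start, [], []
    while state is not None:
        arcs = pfa[state]
        # Pick one outgoing arc according to its probability.
        sym, nxt, _ = rng.choices(arcs, weights=[p for _, _, p in arcs])[0]
        if sym is not None:
            symbols.append(sym)
            used.append((state, sym, nxt))
        state = nxt  # becomes None when the termination arc is chosen
    return "".join(symbols), used

def property_counts(pfa, target, n, seed=0):
    """Histogram of how many times `target` is used per string, over n samples."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        _, used = sample_string(pfa, rng)
        counts[used.count(target)] += 1
    return counts

if __name__ == "__main__":
    # How often does the assumed target transition 0 --b--> 1 occur per string?
    print(sorted(property_counts(PFA, (0, "b", 1), n=1000).items()))
```

In a purely observational corpus like this one, the frequency of the target transition is tied to other properties of the sampled strings, such as their length, which is the kind of confounding the abstract warns about.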
Key Contributions
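The abstract highlights the following contributions:
- The binning semiring, an algebraic object for controlling how often a targeted property occurs in a sampled corpus.
- A formulation of the experimental pipeline as a causal graphical model.
- Decomposed Kullback-Leibler divergence metrics for measuring the learnability of specific sub-tasks.
- Experiments showing that evaluating learnability without causal intervention leads to incorrect conclusions.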
Methodology
Please refer to the full paper for detailed methodology.
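As a rough illustration of what a decomposed Kullback-Leibler divergence can look like, the standard chain-rule identity below splits the divergence between two autoregressive string distributions into per-symbol terms. The notation is assumed for this sketch ($p$ for the distribution induced by the automaton, $q$ for the trained model, $\Sigma$ for the alphabet, and an end-of-string symbol as the final prediction); the paper's own metrics may be defined differently.

```latex
\begin{align*}
D_{\mathrm{KL}}(p \,\|\, q)
  &= \sum_{y \in \Sigma^*} p(y) \log \frac{p(y)}{q(y)} \\
  &= \sum_{y \in \Sigma^*} p(y) \sum_{t=1}^{|y|+1}
     \log \frac{p(y_t \mid y_{<t})}{q(y_t \mid y_{<t})},
  \qquad y_{|y|+1} = \textsc{eos}.
\end{align*}
```

Grouping the inner terms by some property of the prediction step, for example by which transition of the automaton generated $y_t$, gives one contribution per group, and the contributions sum to the total divergence.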
Practical Implications
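The results caution that correlational evaluations of learnability can be confounded and lead to incorrect conclusions. The authors present this as a warning for natural-language settings, where tasks are harder to delineate and can confound one another.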
Authors
- Vésteinn Snæbjarnarson
- Anej Svete
- Josef Valvoda
- Reda Boumasmoud
- Brian DuSell
- Ryan Cotterell
Paper Information
- arXiv ID: 2606.09822v1
- Categories: cs.CL, cs.FL
- Published: June 8, 2026