[Paper] Causally Evaluating the Learnability of Formal Language Tasks

Published: June 8, 2026 at 01:58 PM EDT
Source: arXiv - 2606.09822v1

Overview

Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to learn a given task. Answering this for natural language is difficult: tasks are hard to delineate and can confound one another. To rigorously investigate the relationship between data frequency and learnability, we turn to a controlled setting using formal languages induced from probabilistic finite automata. These serve as a methodological testbed to demonstrate that standard correlational evaluation practices are inherently flawed. To enable causal analysis, we introduce the binning semiring, an algebraic object that lets us control how often a targeted property occurs in a sampled corpus. We formulate the experimental pipeline as a causal graphical model and derive decomposed Kullback-Leibler divergence metrics to measure the learnability of specific sub-tasks. Our experiments show that evaluating learnability without causal intervention leads to incorrect conclusions due to confounders in correlational analysis, and serve as a warning about correlational pitfalls in natural-language settings.
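
To make the setting concrete, here is a minimal sketch of sampling strings from a probabilistic finite automaton and estimating how often a targeted property occurs in the sampled corpus. The automaton `TOY_PFA`, the helper names, the length cap, and the choice of "uses a particular arc" as the property are all assumptions made for illustration; none of them come from the paper. The sketch only measures the frequency and does not reproduce the binning semiring, which is the paper's tool for controlling it.

```python
import random

# Hypothetical toy PFA (not from the paper): each state maps to a list of
# (probability, symbol, next_state) arcs; symbol None means "stop here".
TOY_PFA = {
    0: [(0.5, "a", 0), (0.3, "b", 1), (0.2, None, None)],
    1: [(0.6, "b", 1), (0.4, None, None)],
}


def sample_string(pfa, rng, start=0, max_len=100):
    """Walk the automaton once; return the string and the arcs it used."""
    state, symbols, used = start, [], []
    while len(symbols) < max_len:  # assumed length cap; truncates rare long walks
        r, acc = rng.random(), 0.0
        for prob, sym, nxt in pfa[state]:
            acc += prob
            if r < acc:
                break  # if no arc matches, the last arc is used (rounding guard)
        if sym is None:
            break
        symbols.append(sym)
        used.append((state, sym, nxt))
        state = nxt
    return "".join(symbols), used


def property_frequency(pfa, target_arc, n_samples, seed=0):
    """Fraction of sampled strings whose path uses `target_arc`."""
    rng = random.Random(seed)
    hits = sum(target_arc in sample_string(pfa, rng)[1] for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    print(property_frequency(TOY_PFA, (0, "b", 1), 2000))
```

For this toy automaton, a walk from state 0 eventually either takes the targeted arc (probability 0.3 per step) or stops (probability 0.2 per step), so the arc is used with probability 0.3 / (0.3 + 0.2) = 0.6 and the estimate should land near that value.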

Key Contributions

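Based on the abstract, the paper's main contributions are:

  • The binning semiring, an algebraic object for controlling how often a targeted property occurs in a sampled corpus.
  • A formulation of the experimental pipeline as a causal graphical model.
  • Decomposed Kullback-Leibler divergence metrics that measure the learnability of specific sub-tasks.
  • Experiments showing that evaluating learnability without causal intervention leads to incorrect conclusions.

The paper is listed under the arXiv categories cs.CL and cs.FL.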

Methodology

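The authors work with formal languages induced from probabilistic finite automata, use the binning semiring to control how often a targeted property occurs in a sampled corpus, and formulate the experimental pipeline as a causal graphical model. Learnability of specific sub-tasks is measured with decomposed Kullback-Leibler divergence metrics. Please refer to the full paper for the detailed methodology.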
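
As a rough illustration of a Kullback-Leibler-based measurement of the kind the overview mentions, the sketch below compares a reference next-symbol distribution with a model's estimate, one automaton state at a time. This is the standard KL divergence applied per state, not the paper's decomposed metric; the function names, the `"<eos>"` stop symbol, and the example distributions are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two finite distributions given as dicts."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # q assigns zero mass where p does not
        total += px * math.log(px / qx)
    return total


def per_state_kl(true_arcs, model_arcs):
    """One KL value per state, comparing next-symbol distributions."""
    return {s: kl_divergence(true_arcs[s], model_arcs[s]) for s in true_arcs}


if __name__ == "__main__":
    # Hypothetical distributions: the model matches state 0 but is off in state 1.
    true_arcs = {0: {"a": 0.5, "b": 0.3, "<eos>": 0.2}, 1: {"b": 0.6, "<eos>": 0.4}}
    model_arcs = {0: {"a": 0.5, "b": 0.3, "<eos>": 0.2}, 1: {"b": 0.5, "<eos>": 0.5}}
    print(per_state_kl(true_arcs, model_arcs))
```

Reporting one value per state makes it visible which part of the automaton a model has captured and which it has not, which is the intuition behind measuring sub-tasks separately.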

Practical Implications

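The results caution that correlational evaluations of how data frequency relates to learnability can be confounded and lead to incorrect conclusions. The authors present this as a warning for natural-language settings, where tasks are hard to delineate and can confound one another.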

Authors

  • Vésteinn Snæbjarnarson
  • Anej Svete
  • Josef Valvoda
  • Reda Boumasmoud
  • Brian DuSell
  • Ryan Cotterell

Paper Information

  • arXiv ID: 2606.09822v1
  • Categories: cs.CL, cs.FL
  • Published: June 8, 2026