[Paper] Implicit Representations of Grammaticality in Language Models
Source: arXiv - 2605.05197v1
Overview
This paper asks whether large pretrained language models (LMs) represent grammaticality separately from the raw likelihood they are trained to predict. By probing the hidden states of several pretrained LMs, the authors show that a simple linear classifier can reliably distinguish grammatical from ungrammatical sentences, often more accurately than the model's own probability scores. This suggests that LMs encode an implicit notion of grammaticality that can be tapped for downstream tasks.
Key Contributions
- Linear probing of grammaticality: Demonstrates that a single‑layer linear probe trained on synthetically perturbed sentences can separate grammatical from ungrammatical inputs.
- Out‑of‑distribution generalization: The probe transfers to human‑curated grammaticality‑judgment benchmarks (e.g., CoLA) and consistently outperforms raw LM probabilities.
- Semantic plausibility vs. grammar: Shows that the probe excels on pure grammaticality tasks but falls behind probability scores when both sentences are grammatical and the task is to rank them by plausibility.
- Cross‑lingual transfer: An English‑trained probe retains predictive power on grammaticality benchmarks in many other languages, surpassing LM probabilities without any language‑specific fine‑tuning.
- Weak correlation with token probabilities: Probe scores are only loosely linked to the LM’s own likelihood estimates, reinforcing the idea of a distinct internal grammatical signal.
Methodology
Data creation:
- Start with a large natural‑language corpus (English Wikipedia, BookCorpus, etc.).
- Generate ungrammatical counterparts by applying systematic perturbations (e.g., word shuffling, subject‑verb agreement swaps, random deletions).
- Keep the grammatical originals untouched, yielding a balanced binary dataset.
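The data-creation steps can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the tiny `AGREEMENT_SWAPS` table, and whitespace tokenization are all assumptions made for the example. Note that a random perturbation can occasionally leave a sentence grammatical (for instance, deleting an optional adverb), so labels produced this way are noisy.

```python
import random

# Toy agreement table (hypothetical and far from exhaustive).
AGREEMENT_SWAPS = {
    "is": "are", "are": "is",
    "was": "were", "were": "was",
    "has": "have", "have": "has",
}

def shuffle_words(tokens, rng):
    """Reorder the tokens; return them unchanged if no other order exists."""
    original = list(tokens)
    if len(set(original)) < 2:
        return original
    shuffled = list(original)
    while shuffled == original:
        rng.shuffle(shuffled)
    return shuffled

def delete_random_word(tokens, rng):
    """Drop one token at a random position."""
    original = list(tokens)
    if len(original) < 2:
        return original
    i = rng.randrange(len(original))
    return original[:i] + original[i + 1:]

def swap_agreement(tokens, rng):
    """Flip the number of one auxiliary verb, e.g. 'are' -> 'is'."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t.lower() in AGREEMENT_SWAPS]
    if not candidates:
        return out
    i = rng.choice(candidates)
    out[i] = AGREEMENT_SWAPS[out[i].lower()]
    return out

def build_dataset(sentences, seed=0):
    """Return (text, label) pairs: 1 = original, 0 = perturbed copy."""
    rng = random.Random(seed)
    perturbations = [shuffle_words, delete_random_word, swap_agreement]
    data = []
    for sentence in sentences:
        tokens = sentence.split()
        perturbed = tokens
        # Try the perturbations in random order until one changes the sentence.
        for perturb in rng.sample(perturbations, len(perturbations)):
            perturbed = perturb(tokens, rng)
            if perturbed != tokens:
                break
        if perturbed == tokens:
            continue  # nothing could be changed (e.g. a one-word sentence)
        data.append((sentence, 1))
        data.append((" ".join(perturbed), 0))
    return data
```

Each kept sentence contributes exactly one positive and one negative example, which is what keeps the dataset balanced.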
Model selection:
- Use several pretrained Transformers (GPT‑2, BERT, RoBERTa, etc.) without any further fine‑tuning.
Probing setup:
- Extract hidden representations from a chosen layer (typically the final or penultimate layer).
- Train a linear classifier (logistic regression) on the representations to predict “grammatical vs. ungrammatical”.
- No non-linear layers or attention mechanisms are added, so the probe measures only what is already linearly separable in the model's representation space.
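A linear probe of this kind amounts to logistic regression on frozen hidden-state vectors. The sketch below is a dependency-free illustration under stated assumptions: the hidden states are taken to be already extracted as plain lists of floats, `toy_hidden_state` is a hypothetical stand-in that generates synthetic vectors instead of calling a real LM, and the learning rate and epoch count are arbitrary choices rather than values from the paper.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probe_score(weights, bias, hidden):
    """Linear probe: sigmoid(w . h + b), read as P(grammatical)."""
    return sigmoid(sum(w * h for w, h in zip(weights, hidden)) + bias)

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = probe_score(weights, bias, x) - y  # gradient of log loss
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Synthetic stand-in for LM hidden states (assumption: a real setup would
# take these vectors from a frozen model's chosen layer instead).
rng = random.Random(0)

def toy_hidden_state(label, dim=8):
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    vec[0] += 3.0 if label == 1 else -3.0  # only coordinate 0 carries signal
    return vec

labels = [i % 2 for i in range(200)]
features = [toy_hidden_state(y) for y in labels]

weights, bias = train_linear_probe(features[:150], labels[:150])
preds = [int(probe_score(weights, bias, x) >= 0.5) for x in features[150:]]
accuracy = sum(p == y for p, y in zip(preds, labels[150:])) / len(preds)
```

Because the LM itself is never updated, any accuracy the probe reaches reflects structure already present in the representations, not structure learned during probing.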
Evaluation:
- In‑domain: Test on held‑out perturbed sentences.
- Out‑of‑domain: Apply the probe to established grammaticality benchmarks (CoLA, BLiMP) and to semantic plausibility pairs (e.g., “The cat chased the mouse” vs. “The cat chased the cheese”).
- Cross‑lingual: Run the English‑trained probe on analogous datasets for French, German, Chinese, etc.
- Compare probe predictions against simple baselines that use the LM’s token‑level or sentence‑level probability as the decision rule.
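For minimal-pair benchmarks in the style of BLiMP, one common way to compare the two decision rules is pairwise accuracy: a scorer is correct on a pair when it assigns the acceptable sentence a higher score than the unacceptable one. The sketch below assumes this protocol (the summary does not spell out the paper's exact scoring), uses summed token log-probabilities as the sentence-level LM score, and runs on invented numbers that are for illustration only.

```python
def sentence_log_prob(token_log_probs):
    """Sentence-level LM score: the sum of per-token log-probabilities."""
    return sum(token_log_probs)

def pairwise_accuracy(scored_pairs):
    """Fraction of (acceptable, unacceptable) score pairs ranked correctly."""
    if not scored_pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for good, bad in scored_pairs if good > bad)
    return correct / len(scored_pairs)

# Invented scores for two minimal pairs (NOT results from the paper).
lm_pairs = [
    (sentence_log_prob([-2.1, -0.7, -3.0]), sentence_log_prob([-2.1, -0.7, -4.2])),
    (sentence_log_prob([-1.5, -6.0, -0.9]), sentence_log_prob([-1.5, -2.0, -0.9])),
]
probe_pairs = [(0.91, 0.12), (0.77, 0.30)]  # probe outputs, read as P(grammatical)

lm_accuracy = pairwise_accuracy(lm_pairs)
probe_accuracy = pairwise_accuracy(probe_pairs)
```

The second LM pair illustrates how a probability baseline can fail: an ungrammatical sentence built from frequent tokens may receive a higher total log-probability than its grammatical counterpart.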
Results & Findings
| Evaluation | Probe Accuracy / F1 | LM‑Probability Accuracy / F1 |
|---|---|---|
| In‑domain perturbed set | ~92 % | ~78 % |
| CoLA (English grammaticality) | 71 % (↑ 9 pts over LM) | 62 % |
| BLiMP (various syntactic phenomena) | 84 % (↑ 6 pts) | 78 % |
| Semantic plausibility pairs | 58 % (↓ 13 pts below LM) | 71 % |
| Cross‑lingual (e.g., French, German) | 68 % average (↑ 5–10 pts) | 60 % |
- Probe vs. probability: The probe consistently beats raw probabilities on pure grammaticality tasks, but not on tasks where both sentences are grammatical and differ only in meaning.
- Layer analysis: The strongest linear separability appears in the middle‑to‑upper layers (layers 8‑12 of a 12‑layer model), hinting that grammatical information crystallizes after several transformer blocks.
- Correlation: Pearson's r between probe scores and LM log-probabilities is only ~0.3, indicating that the two scores are weakly related and capture largely distinct signals.
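For reference, the correlation statistic mentioned above can be computed as in the following sketch. The function name and the tiny inputs are illustrative assumptions; in practice `xs` would hold probe scores and `ys` the LM log-probabilities for the same sentences.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

perfect = pearson_r([1, 2, 3], [2, 4, 6])   # perfectly linear, increasing
inverse = pearson_r([1, 2, 3], [3, 2, 1])   # perfectly linear, decreasing
```

As a rule of thumb, r ≈ 0.3 corresponds to r² ≈ 0.09, meaning a linear fit of one score on the other would explain roughly 9 % of its variance.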
Practical Implications
- Grammar‑aware generation: Developers can augment LM‑based text generators with a lightweight probe to filter out syntactically malformed outputs without sacrificing speed.
- Error detection & correction: IDE‑style linters for code comments, documentation, or chatbots could flag grammatical slips in real time using the probe’s binary score.
- Multilingual tooling: Since an English‑trained probe transfers reasonably well, teams can deploy a single probe across many languages, reducing the need for language‑specific labeled data.
- Curriculum design for fine‑tuning: Knowing that grammar lives in a linearly separable subspace suggests that fine‑tuning for downstream tasks (e.g., summarization) could preserve this subspace, leading to more fluent outputs.
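- Evaluation metric: The probe offers an additional metric for benchmarking LM grammaticality beyond perplexity, useful for research and product QA pipelines. Because the probe reads a specific model's hidden states, a separate probe has to be trained for each model.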
Limitations & Future Work
- Synthetic vs. natural errors: The training data relies on algorithmic perturbations, which may not capture the full spectrum of human grammatical mistakes.
- Scope of languages: Cross-lingual results are promising but uneven; low-resource languages with different typological properties (e.g., agglutinative or free word order languages) need dedicated study.
- Probe simplicity: A linear probe is intentionally minimal; richer probing architectures might uncover deeper syntactic hierarchies or interact with semantics more gracefully.
- Interaction with semantics: The probe's weakness on plausibility tasks shows that it does not capture differences in meaning between grammatical sentences; future work could explore joint probes or multi-task fine-tuning to balance both.
Bottom line: Even though LMs are trained only to maximize likelihood, they appear to develop an internal, linearly accessible sense of grammaticality. Harnessing this hidden signal can make language‑aware applications more robust, multilingual, and linguistically informed.
Authors
- Yingshan Susan Wang
- Linlu Qiu
- Zhaofeng Wu
- Roger P. Levy
- Yoon Kim
Paper Information
- arXiv ID: 2605.05197v1
- Categories: cs.CL
- Published: May 6, 2026