[Paper] BabyLM Turns 4: Call for Papers for the 2026 BabyLM Workshop

Published: February 23, 2026
4 min read
Source: arXiv
Overview

The BabyLM 2026 Workshop invites the research community to explore how tiny, data‑efficient language models can bridge the gap between cognitive science and modern NLP. Building on three successful editions, the organizers are launching a fourth competition—now with a brand‑new multilingual track—while also opening the floor to papers on training efficiency, cognitively plausible modeling, and novel evaluation methods.

Key Contributions

  • Data‑efficient pre‑training challenge (General track): Participants must build high‑performing language models using only ~10 M words, mirroring the amount of linguistic input a child receives.
  • Multilingual track (first time): Extends the data‑efficient paradigm to multiple languages, encouraging cross‑lingual transfer and low‑resource language research.
  • Open call for workshop papers: Welcomes contributions on any topic that advances the “cognitive‑NLP” agenda, including:
    • Training‑time and memory‑efficient algorithms
    • Architectures inspired by human language acquisition
    • Weak‑supervision and evaluation frameworks that better reflect human judgments
  • Community‑building resources: Provides a shared benchmark suite, baseline models, and a curated “baby” corpus to ensure reproducibility and fair comparison.

Methodology

The competition follows a “small‑data, big‑impact” philosophy:

  1. Corpus construction: A carefully filtered subset of public text (≈10 M words) is released, representing child‑directed speech, early reading material, and web text.
  2. Model constraints: Participants may use any architecture (e.g., Transformers, RNNs) but must train only on the provided corpus; external data is prohibited.
  3. Evaluation: Models are assessed on a suite of downstream tasks (syntactic probing, semantic similarity, reading‑comprehension style questions) that are deliberately chosen to reflect cognitive benchmarks used in child language research.
  4. Multilingual extension: For the new track, the same token budget is split across several languages, and participants must demonstrate cross‑lingual transfer or language‑agnostic representations.
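The fixed word budget in step 1 can be sketched as a simple greedy filter. This is a hypothetical illustration, not the official BabyLM data tooling; the whitespace word count and the greedy selection strategy are assumptions.

```python
def build_budgeted_corpus(documents, word_budget=10_000_000):
    """Greedily add documents until the word budget is exhausted.

    Sketch only: real corpus construction would also filter by source
    (child-directed speech, early reading material, web text) and
    deduplicate before enforcing the budget.
    """
    corpus, used = [], 0
    for doc in documents:
        n_words = len(doc.split())  # crude whitespace word count
        if used + n_words > word_budget:
            break  # budget exhausted; stop adding documents
        corpus.append(doc)
        used += n_words
    return corpus, used


# Toy usage with a tiny 10-word budget:
docs = ["the cat sat on the mat", "look at the big red ball"]
corpus, used = build_budgeted_corpus(docs, word_budget=10)
# only the first document (6 words) fits within the budget
```

The same budget check generalizes to the multilingual track by splitting `word_budget` across per-language sub-corpora.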

The workshop also encourages open‑source submissions and detailed methodological write‑ups, fostering transparency and rapid iteration.
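The syntactic‑probing style of evaluation described above is often run as a minimal‑pair comparison: a model passes an item if it assigns higher probability to a grammatical sentence than to a minimally different ungrammatical one. The sketch below uses a unigram model as a stand‑in scorer; actual evaluations score a trained language model, and the corpus and sentence pair here are invented for illustration.

```python
import math
from collections import Counter

def train_unigram(corpus_words):
    """Fit unigram log-probabilities from a list of words (toy LM)."""
    counts = Counter(corpus_words)
    total = sum(counts.values())
    return {w: math.log(c / total) for w, c in counts.items()}

def score(logprobs, sentence, floor=math.log(1e-6)):
    """Sum per-word log-probabilities; unseen words get a small floor."""
    return sum(logprobs.get(w, floor) for w in sentence.split())

# Toy training corpus and a subject-verb agreement minimal pair:
corpus = "the dogs bark . the dog barks . dogs bark".split()
lm = train_unigram(corpus)
grammatical = "the dogs bark"
ungrammatical = "the dogs barks"

# The model "passes" if it prefers the grammatical variant.
passes = score(lm, grammatical) > score(lm, ungrammatical)
```

Aggregating pass rates over many such pairs yields an accuracy score comparable to cognitive benchmarks of grammatical knowledge.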

Results & Findings

As this is a call for participation, concrete results are not yet available. However, the organizers anticipate that:

  • Baseline models (e.g., a 6‑layer Transformer trained on the baby corpus) will achieve surprisingly strong performance on linguistic probing tasks, highlighting how much structure can be learned from limited data.
  • Multilingual submissions are expected to reveal transfer patterns—for instance, whether a model trained on English and Spanish can bootstrap competence in a low‑resource language like Swahili.
  • The evaluation suite will surface gaps between current NLP metrics and human‑like language understanding, guiding future research directions.

Practical Implications

  • Efficient model deployment: Techniques honed on the BabyLM challenge can be directly applied to edge devices, where memory and compute are scarce.
  • Low‑resource language support: The multilingual track’s focus on data‑efficiency offers a blueprint for building usable models in languages lacking massive corpora.
  • Cognitively informed product design: Insights into how models acquire syntax and semantics from limited exposure can inspire more human‑like conversational agents, educational tools, and assistive technologies.
  • Benchmark for “green AI”: The shared dataset and evaluation framework provide a low‑carbon alternative to training gigantic models, aligning with sustainability goals.

Limitations & Future Work

  • Token budget realism: While 10 M tokens approximate early childhood exposure, real human language acquisition also relies on multimodal cues (vision, interaction) that are absent from the text‑only setup.
  • Evaluation scope: Current downstream tasks focus on linguistic competence; broader pragmatic or discourse abilities remain under‑explored.
  • Scalability of findings: Results obtained on tiny models may not directly transfer to large‑scale systems, so future work should investigate how lessons scale up.
  • Multilingual balance: Ensuring equitable representation across languages with differing script complexities and morphological richness will be an ongoing challenge.

The BabyLM 2026 Workshop thus sets the stage for a new wave of data‑efficient, cognitively grounded NLP research, inviting developers and scholars alike to push the boundaries of what small models can achieve.

Authors

  • Leshem Choshen
  • Ryan Cotterell
  • Mustafa Omer Gul
  • Jaap Jumelet
  • Tal Linzen
  • Aaron Mueller
  • Suchir Salhan
  • Raj Sanjay Shah
  • Alex Warstadt
  • Ethan Gotlieb Wilcox

Paper Information

  • arXiv ID: 2602.20092v1
  • Categories: cs.CL
  • Published: February 23, 2026
